[Biojava-l] genes and things

Ann Loraine loraine@loraine.net
Wed, 27 Sep 2000 18:00:49 -0700 (PDT)


Why don't you attach Exons to SpliceVariants (or possibly
transcripts) instead of Genes?

It seems counter-intuitive to assign Exons to Genes directly, since
an exon doesn't really make sense outside the context of a spliced
transcript.

e.g.,

(XXX = exonic, ---- = intronic)

cartoon of gene w/ two splice variants:

XXXXXXXXXXXXX-----XXXXXXXXX  SpliceVariant 1 (w/ exon 1 and 2)
XXXXXXXXXXXXX---------XXXXX  SpliceVariant 2 (w/ exon 1 and 4)

If you were to ask this Gene for all its exons, you'd get {1,2,4},
which is okay, but doesn't make a ton of sense because it doesn't
say anything about how these exons fit together to make "legal" transcripts.

That is, you can't get a transcript that has *both* exon 2 and 4, for
instance.

So wouldn't you rather get your exons from SpliceVariants instead of
genes? Then you'd know which goes with which.
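
To make that concrete, here is a rough sketch in Java using the exon
labels from the cartoon above. The class and method names
(ExonOwnershipSketch, occurTogether, and so on) are made up for
illustration; they are not existing BioJava interfaces, and a real
design would hold Exon features rather than bare integer labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch types; none of these are existing BioJava interfaces.
class ExonOwnershipSketch {

    static final class SpliceVariant {
        final String name;
        final List<Integer> exonNumbers; // exon labels from the cartoon, 5' to 3'

        SpliceVariant(String name, Integer... exonNumbers) {
            this.name = name;
            this.exonNumbers = new ArrayList<>(Arrays.asList(exonNumbers));
        }
    }

    static final class Gene {
        final List<SpliceVariant> variants = new ArrayList<>();

        // Flattened, gene-level view: every exon label, with no grouping.
        Set<Integer> allExonNumbers() {
            Set<Integer> all = new TreeSet<>();
            for (SpliceVariant v : variants) {
                all.addAll(v.exonNumbers);
            }
            return all;
        }

        // True only if at least one splice variant contains both exons.
        boolean occurTogether(int a, int b) {
            for (SpliceVariant v : variants) {
                if (v.exonNumbers.contains(a) && v.exonNumbers.contains(b)) {
                    return true;
                }
            }
            return false;
        }
    }

    // The gene from the cartoon: two splice variants sharing exon 1.
    static Gene cartoonGene() {
        Gene gene = new Gene();
        gene.variants.add(new SpliceVariant("SpliceVariant 1", 1, 2));
        gene.variants.add(new SpliceVariant("SpliceVariant 2", 1, 4));
        return gene;
    }

    public static void main(String[] args) {
        Gene gene = cartoonGene();
        // The flat set loses the grouping ...
        System.out.println("all exons: " + gene.allExonNumbers());
        // ... while the per-variant lists say which exons go together.
        for (SpliceVariant v : gene.variants) {
            System.out.println(v.name + ": " + v.exonNumbers);
        }
        System.out.println("2 and 4 together? " + gene.occurTogether(2, 4));
    }
}
```

The gene-level set can still be computed when somebody wants it, but
it is derived from the splice variants rather than stored in place of
them.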

This would also make interactive editing a lot easier to support.  For
example, let's say you've built an editor and your user is looking at
transcript 2 and decides to make its exon 1 longer on the 3' end.  

If you're storing just one copy of exon 1 internally then your user
will have mucked with transcript 1 as well as transcript 2.  Yuck!
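
A small Java sketch of the difference, assuming a mutable Exon with
plain integer coordinates on the forward strand (so "longer on the 3'
end" just means a larger end value). The names and the coordinates
100..200 are invented for the example.

```java
// Hypothetical sketch; Exon here is not a BioJava type.
class ExonEditingSketch {

    static final class Exon {
        int start;
        int end; // 3' end, assuming the forward strand

        Exon(int start, int end) {
            this.start = start;
            this.end = end;
        }

        Exon copy() {
            return new Exon(start, end);
        }
    }

    // Both transcripts point at ONE exon object.
    // Returns {end seen by transcript 1, end seen by transcript 2}.
    static int[] extendShared(int extra) {
        Exon exon1 = new Exon(100, 200);
        Exon transcript1Exon1 = exon1;
        Exon transcript2Exon1 = exon1;
        transcript2Exon1.end += extra; // user edits transcript 2 only
        return new int[] { transcript1Exon1.end, transcript2Exon1.end };
    }

    // Each transcript owns its own copy of exon 1.
    static int[] extendOwned(int extra) {
        Exon transcript1Exon1 = new Exon(100, 200);
        Exon transcript2Exon1 = transcript1Exon1.copy();
        transcript2Exon1.end += extra; // user edits transcript 2 only
        return new int[] { transcript1Exon1.end, transcript2Exon1.end };
    }

    public static void main(String[] args) {
        int[] shared = extendShared(50);
        int[] owned = extendOwned(50);
        System.out.println("shared: t1=" + shared[0] + " t2=" + shared[1]);
        System.out.println("owned:  t1=" + owned[0] + " t2=" + owned[1]);
    }
}
```

With the shared object both transcripts end up at 250; with per-transcript
copies transcript 1 stays at 200 and only transcript 2 moves to 250.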

In general,

	-A gene has 1 or more transcripts (due to alternate
		transcriptional start sites)

	-A transcript has 1 or more mRNA splice variants (due to
		alternative splicing)

	-An mRNA has 1 or more translations, due to alternative start
		(and stop?) codons
	
	(Note: An mRNA's start and stop codons may not necessarily be
		in frame with each other, due to codon slippage
		during translation)

The idea is that at every level of information processing you have
the possibility of more than one answer.  In code, you end up with
a tree of features.
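
As a sketch of what such a tree might look like in Java, assuming a
single generic node type with a type tag (the Node class, the type
strings and the example names are all hypothetical, not part of
BioJava):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a feature tree; not a BioJava API.
class FeatureTreeSketch {

    static final class Node {
        final String type; // e.g. "Gene", "Transcript", "SpliceVariant", "Translation"
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String type, String name) {
            this.type = type;
            this.name = name;
        }

        // Adds a child and returns this node, so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Counts the nodes of the given type in the subtree rooted at node.
    static int countOfType(Node node, String type) {
        int count = node.type.equals(type) ? 1 : 0;
        for (Node child : node.children) {
            count += countOfType(child, type);
        }
        return count;
    }

    // Prints the subtree, one node per line, indented by depth.
    static void print(Node node, String indent) {
        System.out.println(indent + node.type + " " + node.name);
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    // One gene, two transcripts, three splice variants, four translations.
    static Node exampleGene() {
        return new Node("Gene", "g")
            .add(new Node("Transcript", "A")
                .add(new Node("SpliceVariant", "A1")
                    .add(new Node("Translation", "A1.1")))
                .add(new Node("SpliceVariant", "A2")
                    .add(new Node("Translation", "A2.1"))
                    .add(new Node("Translation", "A2.2"))))
            .add(new Node("Transcript", "B")
                .add(new Node("SpliceVariant", "B1")
                    .add(new Node("Translation", "B1.1"))));
    }

    public static void main(String[] args) {
        Node gene = exampleGene();
        print(gene, "");
        System.out.println("translations: " + countOfType(gene, "Translation"));
    }
}
```

Each level can fan out, so one gene here ends up with four possible
translations.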

-Ann 
---

Ann E. Loraine
http://www.loraine.net

On Tue, 26 Sep 2000, Matthew Pocock wrote:

> Dear all,
> 
> At the end of this month I will be moving part-time to ensembl
> (http://www.ensembl.org). This means that I will end up having to think
> of genomes as containing genes! So far, the support for genes in BioJava
> is lacking. You can use StrandedFeature with a given type tag and a
> well-defined set of children to represent one, but we have no agreed
> structure.
> 
> How do you all want to represent genes? My approach would be to use the
> interfaces below, but this may be overkill, or may miss out important
> biological possibilities. All comments & flames gratefully accepted.
> 
> Matthew
> 
> extend Feature with:
>   /*
>    * Generate a template object that could be used to create a feature
>    * that is the same as this one. This permits features to be cloned
>    * into other contexts, e.g. from one database to another, without
>    * breaking the encapsulation.
>    */
>   Template makeTemplate();
> 
> /**
>  * A gene. This will contain zero-or-more transcript features,
>  * and may contain other things (e.g. promoter elements). It also
>  * maintains a list of all exons known to exist in this gene.
>  */
> public interface Gene extends StrandedFeature {
>   /**
>    * Retrieve the set of exons in this gene. These will be Exon objects.
>    * Only exons in this set are legal for use by an mRNA arising from
>    * a gene.
>    */
>   public Set getExons();
> }
> 
> /**
>  * A transcript represents a region of a gene that is transcribed. It
>  * will normally be contiguous, and its strand will be identical to the
>  * strand of the gene (except in odd circumstances). E.g. it is a
>  * region from where polymerase attaches to where it drops off.
>  * <P>
>  * Each transcript will have one SpliceVariant for each possible
>  * mRNA it can be turned into. A single transcript may be spliced
>  * in multiple ways, some of which may be exon-identical to how
>  * other transcripts are spliced.
>  */
> public interface Transcript extends StrandedFeature {
> }
> 
> /**
>  * A possible splicing pattern for a transcript. This should contain
>  * exons from the gene as features, to indicate which regions to
>  * splice in, and which to splice out. The location of the splice
>  * variant is from the beginning of the first exon to the end of the
>  * last one. It is possible that you don't know the transcript
>  * coordinates, but you do know the SpliceVariant produced, in which
>  * case we should either have a dummy parent transcript, or add it
>  * directly to the gene.
>  */
> public interface SpliceVariant extends StrandedFeature {
>   /**
>    * Retrieve an mRNA sequence made by splicing together this splice
>    * variant.
>    * <P>
>    * The returned sequence will contain TranslatedRegion features to
>    * indicate which bits are translated.
>    */
>   public Sequence getSplicedSequence();
> }
> 
> /**
>  * A region of mRNA that is translated into protein. This should in
>  * most cases have a contiguous location.
>  */
> public interface TranslatedRegion {
>   /**
>    * Retrieve the translation for this region of the mRNA.
>    */
>   public Sequence getTranslation();
> }