[Bioperl-l] Proposal: SemanticMapping and call for info on Gene Objects

Ewan Birney birney@ebi.ac.uk
Sun, 12 May 2002 18:57:12 +0100 (BST)

I think this is a great idea, and needed.

Some quick thoughts -

  (a) I think a SeqIO object can only have one semantic mapper (right?)

  (b) Ensembl's model for translation/genes is as follows:

Gene has an (unordered) set of Transcripts
   Transcript - has-a ordered list of Exons
              - has-a translation object which
                      has-a start-exon (one of the above list)
                      has-a start-codon-position (relative to the exon)
                      has-a end-exon
                      has-a end-codon-position (relative to the exon)

   The important thing to note here is that the start/end points are
properties of the transcript/translation, and not of an exon, mainly
because an exon could be either a fully UTR exon or a mixed coding/UTR
exon.
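
   To make the has-a relationships above concrete, here is a minimal
sketch in Python. It is not Ensembl's actual code: the class and
attribute names, the 1-based inclusive coordinates and the forward-strand
assumption are all mine, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Exon:
    start: int  # genomic start, 1-based inclusive (assumed convention)
    end: int    # genomic end, 1-based inclusive (assumed convention)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class Translation:
    start_exon: Exon  # must be one of the owning transcript's exons
    start_pos: int    # 1-based offset of the first coding base in start_exon
    end_exon: Exon    # must be one of the owning transcript's exons
    end_pos: int      # 1-based offset of the last coding base in end_exon


@dataclass
class Transcript:
    exons: List[Exon]         # ordered 5' to 3'
    translation: Translation  # coding start/end live here, not on the exons

    def _index(self, exon: Exon) -> int:
        # Compare by identity so two exons with equal coordinates stay distinct.
        for i, candidate in enumerate(self.exons):
            if candidate is exon:
                return i
        raise ValueError("exon is not part of this transcript")

    def is_fully_utr(self, exon: Exon) -> bool:
        # An exon is fully UTR if it lies entirely before the start exon
        # or entirely after the end exon in transcript order.
        i = self._index(exon)
        first = self._index(self.translation.start_exon)
        last = self._index(self.translation.end_exon)
        return i < first or i > last


@dataclass
class Gene:
    # Conceptually an unordered set; a list is used here for simplicity.
    transcripts: List[Transcript] = field(default_factory=list)
```

   Because the coding boundaries sit on the Translation, the same Exon
can be fully UTR in one transcript and coding in another without the exon
itself changing.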

   The drawback is that Ensembl cannot currently deal with start/end
codons that are split across an intron, which is bad (it could cope with a
little tweaking of the conventions - i.e., the start codon position means
its first base, which always has to lie in one or other of the exons).

   I am tempted to advocate a more standard convention where start/end is
relative to the transcript's virtual cDNA. The drawback of this is that it
becomes more complex to figure out, for example, where the start exon is,
and in cases where you are *building* genes computationally it produces
some nasty calculation overheads.
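
   As a rough illustration of that extra work, here is a small Python
sketch of the conversion between cDNA-relative and exon-relative
coordinates. The function names and the 1-based convention are
assumptions of mine, not part of Ensembl or Bioperl.

```python
from typing import List, Tuple


def cdna_to_exon(exon_lengths: List[int], cdna_pos: int) -> Tuple[int, int]:
    """Map a 1-based cDNA position to (exon index, 1-based offset in exon).

    exon_lengths holds the lengths of the exons in transcript order.
    """
    if cdna_pos < 1:
        raise ValueError("cDNA positions are 1-based")
    remaining = cdna_pos
    # Walk the exons in order, consuming their lengths until the
    # position falls inside one of them.
    for index, length in enumerate(exon_lengths):
        if remaining <= length:
            return index, remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


def exon_to_cdna(exon_lengths: List[int], exon_index: int, offset: int) -> int:
    """Map (exon index, 1-based offset in exon) back to a 1-based cDNA position."""
    if not 1 <= offset <= exon_lengths[exon_index]:
        raise ValueError("offset lies outside the exon")
    return sum(exon_lengths[:exon_index]) + offset
```

   Finding the start exon this way is a linear walk over the exon list,
and it has to be redone whenever an exon is added, removed or resized
during gene building. On the other hand, a start codon whose bases
straddle an exon boundary needs no special handling: it is just three
consecutive cDNA positions.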


Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420