[Bioperl-l] Request for direction... was: genbank alternate-splicing representation

Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Tue, 09 Oct 2001 16:25:47 -0600


First, thanks to all who have sent messages of commiseration :-)

Having now looked through a few examples of alternate splicing representation in
GenBank, I have concluded that it is even more chaotic than I had originally
thought!  There is even inconsistency between the coordinates in an mRNA
element's join() parameter and the coordinates of the corresponding exon element
in the same record, i.e. not only can a GenBank file be redundant, but it can
be redundant with errors!!  (See AF006988, exon number 3, which does not have a
fuzzy start even though the 'join' does.)
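
To make that kind of mismatch concrete, here is a rough Python sketch (not
BioPerl code) that compares the ranges in an mRNA join() against the exon
locations from the same record.  The function names and the coordinates are
invented for illustration and are NOT taken from AF006988.  The parser is
deliberately simplified: it assumes plain start..end ranges with optional
< and > markers, and ignores complement(), single-base and remote locations.

```python
import re

# One "start..end" range, optionally with '<' on the start or '>' on the end.
# Simplifying assumption: complement(), single-base and remote locations
# are not handled.
_RANGE = re.compile(r"(<?)(\d+)\.\.(>?)(\d+)")


def parse_ranges(location):
    """Return a list of (start, end, fuzzy_start, fuzzy_end) tuples."""
    return [
        (int(start), int(end), lt == "<", gt == ">")
        for lt, start, gt, end in _RANGE.findall(location)
    ]


def exons_missing_from_join(join_location, exon_locations):
    """Return the exon ranges that have no exact counterpart in the join().

    A range only counts as matching when the coordinates AND the fuzzy
    flags agree, so an exon that drops a '<' is reported as a mismatch.
    """
    join_ranges = set(parse_ranges(join_location))
    mismatches = []
    for location in exon_locations:
        for rng in parse_ranges(location):
            if rng not in join_ranges:
                mismatches.append(rng)
    return mismatches


# Hypothetical record: the first exon has lost the fuzzy start that the
# mRNA join() carries.
mrna_join = "join(<100..250,400..520,700..>810)"
exon_features = ["100..250", "400..520", "700..>810"]
print(exons_missing_from_join(mrna_join, exon_features))
```

With these made-up coordinates only the first exon is reported, because its
start is not fuzzy while the matching join() segment is.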

>>sigh<<

Anyway, I would like some direction from the group as to how you would like the
parser to handle cases where an exon is represented multiple times in different
transcripts.  My instinct is to continually poll existing Transcript objects to
test for the presence of an Exon with the same start/stop locations, and use that
Exon in the Transcript being built, rather than create a new Exon with identical
parameters.
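
As a rough illustration of the reuse idea, here is a small Python sketch (not
BioPerl code).  The Exon, Transcript and ExonPool classes and the
build_transcript function are hypothetical stand-ins invented for this
example.  Note that it swaps the polling of existing Transcript objects for a
lookup table keyed on the exon coordinates; the effect should be the same,
but each lookup is a single dictionary access.

```python
class Exon:
    """Hypothetical stand-in for an exon feature."""

    def __init__(self, start, end, strand=1):
        self.start = start
        self.end = end
        self.strand = strand


class Transcript:
    """Hypothetical stand-in for a transcript holding an ordered exon list."""

    def __init__(self, name):
        self.name = name
        self.exons = []


class ExonPool:
    """Hands out one shared Exon object per distinct (start, end, strand)."""

    def __init__(self):
        self._by_location = {}

    def __len__(self):
        return len(self._by_location)

    def get_or_create(self, start, end, strand=1):
        key = (start, end, strand)
        exon = self._by_location.get(key)
        if exon is None:
            # First time this location is seen: create and remember it.
            exon = Exon(start, end, strand)
            self._by_location[key] = exon
        return exon


def build_transcript(name, ranges, pool):
    """Build a Transcript from (start, end) pairs, reusing pooled exons."""
    transcript = Transcript(name)
    for start, end in ranges:
        transcript.exons.append(pool.get_or_create(start, end))
    return transcript


# Two made-up transcripts that share their first exon.
pool = ExonPool()
t1 = build_transcript("t1", [(1, 100), (200, 300)], pool)
t2 = build_transcript("t2", [(1, 100), (400, 500)], pool)
print(t1.exons[0] is t2.exons[0], len(pool))
```

One caveat with this approach: the key here ignores fuzzy ends, so an exon
whose start is fuzzy in one 'join' and exact in another would be treated as
the same object unless the fuzzy flags are added to the key.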

Does that sound like the "Right Thing" to do?  Is there a good reason to create a
new Exon object for each element of a 'join' even if it duplicates an element of
another 'join'?

Let me know your opinions before I get too far.  So far I am successfully creating
transcripts and their subfeatures, but am having trouble relating them back to the
gene that I am building...  but it's a start!  :-)

Cheers all!

M

--
--------------------------------
"Speed is subsittute fo accurancy."
________________________________

Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada