[Bioperl-l] Request for direction... was: genbank alternate-splicing representation

francis@cmmt.ubc.ca
Tue, 9 Oct 2001 15:48:05 -0700 (PDT)

> Anyway, I would like some direction from the group as to how you would
> like the parser to handle cases where an exon is represented multiple
> times in different transcripts.  My instinct is to continually poll
> existing Transcript objects to test for the presence of an Exon with
> the same start/stop locations, and use that Exon in the Transcript
> being built, rather than create a new Exon with identical parameters.

A few principles:

1) There should be a separate mRNA feature for every distinct
alternative mRNA you want to represent. If any of these give rise to an
alternative CDS, then there should be a separate CDS feature for it as
well. Ideally, each CDS and its mRNA share a matching name (rather than
merely belonging to the same gene).

2) Only work with mRNA and CDS features. When I worked at the NCBI on
this problem, exon features were not validated per se (at least not by
software); only the joins in the mRNA and CDS features were, and these
should match apart from the longer first and last exons and/or the
inclusion of non-coding 5' and/or 3' exons. mRNAs and CDSs are real
-- introns and exons are just biological tags we assign to parts of the
genome to help us understand things -- they don't actually exist in the
cell as separate things. I say ignore the exon/intron features, and
only work with the mRNA/CDS features.
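Principle 2 reduces to a mechanical check on the two joins. Below is a minimal sketch in Python (not BioPerl's API), assuming each feature has already been reduced to its join segments as (start, end) pairs in ascending genomic order, and that a shared label links a CDS to its mRNA as principle 1 suggests. The `Feature` class, its `name` field, and the function names are hypothetical; which GenBank qualifier would actually carry the shared name is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (start, end), 1-based inclusive, listed in ascending genomic order as in
# a GenBank join(); this makes the check below strand-agnostic.
Segment = Tuple[int, int]


@dataclass
class Feature:
    kind: str                # "mRNA" or "CDS"
    name: str                # hypothetical shared label tying a CDS to its mRNA
    segments: List[Segment]  # the parts of the feature's join()


def pair_by_name(features: List[Feature]) -> Dict[str, Dict[str, Feature]]:
    """Group features by name, e.g. {"isoform-a": {"mRNA": ..., "CDS": ...}}."""
    pairs: Dict[str, Dict[str, Feature]] = {}
    for f in features:
        pairs.setdefault(f.name, {})[f.kind] = f
    return pairs


def cds_fits_mrna(mrna: List[Segment], cds: List[Segment]) -> bool:
    """Return True if the CDS join agrees with the mRNA join.

    Interior CDS segments must equal mRNA exons exactly. The first CDS
    segment may start inside its exon and the last may end inside its exon
    (UTR sequence), and whole mRNA exons may lie before or after the CDS
    (non-coding 5'/3' exons).
    """
    if not cds:
        return False
    # Locate the mRNA exon that contains the start of the CDS.
    for i, (m_start, m_end) in enumerate(mrna):
        if m_start <= cds[0][0] <= m_end:
            break
    else:
        return False
    # The CDS must not need more exons than remain in the mRNA.
    if i + len(cds) > len(mrna):
        return False
    for k, (c_start, c_end) in enumerate(cds):
        m_start, m_end = mrna[i + k]
        first, last = k == 0, k == len(cds) - 1
        # No segment may start before its exon; only the first may start after it.
        if c_start < m_start or (not first and c_start != m_start):
            return False
        # No segment may end past its exon; only the last may end before it.
        if c_end > m_end or (not last and c_end != m_end):
            return False
    return True


if __name__ == "__main__":
    features = [
        Feature("mRNA", "isoform-a", [(100, 200), (300, 400), (500, 600)]),
        Feature("CDS", "isoform-a", [(150, 200), (300, 400), (500, 550)]),
        Feature("mRNA", "isoform-b", [(100, 200), (500, 600)]),
        Feature("CDS", "isoform-b", [(150, 200), (400, 550)]),
    ]
    # Prints "isoform-a True" then "isoform-b False" (the second CDS
    # segment of isoform-b starts before its exon at 500..600).
    for name, pair in pair_by_name(features).items():
        print(name, cds_fits_mrna(pair["mRNA"].segments, pair["CDS"].segments))
```

Because the check only looks at the joins, exon and intron features never enter into it, which is the point of working from the mRNA/CDS features alone.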

> Does that sound like the "Right Thing" to do?  Is there a good reason
> to create a new Exon object for each element of a 'join' even if they
> are redundant to other 'join's?

I don't think so ...
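Reusing an existing exon does not require polling every Transcript built so far; a dictionary keyed on the coordinates gives the same answer with one lookup per join segment. A minimal sketch in Python, assuming an exon is identified by (start, end, strand); the `Exon`, `Transcript`, `ExonRegistry` and `build_transcript` names are hypothetical and are not BioPerl's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (start, end, strand): assumed identity of an exon


@dataclass(eq=False)  # eq=False: == falls back to object identity
class Exon:
    start: int
    end: int
    strand: int


@dataclass
class Transcript:
    name: str
    exons: List[Exon] = field(default_factory=list)


class ExonRegistry:
    """Hands out one shared Exon object per distinct (start, end, strand)."""

    def __init__(self) -> None:
        self._exons: Dict[Key, Exon] = {}

    def get(self, start: int, end: int, strand: int) -> Exon:
        key = (start, end, strand)
        if key not in self._exons:
            # First time these coordinates are seen: create the exon once.
            self._exons[key] = Exon(start, end, strand)
        return self._exons[key]

    def __len__(self) -> int:
        return len(self._exons)


def build_transcript(name: str, join: List[Tuple[int, int]], strand: int,
                     registry: ExonRegistry) -> Transcript:
    """Build a Transcript from an mRNA join, reusing exons already registered."""
    return Transcript(name, [registry.get(s, e, strand) for s, e in join])
```

With one registry per gene, two transcripts built from joins that both contain 100..200 hold the very same Exon object, so the gene ends up with one exon per distinct set of coordinates rather than one per join segment.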

> Let me know your opinions before I get too far.  So far I am
> successfully creating transcripts and their subfeatures, but am having
> trouble relating them back to the gene that I am building...  but it's
> a start!  :-)

best of luck to you!


> M
> --
> --------------------------------
> "Speed is subsittute fo accurancy."
> ________________________________
> Dr. Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada

| B.F. Francis Ouellette                       francis@cmmt.ubc.ca | 
| Director, Bioinformatics Core Facility       Tel: (604) 875-3815 | 
| Centre for Molecular Medicine & Therapeutics Fax: (425) 740-6978 | 
| Vancouver, BC Canada            http://www.cmmt.ubc.ca/ouellette |