[Bioperl-l] Request for direction...

Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Wed, 10 Oct 2001 09:26:17 -0600

ha ha!   dissenting opinions!  :-)

I am merging Francis' and Malcom's comments below and adding my own:

> If any of these result in any
> alternative CDS, then there should be alternative CDS as well.
> Ideally there is a matching name between these CDSs and these mRNAs

Well, there isn't really a bioperl object type corresponding to a CDS, so
those lines just get slurped up as SeqFeature::Generic with split locations
if the CDS is the result of a join().
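
To make "split locations" concrete, here is a minimal Python sketch (not
BioPerl code) of how a join() location breaks down into one sub-location
per segment. The function name parse_location is hypothetical, and the
sketch assumes only plain "start..end" segments: no complement(), no fuzzy
ends, no remote accessions.

```python
import re


def parse_location(loc):
    """Split a simple location string into a list of (start, end) tuples.

    Hypothetical helper for illustration only. Handles 'a..b' and
    'join(a..b,c..d,...)'; anything more elaborate is out of scope.
    """
    text = loc.strip()
    match = re.fullmatch(r"join\((.*)\)", text)
    if match:
        # Strip the join() wrapper and keep the comma-separated segments.
        text = match.group(1)
    segments = []
    for part in text.split(","):
        start, end = part.strip().split("..")
        segments.append((int(start), int(end)))
    return segments
```

A CDS given as join(10..50,80..120) yields two segments, while a plain
5..9 yields one, which is the difference between a split and a simple
location.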

> 2) Only work with mRNA and CDS features. When I worked at the NCBI on
> this problem, exon features were not validated per se (not with
> software anyway), but only the join in the mRNA and CDS features (these
> should match apart from the longer first and last exons, and/or the
> inclusion of non-coding 5' and/or 3' exons).

There appear to be a good many genes which don't have an mRNA
representation at all... just a list of exons and introns (e.g. ATF14F8),
or where the mRNA is represented in the 'gene' tag as a join() rather than
the usual method of defining the gene boundaries with the gene tag and the
mRNA as a join() of exons.  These would be "lost" under this scenario...
Nothin' like standards to make life easy!  Go GenBank!

> mRNA and CDSs are real
> -- introns and exons are just biological tags we assign to parts of the
> genome to help us understand things -- they actually don't exist in the
> cell as separate things.

are you belittling exons?   ;-)

> > Does that sound like the "Right Thing" to do?  Is there a good reason
> > to create a new Exon object for each element of a 'join' even if they
> > are redundant to other 'join's?

> Francis:  I don't think so ...

> Malcom:  It strikes me that you must create distinct
> Bio::SeqFeature::Gene::Exon objects because, e.g., the 'same' location
> in different transcripts may need to be considered as exons having a
> different /number or /label (or, heavens, even /gene).

Okay, we have one vote for and one vote against.  My personal preference is
to re-use exons, because to not re-use them would make a mess of my
SeqCanvas module and cause me no end of heartache :-)   On the other hand,
creating new exons for every feature makes the job of writing the parser
much easier...
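
To make the two options concrete, here is a rough Python sketch (not
BioPerl code). Every name in it (Exon, ExonRegistry, build_transcripts,
the transcript ids) is hypothetical, and it assumes an exon is identified
purely by its start and end coordinates.

```python
from dataclasses import dataclass, field


@dataclass
class Exon:
    start: int
    end: int
    # Ids of the transcripts that use this exon (illustrative only).
    transcripts: list = field(default_factory=list)


class ExonRegistry:
    """Hands back one shared Exon per (start, end) location."""

    def __init__(self):
        self._by_location = {}

    def get_exon(self, start, end, transcript_id):
        key = (start, end)
        exon = self._by_location.get(key)
        if exon is None:
            exon = Exon(start, end)
            self._by_location[key] = exon
        exon.transcripts.append(transcript_id)
        return exon


def build_transcripts(joins, reuse=True):
    """Map each transcript id to a list of Exon objects.

    joins: dict of transcript id -> list of (start, end) segments.
    reuse=True shares exons between transcripts; reuse=False creates a
    new Exon for every segment of every join.
    """
    registry = ExonRegistry()
    transcripts = {}
    for tid, segments in joins.items():
        if reuse:
            exons = [registry.get_exon(s, e, tid) for s, e in segments]
        else:
            exons = [Exon(s, e, [tid]) for s, e in segments]
        transcripts[tid] = exons
    return transcripts
```

With reuse, two transcripts that both contain 10..50 end up pointing at
the same object, which is what a display layer wants. Without reuse, each
transcript owns its exons, which leaves room for the per-transcript
/number or /label that Malcom mentions; a shared exon would have to keep
those in a per-transcript lookup instead.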

Does anyone else want to wade in on this issue?  I think there needs to be
a consensus opinion, and then the final decision needs to be well
documented in the SeqI pod.

>> Is this code committed yet?
good lord no... I want people to spend some time worrying about their
pipelines being broken before I actually go and do it  ;-)


"Speed is subsittute fo accurancy."

Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK