[Bioperl-l] Request for direction...
Mark Wilkinson
mwilkinson@gene.pbi.nrc.ca
Wed, 10 Oct 2001 09:26:17 -0600
ha ha! dissenting opinions! :-)
I am merging Francis' and Malcom's comments below and adding my own:
> If any of these result in any
> alternative CDS, then there should be alternative CDS as well.
> Ideally there is a matching name between these CDSs and these mRNAs
Well, there isn't really a bioperl object type corresponding to a CDS, so
those lines just get slurped up as SeqFeature::Generic with split locations
if the CDS is the result of a join().
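
To make "split locations" concrete, here is a minimal sketch of turning a feature-table location string into a list of sub-ranges. It is plain Python rather than BioPerl, and parse_location is a hypothetical helper, not part of any bioperl API. It assumes only the simplest location syntax (a single range, a single position, or a flat join() of these) and ignores complement(), fuzzy ends such as <1..100, and remote accessions.

```python
import re

def parse_location(loc):
    """Parse '5..50' or 'join(1..100,200..300)' into a list of (start, end).

    Hypothetical helper for illustration only: handles flat join() and
    plain ranges, nothing else (no complement, no '<' or '>' markers).
    """
    loc = loc.strip()
    m = re.fullmatch(r"join\((.*)\)", loc)
    body = m.group(1) if m else loc
    spans = []
    for part in body.split(","):
        part = part.strip()
        if ".." in part:
            start, end = part.split("..")
        else:
            # A single base position: treat it as a one-base range.
            start = end = part
        spans.append((int(start), int(end)))
    return spans
```

A CDS written as a join() would come back as several spans, which is the "split location" case; a CDS without a join() comes back as a single span.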
> 2) Only work with mRNA and CDS features. When I worked at the NCBI on
> this problem, exon features were not validated per se (not with
> software anyways), but only the join in the mRNA and CDS feature (these
> should match apart from the longer 1st and last exons, and/or the
> inclusion of non-coding 5' and/or 3' exons).
There appear to be a good many genes which don't have an mRNA
representation at all... just a list of exons and introns (e.g. ATF14F8)
or the mRNA is represented in the 'gene' tag as a join() rather than the
usual method of defining the gene boundaries with the gene tag and the mRNA
as a join() of exons. These would be "lost" under this scenario... Nothin'
like standards to make life easy! Go Genbank!
> mRNA and CDSs are real
> -- introns and exons are just biological tags we assign to parts of the
> genome to help us understand things -- they actually don't exist in the
> cell as separate things.
are you belittling exons? ;-)
> > Does that sound like the "Right Thing" to do? Is there a good reason
> > to create a new Exon object for each element of a 'join' even if they
> > are redundant to other 'join's?
> Francis: I don't think so ...
> Malcom: It strikes me that you must create distinct
> Bio::SeqFeature::Gene::Exon objects because, e.g., the 'same' location
> in different transcripts may need to be considered as exons having a
> different /number or /label (or, heavens, even /gene).
Okay, we have one vote for and one vote against. My personal preference is
to re-use exons, because to not re-use them would make a mess of my
SeqCanvas module and cause me no end of heartache :-) On the other hand,
creating new exons for every feature makes the job of writing the parser
much easier...
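
For the sake of discussion, here is a rough sketch of what "re-using exons" could look like. It is plain Python, not bioperl code, and Exon, ExonPool and build_transcripts are hypothetical names invented for the illustration. The assumption is that two exons are "the same" when their start and end match, and that the per-transcript /number Malcom mentions is kept in a lookup on the shared exon rather than forcing a separate object per transcript.

```python
class Exon:
    """Hypothetical exon record, shared between transcripts."""
    def __init__(self, start, end):
        self.start = start
        self.end = end
        # transcript name -> exon number within that transcript
        self.numbers = {}

class ExonPool:
    """Hands out one Exon per distinct (start, end) pair."""
    def __init__(self):
        self._by_span = {}

    def get(self, start, end):
        key = (start, end)
        if key not in self._by_span:
            self._by_span[key] = Exon(start, end)
        return self._by_span[key]

def build_transcripts(joins, pool):
    """joins maps a transcript name to its list of (start, end) spans."""
    transcripts = {}
    for name, spans in joins.items():
        exons = []
        for number, (start, end) in enumerate(spans, start=1):
            exon = pool.get(start, end)   # re-used if seen before
            exon.numbers[name] = number   # per-transcript /number
            exons.append(exon)
        transcripts[name] = exons
    return transcripts

joins = {
    "mRNA_a": [(1, 100), (200, 300)],
    "mRNA_b": [(200, 300), (400, 500)],
}
transcripts = build_transcripts(joins, ExonPool())
shared = transcripts["mRNA_a"][1]
```

Here the 200..300 exon is a single object that is number 2 in mRNA_a and number 1 in mRNA_b. The "new exon for every join" alternative is just the same loop calling Exon(start, end) directly, which is why it is the easier parser to write.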
Does anyone else want to wade in on this issue? I think there needs to be
a consensus opinion, and then the final decision needs to be well
documented in the SeqI pod.
>> Is this code committed yet?
>
good lord no... I want people to spend some time worrying about their
pipelines being broken before I actually go and do it ;-)
M
--
--------------------------------
"Speed is subsittute fo accurancy."
________________________________
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada