[Bioperl-l] Re: Getting CDS boundaries from Unflattener
Scott Cain
cain at cshl.org
Fri Dec 19 11:33:18 EST 2003
On Fri, 2003-12-19 at 10:48, Scott Cain wrote:
> On Thu, 2003-12-18 at 16:52, Chris Mungall wrote:
> > On Thu, 18 Dec 2003, Scott Cain wrote:
>
> > > The biggest problem with this set of data is that the CDS spans
> > > introns. The CDS really ought to be broken up into segments to match
> > > the exon boundaries. As it is, it breaks display in gbrowse whether it
> > > is using chado or a GFF database as a backend.
> >
> > When I use the unflattener on AE003644, the CDSs I get out have split
> > locations which match the coding exon boundaries - are you sure this isn't
> > a problem with the GFF code? Are you doing all the usual weird stuff like:
> >
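> > my @locs = ($sf->location);   # default: a single, unsplit location
> > if ($sf->location->isa("Bio::Location::SplitLocationI")) {
> >     @locs = $sf->location->each_Location;   # one sub-location per segment
> > }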
>
> Oops--read that documentation, Scott. OK, I fixed Bio::Tools::GFF to
> deal with split locations.
> >
> > > The other problem is that the exons' parentage is incorrect. The exons
> > > should be features of the gene, not the mRNA.
> >
> > I think you have this the wrong way round. Again, this must be a problem
> > with how you're assigning parent tags in the GFF output, when I try
> > AE003644 the exons are children of the mRNA, which is correct.
> >
> I don't think so; here are the relevant lines from SO:
>
> @is_a at gene ; SO:0000704 ; SOFA:SOFA ; SOFA:region
> @part_of at transcript ; SO:0000673 ; SOFA:SOFA ; SOFA:region
> @part_of at exon ; SO:0000147 ; SOFA:SOFA ; SOFA:region
> @is_a at processed_transcript ; SO:0000233 ; SOFA:SOFA ; SOFA:region
> @is_a at mRNA ; SO:0000234 ; SOFA:SOFA ; SOFA:region ; synonym:messenger_RNA
> @part_of at CDS ; SO:0000316 ; SOFA:SOFA ; SOFA:region ; synonym:coding_sequence
>
> Now, I am not one to be lecturing on ontologies, so I may have
> misinterpreted something here, but it looks to me like exon is part of a
> transcript, but not part of an mRNA. And since we typically don't have
> transcript features in Genbank records, exon should be part_of gene. An
> alternative would be to infer a transcript feature for each mRNA feature
> and tie the exons to the transcript features, but leaving the mRNAs and
> CDSs as is.
>
OK, the real problem is that the thing labeled an mRNA in the
feature from Unflattener (which it gets from the GenBank record)
is a transcript, not an mRNA/processed transcript. That is not to say
the GenBank record is wrong; it's not. Generally, the mRNA feature is a
collection of ranges in a join. What Unflattener gives for an mRNA
feature is really a primary transcript.
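
As a rough illustration of the distinction, here is a minimal sketch in
core Perl (no BioPerl calls). It assumes the primary transcript can be
taken as the single range enclosing every segment of the join, while the
segments themselves are written out one row per range, which is the kind
of per-segment output a split CDS needs. The helper names
(segment_rows, enclosing_span), the 'example' source column, and all
coordinates are made up for illustration; phase is left as '.' rather
than computed.

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated, GFF-style row per segment
# of a split location. $segments is an array ref of [start, end] pairs.
sub segment_rows {
    my ($seq_id, $type, $strand, $segments) = @_;
    my @rows;
    for my $seg (sort { $a->[0] <=> $b->[0] } @$segments) {
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes
        push @rows, join("\t", $seq_id, 'example', $type,
                         $seg->[0], $seg->[1], '.', $strand, '.', '.');
    }
    return @rows;
}

# Hypothetical helper: the single range that encloses every segment,
# used here as a stand-in for the extent of the primary transcript.
sub enclosing_span {
    my ($segments) = @_;
    my @starts = sort { $a <=> $b } map { $_->[0] } @$segments;
    my @ends   = sort { $a <=> $b } map { $_->[1] } @$segments;
    return ($starts[0], $ends[-1]);
}

# Made-up join coordinates, not taken from AE003644.
my @mrna_join = ([100, 250], [400, 520], [700, 900]);

print "$_\n" for segment_rows('example_seq', 'exon', '+', \@mrna_join);

my ($start, $end) = enclosing_span(\@mrna_join);
print "primary transcript span: $start..$end\n";
```

With real data the segments would come from each_Location on the
feature's location rather than from a hard-coded list.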
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory