[Bioperl-l] Re: Getting CDS boundaries from Unflattener
Scott Cain
cain at cshl.org
Fri Dec 19 10:48:17 EST 2003
On Thu, 2003-12-18 at 16:52, Chris Mungall wrote:
> On Thu, 18 Dec 2003, Scott Cain wrote:
> > The biggest problem with this set of data is that the CDS spans
> > introns. The CDS really ought to be broken up into segments to match
> > the exon boundaries. As it is, it breaks display in gbrowse whether it
> > is using chado or a GFF database as a backend.
>
> When I use the unflattener on AE003644, the CDSs I get out have split
> locations which match the coding exon boundaries - are you sure this isn't
> a problem with the GFF code? Are you doing all the usual weird stuff like:
>
> if ($sf->location->isa("Bio::Location::SplitLocationI")) {
> @locs = $sf->location->each_Location;
> }
Oops--read that documentation, Scott. OK, I fixed Bio::Tools::GFF to
deal with split locations.
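
For the archives, a minimal sketch of the idea: write one GFF line per sub-location instead of one line spanning the introns. This is not the actual Bio::Tools::GFF change. The helper name, the coordinates, and the group string below are made up for illustration, and the segments are assumed to have been collected already from each_Location as [start, end] pairs.

```perl
use strict;
use warnings;

# Hypothetical helper (not the real Bio::Tools::GFF code): build one
# tab-delimited GFF line per segment of a split CDS location.
# $segments is an array ref of [start, end] pairs, e.g. gathered with
#   map { [ $_->start, $_->end ] } $sf->location->each_Location
sub split_cds_to_gff_lines {
    my ($seqid, $source, $strand, $group, $segments) = @_;
    my @lines;
    # Emit segments in ascending start order, all sharing one group column.
    for my $seg (sort { $a->[0] <=> $b->[0] } @$segments) {
        my ($start, $end) = @$seg;
        push @lines, join("\t", $seqid, $source, 'CDS',
                          $start, $end, '.', $strand, '.', $group);
    }
    return @lines;
}

# Illustrative coordinates only; they are not taken from AE003644.
my @lines = split_cds_to_gff_lines(
    'AE003644', 'GenBank', '+', 'CDS example_cds',
    [ [500, 650], [100, 250] ],
);
print "$_\n" for @lines;
```

Each segment becomes its own CDS line with the same group value, so a browser can draw the pieces as one feature broken at the exon boundaries.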
>
> > The other problem is that the exons' parentage is incorrect. The exons
> > should be features of the gene, not the mRNA.
>
> I think you have this the wrong way round. Again, this must be a problem
> with how you're assigning parent tags in the GFF output, when I try
> AE003644 the exons are children of the mRNA, which is correct.
>
I don't think so; here are the relevant lines from SO:
@is_a@gene ; SO:0000704 ; SOFA:SOFA ; SOFA:region
@part_of@transcript ; SO:0000673 ; SOFA:SOFA ; SOFA:region
@part_of@exon ; SO:0000147 ; SOFA:SOFA ; SOFA:region
@is_a@processed_transcript ; SO:0000233 ; SOFA:SOFA ; SOFA:region
@is_a@mRNA ; SO:0000234 ; SOFA:SOFA ; SOFA:region ; synonym:messenger_RNA
@part_of@CDS ; SO:0000316 ; SOFA:SOFA ; SOFA:region ; synonym:coding_sequence
Now, I am not one to be lecturing on ontologies, so I may have
misinterpreted something here, but it looks to me like exon is part of a
transcript, but not part of an mRNA. And since we typically don't have
transcript features in Genbank records, exon should be part_of gene. An
alternative would be to infer a transcript feature for each mRNA feature
and tie the exons to the transcript features, while leaving the mRNAs and
CDSs as they are.
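
A rough sketch of that alternative, using plain hashes rather than real Bio::SeqFeatureI objects. The record layout (id, type, parent), the "transcript:" id prefix, and the choice to hang the inferred transcript off the mRNA's gene are all assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical flat feature records: { id, type, parent }.
# For each mRNA, infer a transcript feature under the same gene, then
# re-parent that mRNA's exons onto the inferred transcript.
# mRNAs and CDSs are passed through unchanged.
sub infer_transcripts {
    my (@features) = @_;
    my (@out, %transcript_for);

    # First pass: create one inferred transcript per mRNA.
    for my $f (@features) {
        next unless $f->{type} eq 'mRNA';
        my $tid = "transcript:$f->{id}";
        $transcript_for{ $f->{id} } = $tid;
        push @out, { id => $tid, type => 'transcript', parent => $f->{parent} };
    }

    # Second pass: copy every feature, moving exons from mRNA to transcript.
    for my $f (@features) {
        my %copy = %$f;
        if (   $copy{type} eq 'exon'
            && defined $copy{parent}
            && exists $transcript_for{ $copy{parent} })
        {
            $copy{parent} = $transcript_for{ $copy{parent} };
        }
        push @out, \%copy;
    }
    return @out;
}

# Made-up example records.
my @features = (
    { id => 'g1', type => 'gene', parent => undef },
    { id => 'm1', type => 'mRNA', parent => 'g1' },
    { id => 'e1', type => 'exon', parent => 'm1' },
    { id => 'c1', type => 'CDS',  parent => 'm1' },
);
my @out = infer_transcripts(@features);
printf "%s\t%s\t%s\n", $_->{type}, $_->{id}, $_->{parent} // '-' for @out;
```

The input records are copied rather than modified, so the original mRNA-to-exon links stay available if the other parentage turns out to be the right one.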
Thanks,
Scott
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory