[Bioperl-l] Getting CDS boundaries from Unflattener
Scott Cain
cain at cshl.org
Thu Dec 18 09:45:47 EST 2003
Hi Chris,
I very much want to reimplement Bio::DB::GFF::Adaptor::biofetch using
Unflattener, but I am running into a few problems. Below is a
section of GFF that I generated with Unflattener from AE003644:
AE003644 EMBL/GenBank/SwissProt gene 20111 23268 . + . ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001
AE003644 EMBL/GenBank/SwissProt mRNA 20111 23268 . + . ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA
AE003644 EMBL/GenBank/SwissProt CDS 20495 22410 . + . Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGGGGV...
AE003644 EMBL/GenBank/SwissProt exon 20111 20584 . + . Parent=noc_mRNA_1
AE003644 EMBL/GenBank/SwissProt exon 20887 23268 . + . Parent=noc_mRNA_1
The biggest problem with this set of data is that the CDS spans
introns: the single CDS feature (20495-22410) runs straight across the
intron between the two exons (20585-20886). The CDS really ought to be
broken up into segments that match the exon boundaries. As it stands,
this breaks the display in gbrowse whether the backend is chado or a
GFF database.
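To make the expected result concrete, here is a minimal sketch in Python (not BioPerl, and not how Unflattener itself works) of clipping the CDS span to each exon. The function name split_cds_by_exons is hypothetical; the coordinates are taken from the GFF above and treated as 1-based and inclusive. The phase column is left as "." as in the original lines, although a real implementation would probably want to compute it per segment.

```python
def split_cds_by_exons(cds_start, cds_end, exons):
    """Clip one CDS span to each exon and return the per-exon CDS segments.

    Coordinates are 1-based and inclusive, as in GFF.
    `exons` is a list of (start, end) tuples.
    """
    segments = []
    for exon_start, exon_end in sorted(exons):
        start = max(cds_start, exon_start)
        end = min(cds_end, exon_end)
        if start <= end:  # keep only exons that overlap the CDS
            segments.append((start, end))
    return segments


# Exon and CDS coordinates from the noc example above.
exons = [(20111, 20584), (20887, 23268)]
segments = split_cds_by_exons(20495, 22410, exons)

for start, end in segments:
    # Phase (column 8) is deliberately left as "." in this sketch.
    print("\t".join([
        "AE003644", "EMBL/GenBank/SwissProt", "CDS",
        str(start), str(end), ".", "+", ".", "Parent=noc_mRNA_1",
    ]))
```

For the noc record this gives two CDS segments, 20495-20584 and 20887-22410, one per exon, which is the shape of output I would like to get.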
The other problem is that the exons' parentage is incorrect. The exons
should be features of the gene, not the mRNA.
Thanks,
Scott
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory