[Bioperl-l] Unflattener and GFF3 questions
Scott Cain
cain at cshl.org
Mon Dec 15 13:50:28 EST 2003
Chris,
I have some more Unflattener questions. When I process the GenBank
record for AE003644, I produce the following GFF3:
AE003644 EMBL/GenBank/SwissProt gene 20111 23268 . + . ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001
AE003644 EMBL/GenBank/SwissProt mRNA 20111 23268 . + . ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA
AE003644 EMBL/GenBank/SwissProt CDS 20495 22410 . + . Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGG...
AE003644 EMBL/GenBank/SwissProt exon 20111 20584 . + . Parent=noc_mRNA_1
AE003644 EMBL/GenBank/SwissProt exon 20887 23268 . + . Parent=noc_mRNA_1
The first question relates directly to Unflattener: the bounds on the
CDS feature don't look right. The single CDS line spans 20495..22410,
which includes the intron between the two exons, whereas the GenBank
file gives the CDS correctly with a 'join':
CDS join(20495..20584,20887..22410)
I am guessing this is a problem with the way the CDS feature is created,
correct?
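
To make the expected result concrete, here is a minimal Python sketch
(not BioPerl code, and not how Unflattener works internally) that splits
a simple forward-strand 'join' location into segments and writes one CDS
line per segment, all with the same Parent. The function names
parse_location and cds_lines are hypothetical, and the phase column is
computed on the assumption that codon_start=1 means phase 0.

```python
import re


def parse_location(loc):
    """Return (start, end) pairs from a simple GenBank location string.

    Assumption: only forward-strand 'a..b' or 'join(a..b,c..d,...)' forms
    are handled; complement(), fuzzy bounds, and remote refs are not.
    """
    return [(int(s), int(e)) for s, e in re.findall(r"(\d+)\.\.(\d+)", loc)]


def cds_lines(seqid, source, loc, parent, codon_start=1):
    """Build one tab-separated GFF3 CDS line per location segment."""
    lines = []
    phase = codon_start - 1  # bases to skip before the first full codon
    for start, end in parse_location(loc):
        cols = [seqid, source, "CDS", str(start), str(end),
                ".", "+", str(phase), "Parent=" + parent]
        lines.append("\t".join(cols))
        # Bases left over after whole codons set the next segment's phase.
        phase = (3 - (end - start + 1 - phase) % 3) % 3
    return lines


if __name__ == "__main__":
    for line in cds_lines("AE003644", "EMBL/GenBank/SwissProt",
                          "join(20495..20584,20887..22410)", "noc_mRNA_1"):
        print(line)
```

With the record above this gives two CDS lines, 20495..20584 and
20887..22410, so the intron is no longer covered by the CDS.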
The second question has less to do with Unflattener and more to do with
GFF3. Do you have any suggestions for encoding relationship types in
GFF3 generated this way? It really matters that exons are 'part_of'
mRNAs while CDSs are 'product_of' them, and the Parent attribute alone
does not record that distinction. I am trying to decide whether this
should be done when the GFF3 is produced or when the GFF3 is loaded
into the database. Any suggestions?
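
As a sketch of the load-time option, the relationship type could be
inferred from the child's feature type when each Parent attribute is
read. The Python below is only an illustration under that assumption:
the lookup table, the function name relationships, and the choice of
'part_of' as the fallback for other types (such as mRNA to gene) are all
hypothetical.

```python
# Hypothetical lookup: child feature type -> relationship to its Parent.
RELATIONSHIP_BY_CHILD_TYPE = {
    "exon": "part_of",
    "CDS": "product_of",
}


def relationships(gff3_lines, default="part_of"):
    """Yield (child_type, start, end, relationship, parent_id) tuples.

    Expects tab-separated GFF3 lines; lines without a Parent attribute
    (for example the gene line) yield nothing.
    """
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.rstrip("\n").split("\t")
        attrs = dict(pair.split("=", 1)
                     for pair in cols[8].split(";") if "=" in pair)
        rel = RELATIONSHIP_BY_CHILD_TYPE.get(cols[2], default)
        # A Parent attribute may list several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                yield (cols[2], int(cols[3]), int(cols[4]), rel, parent)


if __name__ == "__main__":
    sample = [
        "AE003644\t.\texon\t20111\t20584\t.\t+\t.\tParent=noc_mRNA_1",
        "AE003644\t.\tCDS\t20495\t20584\t.\t+\t0\tParent=noc_mRNA_1",
    ]
    for row in relationships(sample):
        print(row)
```

Doing this in the loader keeps the GFF3 itself unchanged; the
alternative is to have the producer write the relationship type
explicitly, which would need an agreed attribute for it.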
Thanks,
Scott
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory