e-just at northwestern.edu
Fri Jan 26 17:08:49 UTC 2007
I am getting some strange results with bp_genbank2gff3.pl. I have a
source genbank file with multiple records. I would like to have all
of my mRNA features parsed into
structures and the tRNA features parsed into
structures in the GFF3 file.
I am calling the script like this:
perl %xampp_root%/perl/bin/bp_genbank2gff3.pl --filter misc_feature
--filter repeat_region --nolump genbank_data/test.small.gb
Everything appears to run OK, no errors, however in my output I have
mysterious missing exon features. Most of the mRNAs get parsed as
mRNA/CDS/exon but some are missing one or more exon features. The
problem seems to get worse the more records there are in the genbank file.
For example, the following portion of the genbank file:
/product="ubiquitin-conjugating enzyme, putative"
gets written as:
AAFB01000019 GenBank gene 5948 6982 . + . ID=4.t00046;locus_tag=4.t00046;Name=4.t00046
AAFB01000019 GenBank mRNA 5948 6982 . + . ID=4.t00046.t01;Parent=4.t00046;db_xref=GI:56474408;locus_tag=4.t00046;codon_start=1;protein_id=EAL51779.1;product=ubiquitin-conjugating
AAFB01000019 GenBank CDS 5948 6982 . + . Parent=4.t00046.t01;locus_tag=4.t00046
No exon feature is written for this mRNA, whereas most of the other mRNA
features have exon features. I notice
the same problem with tRNAs missing exon features.
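
To see exactly which features are affected, one option is to scan the GFF3
output for mRNA and tRNA features that no exon names as its Parent. The
following is only a minimal sketch, not part of BioPerl: the function name
transcripts_missing_exons is hypothetical, and it assumes the output is
tab-separated with nine columns and uses the standard ID and Parent
attributes.

```python
def transcripts_missing_exons(gff3_lines, transcript_types=("mRNA", "tRNA")):
    """Return IDs of transcript features that no exon lists as its Parent.

    Assumes tab-separated, nine-column GFF3 lines (hypothetical helper).
    """
    transcripts = []      # transcript IDs, in file order
    exon_parents = set()  # every ID named in an exon's Parent attribute
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. embedded sequence)
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        feature_type = cols[2]
        if feature_type in transcript_types and "ID" in attrs:
            transcripts.append(attrs["ID"])
        elif feature_type == "exon":
            # Parent may hold several comma-separated IDs.
            exon_parents.update(attrs.get("Parent", "").split(","))
    return [t for t in transcripts if t not in exon_parents]
```

Passing an open file handle for the GFF3 output as gff3_lines returns the
list of transcript IDs that have no exon children.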
When I parse a single GenBank record on its own, it works fine, so it seems
to be a problem with parsing a single file that contains multiple GenBank
records.
Any idea what's going wrong or what I can do to help troubleshoot?
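
Since single records convert correctly, one way to narrow this down (or to
work around it) might be to split the multi-record file into one record per
chunk and convert each separately. This is only a sketch under the
assumption that every record ends with the usual "//" terminator line; the
function name split_genbank_records is hypothetical.

```python
def split_genbank_records(lines):
    """Yield each GenBank record as a list of lines, terminator included.

    Assumes each record ends with a line containing only "//".
    """
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip() == "//":
            yield record
            record = []
    # Keep any trailing non-blank text that lacks a terminator line.
    if any(text.strip() for text in record):
        yield record
```

Each yielded list can be written to its own file and passed to the script
individually, so the per-record output can be compared with the output from
the combined file.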
Attached is my source GenBank file.
Thanks a lot!