[Bioperl-l] bp_genbank2gff3.pl

Eric Just e-just at northwestern.edu
Fri Jan 26 17:08:49 UTC 2007


Hi there,

I am getting some strange results with bp_genbank2gff3.pl.  I have a
source GenBank file with multiple records.  I would like to have all
of my mRNA features parsed into

mRNA
  CDS
  exon

structures and the tRNA features parsed into

tRNA
   exon

structures in the GFF3 file.

I am calling the script like this:

perl %xampp_root%/perl/bin/bp_genbank2gff3.pl --filter misc_feature
--filter repeat_region --nolump genbank_data/test.small.gb

Everything appears to run OK with no errors; however, some exon
features are mysteriously missing from my output.   Most of the mRNAs
get parsed as mRNA/CDS/exon, but some are missing one or more exon
features.   The problem seems to get worse the more records there are
in the GenBank source file.

For example, the following portion of the genbank file:


     gene            <5948..>6982
                     /locus_tag="4.t00046"
                     /Name="4.t00046"
     mRNA            5948..6982
                     /db_xref="GI:56474408"
                     /locus_tag="4.t00046"
                     /codon_start=1
                     /protein_id="EAL51779.1"
                     /product="ubiquitin-conjugating enzyme, putative"
     CDS             5948..6982
                     /locus_tag="4.t00046"

gets written as:

AAFB01000019	GenBank	gene	5948	6982	.	+	.	ID=4.t00046;locus_tag=4.t00046;Name=4.t00046
AAFB01000019	GenBank	mRNA	5948	6982	.	+	.	ID=4.t00046.t01;Parent=4.t00046;db_xref=GI:56474408;locus_tag=4.t00046;codon_start=1;protein_id=EAL51779.1;product=ubiquitin-conjugating enzyme%2C putative
AAFB01000019	GenBank	CDS	5948	6982	.	+	.	Parent=4.t00046.t01;locus_tag=4.t00046

Note that no exon line is written for this mRNA, whereas most of the
other mRNA features do get exon features.  I notice the same problem
with tRNAs missing exon features.
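
To see how many features are affected, the output can be scanned for
mRNA/tRNA lines that have no exon child.  The following is only a
minimal sketch in Python, not part of BioPerl: the function names are
made up for illustration, and it assumes the output is tab-delimited
GFF3 whose ninth column links features through ID= and Parent=
attributes.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def transcripts_without_exons(gff_lines, transcript_types=("mRNA", "tRNA")):
    """Return IDs of mRNA/tRNA features with no exon child, in file order."""
    transcripts = []                 # transcript IDs as encountered
    child_types = defaultdict(set)   # parent ID -> feature types of its children
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                    # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue                 # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue                 # not a feature line
        attrs = parse_attributes(cols[8])
        if cols[2] in transcript_types and "ID" in attrs:
            transcripts.append(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                child_types[parent].add(cols[2])
    return [t for t in transcripts if "exon" not in child_types[t]]


# Example use (the file name is a placeholder):
# with open("test.small.gff") as handle:
#     for transcript_id in transcripts_without_exons(handle):
#         print(transcript_id)
```

Comparing the IDs reported for the multi-record run against a
single-record run should show which records lose their exons.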

When I parse a single GenBank record on its own, it works fine, so it
seems to be a problem with parsing a single file that contains
multiple GenBank records.  Any idea what's going wrong, or what I can
do to help troubleshoot?  Attached is my source GenBank file.
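
Since single records convert correctly, one way to narrow the problem
down is to split the multi-record file into one record per chunk and
convert each separately.  This is a minimal sketch in Python under
that assumption; it relies only on the GenBank flat-file convention
that every record ends with a line containing just "//".  The function
name is hypothetical, and this is a diagnostic aid rather than a fix
for the script itself.

```python
def split_genbank_records(lines):
    """Yield each GenBank record as one string, split on the '//' terminator."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip() == "//":    # end-of-record marker
            yield "".join(record)
            record = []
    # Keep a trailing record that lacks its terminator, but drop pure whitespace.
    if any(part.strip() for part in record):
        yield "".join(record)


# Example use (input path as in the command above; output names are placeholders):
# with open("genbank_data/test.small.gb") as handle:
#     for number, text in enumerate(split_genbank_records(handle), start=1):
#         with open(f"record_{number}.gb", "w") as out:
#             out.write(text)
```

Each resulting file can then be run through the converter with the
same options, and the per-record output compared with the output of
the multi-record run.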

Thanks a lot!
Eric
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.small.gb.gz
Type: application/x-gzip
Size: 175413 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070126/c545fa4e/attachment-0004.gz>

