[Bioperl-l] GenBank files CONTIG line
lairdm at sfu.ca
Tue Sep 16 21:16:59 UTC 2014
I wanted to report what I think is an issue, though I'm not positive yet. I
found this old mailing list posting from May
about the changes to NCBI's GenBank files, and I just grabbed the latest
bioperl-live with August's patch to hopefully solve it. That part
worked great: instead of spewing a few GB of warnings and the whole
sequence multiple times, it read the GenBank file and wrote out an EMBL
file perfectly fine.
However, the current bioperl-live created a new issue. I have a mirror
of NCBI's bacterial genomes directory (yes, I know, I need to move to
the new directory structure in the next 6 months) and my pipeline
takes each GenBank file and makes the EMBL, ptt, faa, and fna files as
needed. This usually takes seconds. Whatever changed in bioperl-live
compared to BioPerl 1.6.922 causes the script to spin, doing something
very intensive for tens of minutes while slowly writing out the ptt file.
Simply copying genbank.pm from bioperl-live into my install directory
solved the CONTIG issue and kept the whole conversion process
speedy. So I'm happy for now, but I wanted to mention this in case it
rings a bell with anyone about what could have changed to make
converting a gbk into a ptt so much less efficient now.
Lead Software Developer, Bioinformatics
Simon Fraser University, Burnaby, BC, Canada