[Bioperl-l] Parsing GenBank files w/o Sequences

Peter Chines pchines at nhgri.nih.gov
Sat Feb 1 11:42:42 EST 2003


Hi,

I'm new to Bioperl (wrote my first script two days ago), so I may be
trying to do something the wrong way.  I want to read GenBank files that
have all of the sequence features annotated, but don't actually have any
sequence, e.g. files like
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/CHR_22/hs_chr22.gbs.gz

Thanks to the wiki pages, I quickly hit on Bio::SeqIO and the sample
code to read GenBank records.  Unfortunately, when there are multiple
entries in a GenBank file, but no sequence for the records (no ORIGIN
section), the parser skips over every other entry without any kind of
warning message.
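
One way to confirm the skipping is to count the records independently of
the parser and compare that with the number of entries the parser returns.
The sketch below is in Python rather than Perl and does not use Bioperl.
It assumes only that each GenBank record begins with a LOCUS line and ends
with a line containing just "//". The function name and the two sample
records are made up for illustration.

```python
import io


def count_genbank_records(handle):
    """Return (LOCUS line count, '//' terminator count) for a GenBank stream."""
    locus_count = 0
    terminator_count = 0
    for line in handle:
        if line.startswith("LOCUS"):
            locus_count += 1
        elif line.rstrip() == "//":
            terminator_count += 1
    return locus_count, terminator_count


# Two made-up records that have FEATURES but no ORIGIN section.
sample = """\
LOCUS       EXAMPLE_1    1000 bp    DNA     linear   PRI 01-JAN-2003
FEATURES             Location/Qualifiers
     source          1..1000
//
LOCUS       EXAMPLE_2    2000 bp    DNA     linear   PRI 01-JAN-2003
FEATURES             Location/Qualifiers
     source          1..2000
//
"""

locus_count, terminator_count = count_genbank_records(io.StringIO(sample))
print(locus_count, terminator_count)
```

For a compressed file such as the one above, the same function can be fed
a handle from gzip.open(path, "rt"). If the parser yields roughly half as
many entries as this count, that matches the every-other-entry behaviour.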

Am I correct in thinking that Bio::SeqIO::genbank should handle this
type of GenBank record?  If so, I have a patch and some new tests to
contribute--I'll send them to Elia unless someone else wants them.  If
Bio::SeqIO::genbank is not the best way to deal with these, please point
me toward the correct modules to use, and perhaps I'll look for a good
place to document this.

Thanks,
Peter


