[Bioperl-l] SeqIO::genbank

Jason Stajich jason@chg.mc.duke.edu
Thu, 13 Jul 2000 17:43:17 -0400 (EDT)


I'm still not sure what all is broken in the genbank parser, but I haven't
spent an enormous amount of time with it.

Perhaps a Bioperl discussion at BOSC should include an assessment and a
wishlist of where we want the SeqIOs to be in the next iteration of
bioperl.

One difference between 0.6.1 and the live version is in the test data:

t/test.genbank
----------------------------
revision 1.3
date: 2000/05/12 17:27:12;  author: jgrg;  state: Exp;  lines: +4 -0
fix to Bio/SeqIO/genbank.pm read_GenBank_Species
----------------------------
revision 1.2
date: 2000/05/07 17:22:49;  author: lapp;  state: Exp;  lines: +186 -0
Added feature-rich entry and an entry with a "weird" location feature.
----------------------------


As for the featureless GenBank files, the only reason I noticed this is
that a contig assembly program we use produces GenBank files without a
FEATURES section, like the following:

LOCUS       Contig[0071]       712 bp
DEFINITION  Contig[0071], 712 bases, 3069 checksum.
ORIGIN      
       1  AAACAATTTC ACACAAGAAA CAGCTATGAN CATGATTACG AATTCGAGCT
      51  CGGTACCCAG CTTTAACAAC CACGTGCGCA CGCTTGTGGC GCGCGAGGAG
...

So I agree that it doesn't follow the GenBank standard, but I'm not sure
what else to do when another program produces it and I'd like to be able
to read it.  I'm leaving the fix in the code, since it doesn't change the
parser very much, but feel free to correct me if you think it is a
mistake to handle it this way.
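
To make the idea concrete, here is a minimal sketch in Python of a reader
that tolerates a record with no FEATURES section. It is not the actual
Bio/SeqIO/genbank.pm code; the function name read_minimal_genbank and the
returned dictionary layout are assumptions made up for illustration. It
relies only on the LOCUS, DEFINITION and ORIGIN lines shown in the example
above and skips anything else it does not recognize.

```python
import re

# Sample record copied from the example above (first two sequence lines only).
SAMPLE = """\
LOCUS       Contig[0071]       712 bp
DEFINITION  Contig[0071], 712 bases, 3069 checksum.
ORIGIN      
       1  AAACAATTTC ACACAAGAAA CAGCTATGAN CATGATTACG AATTCGAGCT
      51  CGGTACCCAG CTTTAACAAC CACGTGCGCA CGCTTGTGGC GCGCGAGGAG
"""


def read_minimal_genbank(text):
    """Parse one GenBank-style record; a FEATURES block is optional.

    Hypothetical helper: unknown lines before ORIGIN are ignored, so a
    record that jumps straight from DEFINITION to ORIGIN still parses.
    """
    record = {"locus": None, "length": None, "definition": "", "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker, if the producer wrote one
        if in_origin:
            # Drop position numbers and spaces, keep only residue letters.
            record["sequence"] += re.sub(r"[^A-Za-z]", "", line)
        elif line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                record["locus"] = fields[1]
            # The declared length is the number just before "bp".
            if "bp" in fields:
                candidate = fields[fields.index("bp") - 1]
                if candidate.isdigit():
                    record["length"] = int(candidate)
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record


record = read_minimal_genbank(SAMPLE)
```

The sketch is deliberately lenient: the declared length (712) is kept
separate from the sequence actually read, so a caller can decide whether a
mismatch is an error or just a truncated file.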

-Jason

Jason Stajich
jason@chg.mc.duke.edu
http://galton.mc.duke.edu/~jason/
(919)684-1806 (office) 
(919)684-2275 (fax) 
Center for Human Genetics - Duke University Medical Center
http://wwwchg.mc.duke.edu/