[Bioperl-l] Writing genbank files
David Waring
dwaring@u.washington.edu
Fri, 18 Jan 2002 14:54:05 -0800
I have come across a problem with GenBank files using the Perl module
Bio::DB::GenBank. When I fetch a GenBank sequence from NCBI and write it
back out in GenBank format, the LOCUS line is missing the date. I get
LOCUS AC104722 24949 bp DNA linear BCT
instead of
LOCUS AC104722 24949 bp DNA linear BCT 21-DEC-2001
which is what I get when I download the file myself. I don't know whether
this is a problem in reading the file or in writing it.
Why am I cross-posting this to biojava? Because the biojava parser dies on
such a file with a message saying that the LOCUS line is too short.
Is the date a required element in the Locus line? Is there consensus on what
constitutes correct format? Has it changed recently?
David
I also noticed that the biojava parser is very picky about the number of
spaces; delete a few spaces between DNA and linear and it dies too.
Exception in thread "main" org.biojava.bio.seq.io.ParseException: LOCUS line too short [LOCUS AC104719 17453 bp DNA linear BCT 21-DEC-2001]
        at org.biojava.bio.seq.io.GenbankContext.parseLocusLinePost127(GenbankFormat.java, Compiled Code)
        at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java, Compiled Code)
        at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java, Compiled Code)
        at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java, Compiled Code)
rethrown as org.biojava.bio.BioException: Could not read sequence
        at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java, Compiled Code)
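
To illustrate the two failure modes described above (a missing date and a changed number of spaces), here is a minimal Python sketch of a LOCUS line reader that splits on runs of whitespace rather than fixed columns and treats the date as optional. It is not how biojava or BioPerl parse the line. The function name parse_locus_line and the field order are assumptions taken only from the example lines in this post, and the sketch does not claim to follow the official column layout.

```python
import re

# Date shaped like 21-DEC-2001 (pattern assumed from the example in the post).
DATE_RE = re.compile(r"^\d{2}-[A-Z]{3}-\d{4}$")


def parse_locus_line(line):
    """Split a LOCUS line on whitespace instead of fixed column positions.

    Hypothetical helper: assumes the field order seen in the post
    (name, length, 'bp', molecule, topology, division, optional date).
    """
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("unrecognized LOCUS line: %r" % line)

    record = {
        "name": tokens[1],
        "length": int(tokens[2]),
        "molecule": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        "date": None,  # stays None when the writer omitted the date
    }
    # Accept the date only if an eighth token is present and looks like one.
    if len(tokens) > 7 and DATE_RE.match(tokens[7]):
        record["date"] = tokens[7]
    return record


if __name__ == "__main__":
    # Both variants from the post parse; only the date field differs.
    print(parse_locus_line("LOCUS AC104722 24949 bp DNA linear BCT"))
    print(parse_locus_line("LOCUS   AC104722   24949 bp   DNA  linear BCT 21-DEC-2001"))
```

A reader written this way would tolerate both the date-less line and hand-edited spacing, at the cost of accepting lines that a strict column-based parser would rightly reject.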