[Bioperl-l] RE: [Biojava-l] Writing genbank files

Dickson, Mike mdickson@netgenics.com
Fri, 18 Jan 2002 18:55:35 -0500


The locus line format did change recently. See the NCBI site for details.  I
thought a patch made it into the BioJava code for this.  Which version of
BioJava are you using?
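
For what it's worth, here is a rough sketch of reading the LOCUS line by
splitting on whitespace instead of relying on fixed columns, so that a few
extra or missing spaces, or an absent date, do not make the parse fail. This
is not BioJava's actual parser: LocusLineSketch and its methods are made-up
names for illustration, and it assumes the date, when present, is the last
token in DD-MMM-YYYY form as in the examples quoted below.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of BioJava: tolerant LOCUS line splitting. */
public class LocusLineSketch {

    /** Returns the whitespace-separated fields that follow the LOCUS keyword. */
    public static List<String> fields(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (!tokens[0].equals("LOCUS")) {
            throw new IllegalArgumentException("Not a LOCUS line: " + line);
        }
        return Arrays.asList(tokens).subList(1, tokens.length);
    }

    /** Returns the trailing date token (assumed DD-MMM-YYYY), or null if absent. */
    public static String date(String line) {
        List<String> f = fields(line);
        String last = f.isEmpty() ? "" : f.get(f.size() - 1);
        return last.matches("\\d{2}-[A-Z]{3}-\\d{4}") ? last : null;
    }

    public static void main(String[] args) {
        String withDate =
            "LOCUS       AC104722    24949 bp    DNA             linear       BCT 21-DEC-2001";
        String noDate =
            "LOCUS       AC104722    24949 bp    DNA             linear       BCT";
        System.out.println(fields(withDate)); // all fields after LOCUS
        System.out.println(date(withDate));   // the date token
        System.out.println(date(noDate));     // null: no date on the line
    }
}
```

Treating the date as optional here is only a design choice for the sketch;
whether a LOCUS line without a date is valid is a question for the NCBI
format description.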

Mike

> -----Original Message-----
> From: David Waring [mailto:dwaring@u.washington.edu]
> Sent: Friday, January 18, 2002 5:54 PM
> To: Bioperl; biojava
> Subject: [Biojava-l] Writing genbank files
> 
> 
> I have come across a problem with genbank files using the perl module
> Bio::DB::GenBank. When I get the genbank sequence from NCBI and write the
> sequence out in genbank format, the Locus line is missing the date.
> 
> LOCUS       AC104722    24949 bp    DNA             linear       BCT
> 
> instead of
> 
> LOCUS       AC104722    24949 bp    DNA             linear       BCT 21-DEC-2001
> 
> which is what I get when I download the file myself. I don't know if this
> represents a problem in reading the file or in writing the file.
> 
> Why am I cross-posting this to biojava? Well, the biojava parser dies on
> such a file with a message that says that the Locus line is too short.
> 
> Is the date a required element in the Locus line? Is there consensus on what
> constitutes correct format? Has it changed recently?
> 
> David
> 
> 
> 
> I also noticed that the biojava parser is very picky about the number of
> spaces; delete a few spaces between DNA and linear and it dies too.
> 
> Exception in thread "main" org.biojava.bio.seq.io.ParseException: LOCUS line too short [LOCUS       AC104719    17453 bp    DNA           linear       BCT 21-DEC-2001]
>         at org.biojava.bio.seq.io.GenbankContext.parseLocusLinePost127(GenbankFormat.java, Compiled Code)
>         at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java, Compiled Code)
>         at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java, Compiled Code)
>         at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java, Compiled Code)
> rethrown as org.biojava.bio.BioException: Could not read sequence
>         at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java, Compiled Code)
> 