[Biojava-l] Genbank parser error [biojavax]

Morgane THOMAS-CHOLLIER mthomasc at vub.ac.be
Mon Feb 13 15:36:59 EST 2006


Hello,

I tried biojavax today with a view to using its Genbank file parser.

My test file is a Genbank-formatted file produced by the Ensembl 
export system.

The head of the file is as follows:

LOCUS       6 489671 bp DNA HTG 13-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
            52296503..52786173 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
VERSION     chromosome:NCBIM34:6:52296503:52786173:1

I used the code provided in biojavax docbook to parse this file.
I get the following error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

I had a look at GenbankFormat.java, and I guess the problem comes from 
the regular expression, which does not recognize this LOCUS line as a 
standard Genbank LOCUS line.
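
To illustrate the kind of mismatch I suspect, here is a minimal, 
self-contained sketch. The two patterns below are hypothetical and are 
NOT the expression used in GenbankFormat.java; they only assume that a 
strict parser might require a topology field (linear/circular), which 
the Ensembl LOCUS line does not have. The class name, the method names 
and the "typical" sample line are made up for the example.

```java
import java.util.regex.Pattern;

class LocusLineCheck {
    // Hypothetical strict pattern: name, length, "bp", molecule type,
    // a MANDATORY topology, division and date. Not the biojavax regex.
    static final Pattern STRICT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(linear|circular)\\s+(\\S+)\\s+(\\S+)$");

    // Hypothetical lenient variant: the topology field is optional.
    static final Pattern LENIENT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(?:(linear|circular)\\s+)?(\\S+)\\s+(\\S+)$");

    // The argument is the LOCUS line content with the "LOCUS" tag removed.
    static boolean matchesStrict(String locus) {
        return STRICT.matcher(locus.trim()).matches();
    }

    static boolean matchesLenient(String locus) {
        return LENIENT.matcher(locus.trim()).matches();
    }

    public static void main(String[] args) {
        // LOCUS content exactly as reported in the ParseException above.
        String ensembl = "6 489671 bp DNA HTG 13-FEB-2006";
        // Made-up line that includes a topology field, for comparison.
        String typical = "SCU49845     5028 bp    DNA     linear   PLN 21-JUN-1999";

        System.out.println("strict,  typical: " + matchesStrict(typical));
        System.out.println("strict,  ensembl: " + matchesStrict(ensembl));
        System.out.println("lenient, typical: " + matchesLenient(typical));
        System.out.println("lenient, ensembl: " + matchesLenient(ensembl));
    }
}
```

With these assumed patterns, the strict one rejects the Ensembl line 
and the lenient one accepts both lines. Whether the real expression 
fails for this reason or for another one is something I have not 
verified.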

Am I wrong? Has the biojavax Genbank parser been tested on files 
exported from Ensembl?

Morgane.

-- 
*************************************
Morgane THOMAS-CHOLLIER, PHD Student 

Vrije Universiteit Brussels (VUB)    
Laboratory of Cell Genetics          
Pleinlaan 2                          
1050 Brussels                        
Belgium                              



