[Biojava-l] Genbank parser error [biojavax]

mark.schreiber at novartis.com mark.schreiber at novartis.com
Mon Feb 13 20:11:07 EST 2006


Hi Morgane -

I have to say that doesn't look much like Genbank : )

The biojavax parsers are possibly a bit brittle because they use regexps 
to recognize key elements. It should be fixable. I think the problem is 
that the parser expects a word after LOCUS, not a number. This may not be 
the only problem, though. Could you post the entire file? Or, if it is 
large, a smaller representative file?
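
To illustrate the kind of mismatch I mean, here is a minimal sketch. The 
two patterns below are hypothetical and are not copied from 
GenbankFormat.java; the class name LocusLineSketch and the helper 
locusName are made up for this example. The strict pattern assumes the 
locus name starts with a letter, while the tolerant one accepts any 
non-whitespace token, such as the chromosome number that Ensembl writes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocusLineSketch {
    // Hypothetical strict pattern: the locus name must begin with a letter.
    // This is an assumption for illustration, not the real biojavax regex.
    static final Pattern STRICT =
        Pattern.compile("^LOCUS\\s+([A-Za-z]\\S*)\\s+(\\d+)\\s+bp\\s+(.*)$");

    // Hypothetical tolerant pattern: any non-whitespace token is accepted
    // as the locus name, including a bare number such as "6".
    static final Pattern TOLERANT =
        Pattern.compile("^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\s+(.*)$");

    // Returns the locus name captured by the pattern, or null if the
    // whole line does not match.
    static String locusName(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // LOCUS line taken from the Ensembl export quoted in this thread.
        String ensembl = "LOCUS       6 489671 bp DNA HTG 13-FEB-2006";

        System.out.println("strict:   " + locusName(STRICT, ensembl));
        System.out.println("tolerant: " + locusName(TOLERANT, ensembl));
    }
}
```

With the strict pattern the Ensembl line is rejected, which is the sort 
of condition that would lead to a "Bad locus line" error; the tolerant 
pattern picks up "6" as the name. Whether loosening the real regexp this 
way is safe for other Genbank files would still need checking.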

- Mark





Morgane THOMAS-CHOLLIER <mthomasc at vub.ac.be>
Sent by: biojava-l-bounces at portal.open-bio.org
02/14/2006 04:36 AM

 
        To:     biojava-l at biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Genbank  parser error [biojavax]


Hello,

I tried biojavax today with a view to using the Genbank file parser.

My test file is a Genbank-formatted file produced by the Ensembl 
export system.

The head of the file is as follows:

LOCUS       6 489671 bp DNA HTG 13-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
            52296503..52786173 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
VERSION     chromosome:NCBIM34:6:52296503:52786173:1

I used the code provided in the biojavax docbook to parse this file.
I get the following error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

I had a look at GenbankFormat.java, and I guess the problem comes from 
the regular expression, which does not recognize this LOCUS line as a 
standard Genbank LOCUS line.

Am I wrong? Has the biojavax Genbank parser been tested on files 
exported from Ensembl?

Morgane.

-- 
*************************************
Morgane THOMAS-CHOLLIER, PHD Student 

Vrije Universiteit Brussels (VUB) 
Laboratory of Cell Genetics 
Pleinlaan 2 
1050 Brussels 
Belgium 


_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l