[Biojava-l] GenBank XML File Parse Error

Thomas Down thomas at derkholm.net
Fri Jan 23 12:01:51 EST 2004


Once upon a time, Toralf Kirsten wrote:
> Hi,
> I have to extract data from the GenBank XML files.
> For this purpose I use the biojava API. But I get a parser error.
> 
> java.lang.StringIndexOutOfBoundsException: String index out of range: 12
> at java.lang.String.substring(String.java:1477)
> at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankContext.java:621)
> [snip]
> 
> 
> The program is simple. The user specifies the path and file name with the
> FileChooser component. Then I open the file and apply the Sequence and
> Annotation classes, as shown in the attached method taken from an extended
> file class.
> 
> What I need are the data of the GenBank entry (accession, sequence,
> etc.) and also of its features (start and end positions, subtypes such
> as tRNA, CDS, etc.).

I'm afraid that BioJava doesn't currently support the XML version
of genbank records.  The Genbank parser you are using expects the
normal flatfile version of the genbank records -- do you have
access to these?
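If it is unclear which format a given file is, a quick check before handing it to the flatfile parser can save a confusing stack trace. The class below is a minimal sketch, not part of BioJava; its name and method are hypothetical. It assumes that a single-record flatfile begins with a `LOCUS` line and that an XML export begins with `<` (an `<?xml ...?>` declaration or a root element such as `<GBSet>`). The `String index out of range: 12` in the trace is presumably the same mismatch: flatfile header lines reserve their first 12 columns for the keyword, and a line like `<GBSet>` is shorter than that.

```java
import java.io.BufferedReader;
import java.io.IOException;

/**
 * Hypothetical helper, not part of BioJava: guesses whether a GenBank file
 * is a flatfile or an XML export by looking at its first non-blank line.
 */
public class GenbankFormatSniffer {

    public enum Format { FLATFILE, XML, UNKNOWN }

    public static Format sniff(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.length() == 0) {
                continue; // skip leading blank lines
            }
            if (trimmed.startsWith("<")) {
                return Format.XML; // "<?xml ...?>" declaration or a root element
            }
            if (line.startsWith("LOCUS")) {
                return Format.FLATFILE; // flatfile records open with a LOCUS line
            }
            return Format.UNKNOWN; // e.g. a release-file header block
        }
        return Format.UNKNOWN; // empty input
    }
}
```

Because `sniff` consumes lines, reopen the file (or use `BufferedReader.mark`/`reset`) before passing the reader on to the BioJava parser. Files from the NCBI release divisions, which carry a header block before the first `LOCUS`, would come back as `UNKNOWN` with this simple check.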

We should probably look at adding Genbank XML support to BioJava.
Does anyone know how widely it's used? I must admit I haven't come
across it before.
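Until such support exists, one stopgap is to read the XML with the JDK's own DOM parser. This is a hypothetical sketch, not BioJava code, and it assumes the file follows NCBI's GBSeq layout (`GBSet`/`GBSeq` records with `GBSeq_primary-accession`, `GBSeq_sequence`, and `GBFeature` elements holding `GBFeature_key` and `GBFeature_location`); other GenBank XML flavours would need different element names. It produces one summary line per record with the accession, the sequence length, and each feature's key and location string.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical stopgap, not BioJava: summarises GBSeq-style GenBank XML
 * using only the JDK's DOM parser. Element names assume the GBSeq DTD.
 */
public class GbSeqXmlReader {

    /** Trimmed text of the first descendant element named tag, or null if absent. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** One line per GBSeq record: "accession sequenceLength key:location key:location ...". */
    public static List<String> summarize(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not fetch the external DTD named in <!DOCTYPE ...> (avoids a network call).
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(in);

        List<String> summaries = new ArrayList<>();
        NodeList records = doc.getElementsByTagName("GBSeq");
        for (int i = 0; i < records.getLength(); i++) {
            Element record = (Element) records.item(i);
            String accession = firstText(record, "GBSeq_primary-accession");
            String residues = firstText(record, "GBSeq_sequence");
            StringBuilder line = new StringBuilder();
            line.append(accession).append(' ')
                .append(residues == null ? 0 : residues.length());

            // Each GBFeature carries a key (source, CDS, tRNA, ...) and a location string.
            NodeList features = record.getElementsByTagName("GBFeature");
            for (int j = 0; j < features.getLength(); j++) {
                Element feature = (Element) features.item(j);
                line.append(' ').append(firstText(feature, "GBFeature_key"))
                    .append(':').append(firstText(feature, "GBFeature_location"));
            }
            summaries.add(line.toString());
        }
        return summaries;
    }
}
```

Usage would be along the lines of `GbSeqXmlReader.summarize(new FileInputStream(file))`. If numeric start and end positions are needed rather than location strings, the same `firstText` approach could be applied to the interval elements inside each feature, again assuming the GBSeq DTD names them `GBInterval_from` and `GBInterval_to`.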

    Thomas.

