[Biojava-l] GenBank XML File Parse Error

Toralf Kirsten tkirsten at izbi.uni-leipzig.de
Fri Jan 23 12:17:05 EST 2004


Thomas,
thanks for your answer.
As you said, the plain ASCII text (normal flatfile) version can be downloaded
from the NCBI web page, so using it is no problem. However, we would like to
use the XML files, because every term in them is accessible at the atomic level.
Thanks again.
Toralf
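Until BioJava supports GenBank XML, one way to reach the individual terms is the JDK's own DOM parser. Below is a minimal sketch, not BioJava API. It assumes NCBI's GBSeq element names (GBSeq_primary-accession, GBSeq_sequence, GBFeature_key, GBInterval_from, GBInterval_to), so check those against the DTD of the files you actually download. The class, method and field names are hypothetical, and only the first interval of each feature is kept.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GbXmlSketch {

    /** One feature: its key (e.g. "CDS", "tRNA") and its first from..to interval. */
    public static final class Feature {
        public final String key;
        public final int from;
        public final int to;

        Feature(String key, int from, int to) {
            this.key = key;
            this.from = from;
            this.to = to;
        }
    }

    /** Accession, sequence and features of one GBSeq element. */
    public static final class Entry {
        public final String accession;
        public final String sequence;
        public final List<Feature> features;

        Entry(String accession, String sequence, List<Feature> features) {
            this.accession = accession;
            this.sequence = sequence;
            this.features = features;
        }
    }

    /** Trimmed text of the first descendant element named tag, or null if absent. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static List<Entry> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Downloaded files usually carry a DOCTYPE; don't fetch the external DTD.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<Entry> entries = new ArrayList<>();
        NodeList seqs = doc.getElementsByTagName("GBSeq");
        for (int i = 0; i < seqs.getLength(); i++) {
            Element seq = (Element) seqs.item(i);
            List<Feature> features = new ArrayList<>();
            NodeList feats = seq.getElementsByTagName("GBFeature");
            for (int j = 0; j < feats.getLength(); j++) {
                Element f = (Element) feats.item(j);
                String from = text(f, "GBInterval_from");
                String to = text(f, "GBInterval_to");
                if (from == null || to == null) {
                    continue; // skip features without a from..to interval (e.g. single points)
                }
                features.add(new Feature(text(f, "GBFeature_key"),
                        Integer.parseInt(from), Integer.parseInt(to)));
            }
            entries.add(new Entry(text(seq, "GBSeq_primary-accession"),
                    text(seq, "GBSeq_sequence"), features));
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, hand-written record; not real NCBI data.
        String xml = "<GBSet><GBSeq>"
                + "<GBSeq_primary-accession>X00000</GBSeq_primary-accession>"
                + "<GBSeq_feature-table><GBFeature>"
                + "<GBFeature_key>CDS</GBFeature_key>"
                + "<GBFeature_intervals><GBInterval>"
                + "<GBInterval_from>3</GBInterval_from><GBInterval_to>8</GBInterval_to>"
                + "</GBInterval></GBFeature_intervals>"
                + "</GBFeature></GBSeq_feature-table>"
                + "<GBSeq_sequence>acgtacgtac</GBSeq_sequence>"
                + "</GBSeq></GBSet>";
        for (Entry e : parse(xml)) {
            System.out.println(e.accession + " " + e.sequence.length() + " bp");
            for (Feature f : e.features) {
                System.out.println("  " + f.key + " " + f.from + ".." + f.to);
            }
        }
    }
}
```

Run on the toy record, this prints "X00000 10 bp" followed by "  CDS 3..8". DOM keeps the whole file in memory; for very large downloads a SAX or StAX reader over the same element names would be the lighter choice.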

Thomas Down wrote:

>Once upon a time, Toralf Kirsten wrote:
>  
>
>>Hi,
>>I have to extract data from GenBank XML files.
>>For this purpose I use the BioJava API, but I get a parser error.
>>
>>java.lang.StringIndexOutOfBoundsException: String index out of range: 12
>>at java.lang.String.substring(String.java:1477)
>>at org.biojava.bio.seq.io.GenbankContext.processHeaderLine
>>(GenbankContext.java:621)
>>[snip]
>>
>>
>>The program is quite simple. The user specifies the path and file name with the
>>FileChooser component. Then I open the file and apply the Sequence and
>>Annotation classes as shown in the attached method, taken from an extended
>>file class.
>>
>>What I need is the sequence data of the GenBank entry (accession,
>>sequence, etc.)
>>and also its features (start and end positions, subtypes such as tRNA,
>>CDS, etc.).
>>    
>>
>
>I'm afraid that BioJava doesn't currently support the XML version
>of genbank records.  The Genbank parser you are using expects the
>normal flatfile version of the genbank records -- do you have
>access to these?
>
>We should probably look at adding Genbank XML support to BioJava.
>Does anyone know how widely it's used (I must admit I haven't met
>it before).
>
>    Thomas.
>  
>
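Thomas's diagnosis fits the stack trace. GenBank flatfile header lines put the keyword in columns 1 to 12 and its value from column 13 onwards, so a flatfile parser that cuts each line at column 12 will fail on the first line shorter than that, and XML input can easily contain one. This is an inference; GenbankContext.java:621 itself has not been checked. The sketch below shows the column split with a short-line guard, plus a cheap check for XML input before handing a file to a flatfile parser. The class and method names are hypothetical, not BioJava API.

```java
public class HeaderLineSketch {

    /** GenBank flatfile header keywords occupy columns 1-12; values start in column 13. */
    static final int KEYWORD_WIDTH = 12;

    /** Splits a header line into {keyword, value}; short lines get an empty value instead of throwing. */
    public static String[] splitHeaderLine(String line) {
        if (line.length() < KEYWORD_WIDTH) {
            // line.substring(0, 12) would throw StringIndexOutOfBoundsException here
            return new String[] { line.trim(), "" };
        }
        return new String[] {
            line.substring(0, KEYWORD_WIDTH).trim(),
            line.substring(KEYWORD_WIDTH).trim()
        };
    }

    /** Cheap sniff: treat input whose first non-blank character is '<' as XML. */
    public static boolean looksLikeXml(String firstLine) {
        return firstLine.trim().startsWith("<");
    }

    public static void main(String[] args) {
        String[] kv = splitHeaderLine("ACCESSION   X00000");
        System.out.println(kv[0] + " -> " + kv[1]);
        System.out.println(looksLikeXml("<?xml version=\"1.0\"?>"));
    }
}
```

The main method prints "ACCESSION -> X00000" and then "true". Checking the first line up front gives a clear "this is XML, not a flatfile" message instead of an index error deep inside the parser.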

