[Biojava-l] GenBank XML File Parse Error
Toralf Kirsten
tkirsten at izbi.uni-leipzig.de
Fri Jan 23 12:17:05 EST 2004
Thomas,
thanks for your answer.
The plain ASCII flat file you mention can be downloaded from the NCBI
web page, so using it is no problem. However, we would prefer to use the
XML file, because each term is accessible at the atomic level.
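To illustrate what I mean by element-level access, here is a minimal sketch that uses only the JDK's own XML classes (no BioJava). It assumes the file follows NCBI's GBSeq layout; the element names (GBSeq_primary-accession, GBSeq_sequence, GBFeature_key, GBInterval_from, GBInterval_to) and the class name GbSeqXmlSketch are my assumptions, not something BioJava provides, and the record in main is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: element names assume NCBI's GBSeq XML layout.
class GbSeqXmlSketch {

    // Returns the accession, the sequence length, then one line per feature.
    static List<String> summarize(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Avoid fetching the external DTD if the file carries a DOCTYPE line.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> lines = new ArrayList<>();
        lines.add("accession=" + xpath.evaluate("//GBSeq/GBSeq_primary-accession", doc));
        lines.add("length=" + xpath.evaluate("//GBSeq/GBSeq_sequence", doc).length());

        NodeList features = (NodeList) xpath.evaluate("//GBFeature", doc, XPathConstants.NODESET);
        for (int i = 0; i < features.getLength(); i++) {
            // Only the first interval of each feature is reported in this sketch.
            String key = xpath.evaluate("GBFeature_key", features.item(i));
            String from = xpath.evaluate(".//GBInterval_from", features.item(i));
            String to = xpath.evaluate(".//GBInterval_to", features.item(i));
            lines.add(key + " " + from + ".." + to);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Made-up record, for illustration only.
        String xml = "<GBSet><GBSeq>"
            + "<GBSeq_primary-accession>X00001</GBSeq_primary-accession>"
            + "<GBSeq_sequence>acgtacgt</GBSeq_sequence>"
            + "<GBSeq_feature-table><GBFeature>"
            + "<GBFeature_key>CDS</GBFeature_key>"
            + "<GBFeature_intervals><GBInterval>"
            + "<GBInterval_from>2</GBInterval_from><GBInterval_to>7</GBInterval_to>"
            + "</GBInterval></GBFeature_intervals>"
            + "</GBFeature></GBSeq_feature-table>"
            + "</GBSeq></GBSet>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

Features with several intervals (joins) would need a loop over the GBInterval elements; the sketch deliberately reads only the first one.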
Thanks again.
Toralf
Thomas Down wrote:
>Once upon a time, Toralf Kirsten wrote:
>
>
>>Hi,
>>I have to extract data from the GenBank XML files.
>>For this purpose I use the biojava API. But I get a parser error.
>>
>>java.lang.StringIndexOutOfBoundsException: String index out of range: 12
>>at java.lang.String.substring(String.java:1477)
>>at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankContext.java:621)
>>[snip]
>>
>>
>>The program is quite simple. The user specifies the path and file name with
>>the FileChooser component. Then I open the file and apply the Sequence and
>>Annotation classes, as shown in the attached method taken from an extended
>>file class.
>>
>>What I need is the sequence data of the GenBank entry (accession,
>>sequence, etc.)
>>and also its features (start and end position, subtype such as tRNA, CDS,
>>etc.)
>>
>>
>
>I'm afraid that BioJava doesn't currently support the XML version
>of genbank records. The Genbank parser you are using expects the
>normal flatfile version of the genbank records -- do you have
>access to these?
>
>We should probably look at adding Genbank XML support to BioJava.
>Does anyone know how widely it's used? (I must admit I haven't met
>it before.)
>
> Thomas.
>
>