[Biojava-l] StringIndexOutOfBoundsException while parsing blast result

Richard Holland holland at eaglegenomics.com
Tue Sep 30 16:31:17 UTC 2008


Sounds like it _might_ be something to do with the carriage return
itself. Was the Blast file generated on the same OS that you're running
your analysis on? (For example, you might run Blast on a Linux box but
parse the file on a Windows box.) If the two OSes are different, that
could be the cause: Linux won't necessarily understand Windows
linebreaks, or vice versa, and might misinterpret them. When you copy
the portion of the file to a new file on the OS you're running the
analysis on, it will substitute its own local linebreaks and thus mask
the problem.
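
One way to test this theory without guessing is to count which kinds of
linebreak the file actually contains. The following is only a rough
sketch: LineBreakCensus is a made-up helper name, not part of BioJava,
and it assumes you have already read the Blast output into a String.

```java
/** Hypothetical helper (not a BioJava class): tallies linebreak styles in a text. */
final class LineBreakCensus {
    final int crlf; // Windows-style "\r\n" pairs
    final int lf;   // Unix-style lone "\n"
    final int cr;   // lone "\r" with no "\n" after it

    private LineBreakCensus(int crlf, int lf, int cr) {
        this.crlf = crlf;
        this.lf = lf;
        this.cr = cr;
    }

    /** Scans the text once and counts each linebreak style separately. */
    static LineBreakCensus of(CharSequence text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        return new LineBreakCensus(crlf, lf, cr);
    }

    /** True when more than one linebreak style appears in the same text. */
    boolean isMixed() {
        int kinds = (crlf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        return kinds > 1;
    }
}
```

If the full file reports CRLF or mixed linebreaks while your small
copied file reports only LF (or the other way round), that would fit
the explanation above.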

So the first thing I'd check is what the two OSes involved are. If
they're different, try running your analysis program on the same OS
that the Blast output was generated on. If that fixes it, then try
putting your Blast files through dos2unix or something similar to
convert the linebreaks before running your analysis program.
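
If dos2unix isn't to hand, the same conversion can be approximated in
Java before the text reaches the parser. This is a minimal sketch under
my own assumptions: LineBreakNormalizer is a hypothetical name, not a
BioJava class, and it works on a String you have already read from the
Blast file.

```java
/** Hypothetical helper (not a BioJava class): converts linebreaks to Unix style. */
final class LineBreakNormalizer {
    private LineBreakNormalizer() {
    }

    /**
     * Rewrites Windows ("\r\n") and lone carriage-return ("\r") linebreaks
     * as "\n". The pairs are replaced first so that a "\r\n" does not turn
     * into two linebreaks.
     */
    static String toUnix(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

You could then wrap the result of toUnix in a java.io.StringReader and
pass that to the parser in place of the original file. For very large
Blast outputs a streaming conversion would be kinder on memory.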

If they're the same OS, then we still have a problem!

cheers,
Richard

2008/9/30 David Toomey <dtoomey at rcsi.ie>:
> Hi
>
>
>
> I am parsing a blast result and I am getting a
> StringIndexOutOfBoundsException. The stack trace is:
>
>        at java.lang.String.substring(String.java:1938)
>        at java.lang.String.substring(String.java:1905)
>        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:291)
>        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:116)
>        at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:517)
>        at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
>        at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
>        at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:117)
>        at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:634)
>        at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:341)
>        at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:168)
>        at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:314)
>        at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
>        at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:163)
>        at ie.rcsi.blast.StandardParser.parse(StandardParser.java:65)
>        at ie.rcsi.blast.BlastParser.parse(BlastParser.java:44)
>        at ie.rcsi.blast.Main.main(Main.java:30)
>
>
>
> I have updated BlastLikeAlignmentSAXParser to output some debug info and
> narrowed the problem down to the following line:
>
>
>
> 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
>
> GN=ISPF
>
>
>
> If I remove the carriage return and put it on a single line then everything
> works fine. Strangely, if I copy this entry and put it in a file on its own
> it also parses correctly, even with the carriage return!
>
>
>
> Has anyone seen this before, or does anyone have a suggestion on what I might
> do to fix it? I can send the complete blast result if it would help. I have
> tried using blast 2.2.18 and 2.2.17 and the problem is the same.
>
>
>
> Cheers
>
>
>
> Dave
>
>
>
>
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


