[Biojava-l] StringIndexOutOfBoundsException while parsing blast result

Mark Schreiber markjschreiber at gmail.com
Wed Oct 1 06:07:51 UTC 2008


Actually, if it is an OS specific carriage return then there is still
a minor issue. We should really try and code stuff so that it can
handle files that originate from any major OS.
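
As a rough illustration of OS-independent handling, here is a minimal sketch that rewrites Windows (\r\n) and old Mac (\r) line breaks as Unix (\n) before the text reaches a parser. The class LineEndings and its methods are hypothetical names made up for this example and are not part of BioJava. Whether line endings are really the cause of the exception in this thread is still only a guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration only; not a BioJava class.
final class LineEndings {
    private LineEndings() {}

    /** Rewrites Windows (\r\n) and old Mac (\r) line breaks as Unix (\n). */
    static String normalize(String text) {
        // Handle \r\n first so the pair is not turned into two newlines.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /**
     * Reads the whole input into memory and returns a reader over the
     * normalized text. Assumes the Blast report fits in memory.
     */
    static BufferedReader normalizedReader(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return new BufferedReader(new StringReader(normalize(sb.toString())));
    }
}
```

Note that BufferedReader.readLine() already accepts \n, \r and \r\n as line terminators, so a normalizing step like this mainly helps code that reads raw characters or splits on "\n" by hand.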

- Mark

On Wed, Oct 1, 2008 at 12:31 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
>
> Sounds like it _might_ be something to do with the carriage return
> itself. Is the blast file generated on the same OS that you're running
> your analysis on? (e.g. you might run Blast on a Linux box, but
> attempt to parse the file on a Windows box?). If the two OSes are
> different, this might point to it - as Linux won't necessarily
> understand the Windows linebreaks, or vice versa, and might
> misinterpret them. When you copy the portion of the file to a new file
> on the OS you're running the analysis on, it will substitute its own
> local linebreaks and thus mask the problem.
>
> So the first thing I'd check is what the two OSes involved are. If
> they're different, try running your analysis program on the same OS as
> the Blast output was generated on. If that does fix it, then try
> putting your Blast files through dos2unix or something similar to
> convert the linebreaks before running your analysis program.
>
> If they're the same OS, then we still have a problem!
>
> cheers,
> Richard
>
> 2008/9/30 David Toomey <dtoomey at rcsi.ie>:
> > Hi
> >
> >
> >
> > I am parsing a blast result and I am getting a
> > StringIndexOutOfBoundsException. The stack trace is
> >
> >
> >
> >        at java.lang.String.substring(String.java:1938)
> >        at java.lang.String.substring(String.java:1905)
> >        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:291)
> >        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:116)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:517)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:117)
> >        at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:634)
> >        at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:341)
> >        at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:168)
> >        at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:314)
> >        at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
> >        at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:163)
> >        at ie.rcsi.blast.StandardParser.parse(StandardParser.java:65)
> >        at ie.rcsi.blast.BlastParser.parse(BlastParser.java:44)
> >        at ie.rcsi.blast.Main.main(Main.java:30)
> >
> >
> >
> > I have updated BlastLikeAlignmentSAXParser to output some debug info and
> > narrowed down the line causing the problem to the following line
> >
> >
> >
> > 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
> > GN=ISPF
> >
> >
> >
> > If I remove the carriage return and put it on a single line then everything
> > works fine. Strangely, if I copy this entry and put it in a file on its own
> > it also parses correctly, even with the carriage return!!!
> >
> >
> >
> > Has anyone seen this before, or does anyone have a suggestion on what I might
> > do to fix it? I can send the complete blast result if it would help. I have
> > tried using blast 2.2.18 and 2.2.17 and the problem is the same.
> >
> >
> >
> > Cheers
> >
> >
> >
> > Dave
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
