[Biojava-dev] NullPointerException from BlastSAXParser.java

Sicotte, Hugues (NIH/NCI) sicotteh at mail.nih.gov
Fri Oct 7 13:27:01 EDT 2005


I've been through this before when I was working for
NCBI.

The answer was that the text output of BLAST was never a supported format.
The only supported format is the XML Blast Output.
http://ccgb.umn.edu/~crow/projects/xmlblast/example.html
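
For anyone who wants to try the XML route, here is a minimal sketch that pulls the hit descriptions out of a BLAST XML report using only the SAX classes that ship with the JDK. It is an illustration under assumptions, not BioJava code: the class name BlastXmlHitDefs and the method hitDefs are made up for this example, and it relies on the Hit_def element name from the NCBI BlastOutput DTD. To my knowledge later blastall releases produce XML with -m 7, but check the help output of your own version.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of BioJava.
class BlastXmlHitDefs {

    // Collects the text of every <Hit_def> element in the report.
    static List<String> hitDefs(InputStream in) {
        final List<String> defs = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <Hit_def>

            // Real reports carry a DOCTYPE; answer it with an empty DTD so
            // the parser does not try to fetch the NCBI DTD.
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("Hit_def".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("Hit_def".equals(qName)) {
                    defs.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse BLAST XML", e);
        }
        return defs;
    }

    // Usage: java BlastXmlHitDefs report.xml
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String def : hitDefs(in)) {
                System.out.println(def);
            }
        }
    }
}
```

Because the XML elements are named, nothing here depends on banner strings such as "Searching".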

Also, when parsing a file that contains multiple BLAST reports,
breaking on "Searching..." is not a good idea:
if the parameters are wrong or the query sequence
is too low in complexity, the program does not emit this string.
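
If you do have to split concatenated text reports, one option is to break on the program banner line (for example "BLASTP 2.0.11 [Jan-20-2000]", as in the output quoted below) instead of on "Searching". The sketch below is only an illustration: BlastReportSplitter and its regular expression are assumptions made for this example, and the banner is itself part of the unsupported text format, so it carries no guarantee either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
class BlastReportSplitter {

    // Assumption: every report starts with a banner line such as
    // "BLASTP 2.0.11 [Jan-20-2000]" (program name, then a dotted version).
    private static final Pattern BANNER =
        Pattern.compile("T?BLAST[NPX] \\d+\\.\\d+\\.\\d+.*");

    // Splits concatenated text output into one string per report.
    // Anything before the first banner line is discarded.
    static List<String> split(String concatenated) {
        List<String> reports = new ArrayList<String>();
        StringBuilder current = null;
        for (String line : concatenated.split("\r?\n")) {
            if (BANNER.matcher(line).matches()) {
                if (current != null) {
                    reports.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            reports.add(current.toString());
        }
        return reports;
    }

    public static void main(String[] args) {
        String two = "BLASTP 2.0.11 [Jan-20-2000]\nQuery= a\n"
                   + "BLASTP 2.0.11 [Jan-20-2000]\nQuery= b\n";
        System.out.println(split(two).size() + " reports found");
    }
}
```

A report that never prints "Searching" still gets its own entry, because the split depends only on the banner.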


Hugues Sicotte


-----Original Message-----
From: W. Eric Trull [mailto:wetrull at yahoo.com]
Sent: Friday, October 07, 2005 12:05 PM
To: biojava-dev at biojava.org
Cc: mark.schreiber at novartis.com
Subject: Re: [Biojava-dev] NullPointerException from BlastSAXParser.java


Should I raise this as an issue with NCBI?  Seems like it makes writing
parsing routines more difficult.

Thanks.

-Eric Trull

--- mark.schreiber at novartis.com wrote:

> Looks like there might be a difference in the Windows output. I will try
> to take a look at this over the next few days. Probably need to change the
> BlastSAXParser to look for something other than Searching so that this
> will get parsed as well.
> 
> - Mark
> 
> 
> 
> 
> 
> "W. Eric Trull" <wetrull at yahoo.com>
> 10/06/2005 11:01 PM
> 
>  
>         To:     biojava-dev at biojava.org
>         cc:     Mark Schreiber/GP/Novartis at PH
>         Subject:        Re: [Biojava-dev] NullPointerException from BlastSAXParser.java
> 
> 
> Hello Mark,
> 
> Here is what I've done, using NCBI Blast 2.0.11, Windows XP, JDK 1.4.2
> 
> 1.  Downloaded the PDB's pdb_seqres.txt
> 2.  Created a blast database (after changing the deflines):
>         C:\blast-2.0.11\formatdb.exe
>             -t "PDB" 
>             -i blast\pdb_seqres.txt
>             -l blast\pdb_formatdb.log
>             -o T
>             -n blast\pdb
> 3.  BLASTed 26SPS9_Hs:
>         C:\blast-2.0.11\blastall.exe
>             -p blastp
>             -d blast\pdb
>             -i 26SPS9_Hs.fasta
>             -o 26SPS9_Hs.blast
> 4.  Tried to parse 26SPS9_Hs.blast using the class shown in BioJava in
> Anger and BlastEcho, both of which give me the NullPointerException.  The
> beginning of 26SPS9_Hs.blast file is shown below, the entire file is
> attached.
>
> Please let me know if you see anything obviously wrong with the way I'm
> doing the BLAST.  I'm going to cvs checkout the BioJava source code and
> have a look at the JUnit test later today.
> 
> Thanks!
> 
> -Eric Trull
> 
> -------- 26SPS9_Hs.blast --------
> BLASTP 2.0.11 [Jan-20-2000]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= 26SPS9_Hs 
>          (176 letters)
> 
> Database: PDB
>            78,094 sequences; 17,596,117 total letters
> 
> 
> 
>                                                                    Score    E
> Sequences producing significant alignments:                        (bits) Value
>
> pdb|1UFM|A Cop9 Complex Subunit 4                                      39  0.003
> .
> .
> .
> -------- 26SPS9_Hs.blast --------
> 
> 
> --- mark.schreiber at novartis.com wrote:
> 
> > Hello -
> > 
> > This is very odd.
> > 
> > The JUnit tests currently pass using the files in
> > /tests/files/org/biojava/bio/programs/ssbind. These BLAST files all have
> > the string "Searching....". Maybe there is a variation in the Windows
> > output?
> >
> > Can you post at least the header of your output to the list (preferably
> > an entire example output)?
> > 
> > - Mark
> > 
> > 
> > 
> > 
> > 
> > "W. Eric Trull" <wetrull at yahoo.com>
> > Sent by: biojava-dev-bounces at portal.open-bio.org
> > 10/06/2005 06:11 AM
> > 
> > 
> >         To:     biojava-dev at biojava.org
> >         cc:     (bcc: Mark Schreiber/GP/Novartis)
> >         Subject:        [Biojava-dev] NullPointerException from BlastSAXParser.java
> > 
> > 
> > Hello all,
> > 
> > I'm new to the list, but have done as much archive searching, Google
> > searching, and debugging as I can on the problem I describe here.
> > 
> > I'm trying to parse NCBI BLAST output (as shown in BioJava in Anger), but
> > keep getting a NullPointerException.  One of my searches turned up using
> > BlastEcho to debug the problem, but that also throws the
> > NullPointerException:
> > 
> > startSearch
> >                  SearchProp:             program: ncbi-blastp
> >                  SearchProp:             version: 2.0.11
> > java.lang.NullPointerException
> >         at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215)
> >         at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
> >         at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311)
> >         at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274)
> >         at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
> >         at com.pfizer.search.sequence.BlastEcho.echo(BlastEcho.java:42)
> >         at com.pfizer.search.sequence.BlastEcho.main(BlastEcho.java:88)
> > Exception in thread "main"
> > 
> > Stepping through the code in a debugger shows that the while loop added in
> > revision 1.13 of
> > /biojava-live/src/org/biojava/bio/program/sax/BlastSAXParser.java (fixed
> > truncation of database id) reads all the lines without ever matching the
> > "Searching" string.  At first I thought it was because I was using a later
> > version of BLAST, but then I tried 2.0.11 and 2.2.3 (supported version)
> > but they also result in a NullPointerException.  In the BLAST output for the
> > various versions I never see a "Searching" string anywhere.  I've tried
> > all the -m options as well, without success.
> > 
> > Is there an NCBI BLAST option that I need to be using?  I'm running on
> > Windows XP (during development) - is the UNIX version output different?
> > 
> > Thanks.
> > 
> > -Eric Trull
> > 
> > 
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at biojava.org
> > http://biojava.org/mailman/listinfo/biojava-dev
> > 
> > 
> > 
> > 
=== message truncated ===


