[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output

Hubert Prielinger hubert.prielinger at gmx.at
Thu Feb 9 22:20:46 UTC 2006


Hi Chris,
I'm incredibly sorry for causing so much inconvenience, yes you are 
right, I had only to change the blast.pm file, it is working very fine, 
thank you very much, and you are right, you have mentioned it ealier 
either to change the file... ;)

but I have another question: does it work with the WU-Blast output too? 

regards
Hubert


Chris Fields wrote:

>Ha!  I come back from meeting and there's a billion emails!  What have we
>started? ;p .  Sorry about this Jason; I know you're busy.
>
>Hubert, if you're out there, I sent you an email with an attachment.  You
>said the output looks like what you were expecting.  So I think we have two
>problems:
>
>1)  I haven't delved into the file scanning, but the fact that it takes so
>long should tell you something's seriously wrong there.  Strip that part out
>and start with a simple script, say, like the one Jason or that I sent you;
>the script I used to generate that output works fine (on two OS's, WinXP and
>Mac OS X).  Use it on one file at a time.  Do everything on command line
>(not through Eclipse).  IDE's can be notoriously flaky about running
>scripts, esp. when they run debugging.  
>
>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>not work whenever the text blast output has the following header, which
>comes from the new web version of BLAST:
>
>-----------------------------------------------------
>BLASTP 2.2.13 [Nov-27-2005]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman 
>(1997), "Gapped BLAST and PSI-BLAST: a new generation of 
>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
>RID: 1139501210-857-165793005128.BLASTQ1
>
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           3,292,813 sequences; 1,128,164,434 total letters
>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis 
>H37Rv].
>Length=193
>.......
>-----------------------------------------------------
>
>It will work if the text output has the following header (or is an older
>version of BLAST):
>
>-----------------------------------------------------
>BLASTP 2.2.12 [Aug-07-2005]
>
>
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs",  Nucleic Acids Res. 25:3389-3402.
>
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis H37Rv].
>         (193 letters)
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           2,895,325 sequences; 997,103,285 total letters
>-----------------------------------------------------
>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>Therefore, even bioperl-1.5.1 will not work!
>
>If you want the really gory details on why this is a problem, look here:
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>So, any text output with the above header will not work; it will either hang
>or end abruptly (depending on OS, perl version, memory, patience).  If you
>look in the above, I have added a preliminary fix for this.  I'll reiterate
>for the billionth time, it hasn't been committed yet, so don't kill me if
>blows your computer up ;>   
>
>Here's the direct link:
>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>1.90, but it's lying, I didn't change the version, only the regex; sorry
>Jason).  From what you've been posting it doesn't sound like you've tried
>this, and I believe I've suggested this fix before.
>
>Replace the one in your Bio/SearchIO directory (which looks like
>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>message) with this file.  Make sure the filename stays the same (blast.pm).
>
>Run everything again, one file at a time.  Make sure you use Jason's script
>as well as the one I sent you.  Do NOT rely on running through multiple
>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>
>
>Here's a small chunk of output from one of your blast files using the
>modifed script I sent you:
>
>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>Query:   1  RWKWKRKK  8
>Seq:     542  RWAWRRKK  549
>
>Look familiar?
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>  
>
>>-----Original Message-----
>>From: Roger Hall [mailto:rahall2 at ualr.edu] 
>>Sent: Thursday, February 09, 2006 3:24 PM
>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>In other words, yes, I'm on the wrong trail. :}
>>
>>Sorry - I'll look at the output issue this evening (or 
>>realize that Chris already solved the issue).  ;}
>>
>>Thanks!
>>
>>Roger
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>>Hubert Prielinger
>>Sent: Thursday, February 09, 2006 2:14 PM
>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; 
>>Jason Stajich
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>dear roger,
>>this error message I got, when I tried to parse Blast output (version
>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I 
>>have a lot of Blast output files with version 2.2.13 and for 
>>that I don't get any error message.....it just doesn't work
>>
>>Hubert
>>
>>
>>
>>Roger Hall wrote:
>>
>>    
>>
>>>Guys - I'm looking at the error message:
>>>
>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>STACK Bio::SearchIO::blast::next_result
>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>STACK toplevel
>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>This is my line of thought:
>>>1. "no data for midline $_" is a unique message generated by 
>>>      
>>>
>>blast.pm 
>>    
>>
>>>in
>>>      
>>>
>>one
>>    
>>
>>>location only at the point of a. reading three lines b. 
>>>      
>>>
>>dropping lines 
>>    
>>
>>>with spaces only c. identifying the Query, Midline, and 
>>>      
>>>
>>Match lines (0 
>>    
>>
>>><= $i <
>>>      
>>>
>>3)
>>    
>>
>>>2. There is a regexp match that fails in order to reach that 
>>>      
>>>
>>error message
>>    
>>
>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the 
>>>      
>>>
>>expression
>>    
>>
>>>4. It does anyway
>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere 
>>>      
>>>
>>in the blast
>>    
>>
>>>reports
>>>
>>>I suspect a newline/chomp/metacharacter issue. Not finding 
>>>      
>>>
>>the string 
>>    
>>
>>>anywhere has me thoroughly confused - I asked Hubert for the 
>>>      
>>>
>>additional 
>>    
>>
>>>file, assuming that I didn't have it.
>>>
>>>My next thought is to write a quick script to test perl behavior on 
>>>"Fedora Core 9".
>>>
>>>Thoughts?
>>>
>>>Did I misread the issue entirely? :}
>>>
>>>Roger
>>>
>>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces at lists.open-bio.org
>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>>>      
>>>
>>Chris Fields
>>    
>>
>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>Blast output
>>>
>>>
>>> 
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>To: Hubert Prielinger
>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>>Blast output
>>>>
>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>   
>>>>
>>>>        
>>>>
>>>>>hi chris,
>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>working,
>>>>   
>>>>
>>>>        
>>>>
>>>>>do you have any ohter idea, the problem I have is that I
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>have to parse
>>>>   
>>>>
>>>>        
>>>>
>>>>>a lot of textfiles....
>>>>>or shall I look for another option to parse those files...
>>>>>
>>>>>regards
>>>>>Hubert
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>2.2.13 reports but unless you post your blast report we 
>>>>        
>>>>
>>can't really 
>>    
>>
>>>>determine the problem.
>>>>
>>>>If you are still getting the same error like this I am not 
>>>>        
>>>>
>>convinced 
>>    
>>
>>>>you have upgraded to 1.5.1 which includes a fix in the fact 
>>>>        
>>>>
>>that NCBI 
>>    
>>
>>>>changed the HSP result format to remove the ':' from the 
>>>>        
>>>>
>>Query/Sbjct 
>>    
>>
>>>>prefixes.  We fixed this as soon as it was apparent sometime in 
>>>>September.
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>If you are just getting no results but also no warnings wrt 
>>>>        
>>>>
>>parsing, 
>>    
>>
>>>>are you sure your logic is correct?
>>>>
>>>>If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>>while (my $result = $search->next_result) {
>>>>    print $result->query_name, "\n";
>>>>    #iterate over each hit on the query sequence
>>>>    while (my $hit = $result->next_hit) {
>>>>	print $hit->name, "\n";
>>>>        #iterate over each HSP in the hit
>>>>        while (my $hsp = $hit->next_hsp) {
>>>>	 print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp-
>>>>        
>>>>
>>>>>hit_string, "\n";	
>>>>>          
>>>>>
>>>>       }
>>>>   }
>>>>}
>>>>   
>>>>
>>>>        
>>>>
>>>I tested some of the BLAST results that Hubert sent Roger 
>>>      
>>>
>>and me with a 
>>    
>>
>>>similar script to the above.  I removed the file parsing logic and it
>>>      
>>>
>>seemed
>>    
>>
>>>to work just fine.  It may very well be a logic issue or 
>>>      
>>>
>>that he hasn't 
>>    
>>
>>>installed the latest fix.
>>>   
>>>It's a funny thing, though.  When I tried using blastcl3 (v. 
>>>      
>>>
>>2.2.13), 
>>    
>>
>>>even though the returned output was from nr, the top of the blast 
>>>output showed that it was v2.2.12:
>>>
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>I double-checked my local version and it's definitely v.2.2.13:
>>>-------------------------------------
>>>C:\Perl\Scripts>blastcl3 -
>>>
>>>blastcl3 2.2.13   arguments:...
>>>-------------------------------------
>>>
>>>If you use RemoteBlast using the same settings, the version in the 
>>>header looks like this:
>>>
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>
>>>I'm wondering if all the blast executables (blast and netblast) from 
>>>NCBI have text output like v.2.2.12, while the wwwblast 
>>>      
>>>
>>outputs a new 
>>    
>>
>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>> 
>>>
>>>      
>>>
>>>>To clarify some stuff -
>>>>Chris I don't necessarily think the XML is best way forward 
>>>>        
>>>>
>>for BLAST 
>>    
>>
>>>>reports generated locally, it isn't as detailed as the Text 
>>>>        
>>>>
>>format and 
>>    
>>
>>>>it is what most people expect to be able to scroll through 
>>>>        
>>>>
>>and parse 
>>    
>>
>>>>-- it is also harder for the format to change dramatically 
>>>>        
>>>>
>>if you have 
>>    
>>
>>>>a static binary on your machine =).  I think for 
>>>>        
>>>>
>>remoteblast the XML 
>>    
>>
>>>>format should be the way forward but I expect Bioperl to maintain 
>>>>support of any plain text BLAST report format that people use on a 
>>>>regular basis.
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>Does XML lack some specific info that text output has?  
>>>      
>>>
>>Didn't know that.
>>I
>>    
>>
>>>believe that XML should be default in RemoteBlast since it will not 
>>>break, but I agree with you about text output.  I also agree that it 
>>>will need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>> 
>>>
>>>      
>>>
>>>>-jason
>>>>   
>>>>
>>>>        
>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>My guess is you're running into text parsing problems in 
>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>(1.5.1) or
>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>>I think the first problem you ran into is solved in 
>>>>>>            
>>>>>>
>>bioperl 1.5.1, 
>>    
>>
>>>>>>the last problem (more recent, not related to the first) has been 
>>>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed 
>>>>>>SearchIO::blast is available in the link above, but
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>realize it hasn't
>>>>   
>>>>
>>>>        
>>>>
>>>>>>been committed yet and may change.
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf 
>>>>>>>              
>>>>>>>
>>Of Hubert 
>>    
>>
>>>>>>>Prielinger
>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>parsing Blast
>>>>   
>>>>
>>>>        
>>>>
>>>>>>>output
>>>>>>>
>>>>>>>Hi,
>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>   
>>>>
>>>>        
>>>>
>>>>>>>is that a bug......
>>>>>>>
>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>>>>anything.....
>>>>>>>I'm using bioperl 1.4
>>>>>>>
>>>>>>>before, I have installed bioperl 1.4, it worked fine
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>parsing Blast
>>>>   
>>>>
>>>>        
>>>>
>>>>>>>Output (version 2.2.12), but I don't remember which
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>bioperl version
>>>>   
>>>>
>>>>        
>>>>
>>>>>>>I had installed
>>>>>>>
>>>>>>>thanks in advance
>>>>>>>
>>>>>>>Hubert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>--
>>>>Jason Stajich
>>>>Duke University
>>>>http://www.duke.edu/~jes12
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>> 
>>>
>>>      
>>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>    
>>
>
>
>  
>




More information about the Bioperl-l mailing list