[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
Hubert Prielinger
hubert.prielinger at gmx.at
Thu Feb 9 22:20:46 UTC 2006
Hi Chris,
I'm incredibly sorry for causing so much inconvenience, yes you are
right, I had only to change the blast.pm file, it is working very fine,
thank you very much, and you are right, you have mentioned it ealier
either to change the file... ;)
but I have another question: does it work with the WU-Blast output too?
regards
Hubert
Chris Fields wrote:
>Ha! I come back from meeting and there's a billion emails! What have we
>started? ;p . Sorry about this Jason; I know you're busy.
>
>Hubert, if you're out there, I sent you an email with an attachment. You
>said the output looks like what you were expecting. So I think we have two
>problems:
>
>1) I haven't delved into the file scanning, but the fact that it takes so
>long should tell you something's seriously wrong there. Strip that part out
>and start with a simple script, say, like the one Jason or that I sent you;
>the script I used to generate that output works fine (on two OS's, WinXP and
>Mac OS X). Use it on one file at a time. Do everything on command line
>(not through Eclipse). IDE's can be notoriously flaky about running
>scripts, esp. when they run debugging.
>
>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>not work whenever the text blast output has the following header, which
>comes from the new web version of BLAST:
>
>-----------------------------------------------------
>BLASTP 2.2.13 [Nov-27-2005]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
>(1997), "Gapped BLAST and PSI-BLAST: a new generation of
>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
>RID: 1139501210-857-165793005128.BLASTQ1
>
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> 3,292,813 sequences; 1,128,164,434 total letters
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis
>H37Rv].
>Length=193
>.......
>-----------------------------------------------------
>
>It will work if the text output has the following header (or is an older
>version of BLAST):
>
>-----------------------------------------------------
>BLASTP 2.2.12 [Aug-07-2005]
>
>
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs", Nucleic Acids Res. 25:3389-3402.
>
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis H37Rv].
> (193 letters)
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> 2,895,325 sequences; 997,103,285 total letters
>-----------------------------------------------------
>You have the former (2.2.13) version. I know b/c I have your BLAST files.
>Therefore, even bioperl-1.5.1 will not work!
>
>If you want the really gory details on why this is a problem, look here:
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>So, any text output with the above header will not work; it will either hang
>or end abruptly (depending on OS, perl version, memory, patience). If you
>look in the above, I have added a preliminary fix for this. I'll reiterate
>for the billionth time, it hasn't been committed yet, so don't kill me if
>blows your computer up ;>
>
>Here's the direct link:
>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>1.90, but it's lying, I didn't change the version, only the regex; sorry
>Jason). From what you've been posting it doesn't sound like you've tried
>this, and I believe I've suggested this fix before.
>
>Replace the one in your Bio/SearchIO directory (which looks like
>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>message) with this file. Make sure the filename stays the same (blast.pm).
>
>Run everything again, one file at a time. Make sure you use Jason's script
>as well as the one I sent you. Do NOT rely on running through multiple
>files yet. Fix one bug at a time. And heed Joel's words about file checks.
>
>
>Here's a small chunk of output from one of your blast files using the
>modifed script I sent you:
>
>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>Query: 1 RWKWKRKK 8
>Seq: 542 RWAWRRKK 549
>
>Look familiar?
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign
>
>
>
>>-----Original Message-----
>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>Sent: Thursday, February 09, 2006 3:24 PM
>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>parsing Blast output
>>
>>In other words, yes, I'm on the wrong trail. :}
>>
>>Sorry - I'll look at the output issue this evening (or
>>realize that Chris already solved the issue). ;}
>>
>>Thanks!
>>
>>Roger
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Hubert Prielinger
>>Sent: Thursday, February 09, 2006 2:14 PM
>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields;
>>Jason Stajich
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>parsing Blast output
>>
>>dear roger,
>>this error message I got, when I tried to parse Blast output (version
>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I
>>have a lot of Blast output files with version 2.2.13 and for
>>that I don't get any error message.....it just doesn't work
>>
>>Hubert
>>
>>
>>
>>Roger Hall wrote:
>>
>>
>>
>>>Guys - I'm looking at the error message:
>>>
>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>STACK Bio::SearchIO::blast::next_result
>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>STACK toplevel
>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>This is my line of thought:
>>>1. "no data for midline $_" is a unique message generated by
>>>
>>>
>>blast.pm
>>
>>
>>>in
>>>
>>>
>>one
>>
>>
>>>location only at the point of a. reading three lines b.
>>>
>>>
>>dropping lines
>>
>>
>>>with spaces only c. identifying the Query, Midline, and
>>>
>>>
>>Match lines (0
>>
>>
>>><= $i <
>>>
>>>
>>3)
>>
>>
>>>2. There is a regexp match that fails in order to reach that
>>>
>>>
>>error message
>>
>>
>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the
>>>
>>>
>>expression
>>
>>
>>>4. It does anyway
>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere
>>>
>>>
>>in the blast
>>
>>
>>>reports
>>>
>>>I suspect a newline/chomp/metacharacter issue. Not finding
>>>
>>>
>>the string
>>
>>
>>>anywhere has me thoroughly confused - I asked Hubert for the
>>>
>>>
>>additional
>>
>>
>>>file, assuming that I didn't have it.
>>>
>>>My next thought is to write a quick script to test perl behavior on
>>>"Fedora Core 9".
>>>
>>>Thoughts?
>>>
>>>Did I misread the issue entirely? :}
>>>
>>>Roger
>>>
>>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces at lists.open-bio.org
>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>
>>>
>>Chris Fields
>>
>>
>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>Blast output
>>>
>>>
>>>
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>To: Hubert Prielinger
>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>Blast output
>>>>
>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>hi chris,
>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still
>>>>>
>>>>>
>>>>>
>>>>>
>>>>working,
>>>>
>>>>
>>>>
>>>>
>>>>>do you have any ohter idea, the problem I have is that I
>>>>>
>>>>>
>>>>>
>>>>>
>>>>have to parse
>>>>
>>>>
>>>>
>>>>
>>>>>a lot of textfiles....
>>>>>or shall I look for another option to parse those files...
>>>>>
>>>>>regards
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>2.2.13 reports but unless you post your blast report we
>>>>
>>>>
>>can't really
>>
>>
>>>>determine the problem.
>>>>
>>>>If you are still getting the same error like this I am not
>>>>
>>>>
>>convinced
>>
>>
>>>>you have upgraded to 1.5.1 which includes a fix in the fact
>>>>
>>>>
>>that NCBI
>>
>>
>>>>changed the HSP result format to remove the ':' from the
>>>>
>>>>
>>Query/Sbjct
>>
>>
>>>>prefixes. We fixed this as soon as it was apparent sometime in
>>>>September.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>If you are just getting no results but also no warnings wrt
>>>>
>>>>
>>parsing,
>>
>>
>>>>are you sure your logic is correct?
>>>>
>>>>If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>>while (my $result = $search->next_result) {
>>>> print $result->query_name, "\n";
>>>> #iterate over each hit on the query sequence
>>>> while (my $hit = $result->next_hit) {
>>>> print $hit->name, "\n";
>>>> #iterate over each HSP in the hit
>>>> while (my $hsp = $hit->next_hsp) {
>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp-
>>>>
>>>>
>>>>>hit_string, "\n";
>>>>>
>>>>>
>>>> }
>>>> }
>>>>}
>>>>
>>>>
>>>>
>>>>
>>>I tested some of the BLAST results that Hubert sent Roger
>>>
>>>
>>and me with a
>>
>>
>>>similar script to the above. I removed the file parsing logic and it
>>>
>>>
>>seemed
>>
>>
>>>to work just fine. It may very well be a logic issue or
>>>
>>>
>>that he hasn't
>>
>>
>>>installed the latest fix.
>>>
>>>It's a funny thing, though. When I tried using blastcl3 (v.
>>>
>>>
>>2.2.13),
>>
>>
>>>even though the returned output was from nr, the top of the blast
>>>output showed that it was v2.2.12:
>>>
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>I double-checked my local version and it's definitely v.2.2.13:
>>>-------------------------------------
>>>C:\Perl\Scripts>blastcl3 -
>>>
>>>blastcl3 2.2.13 arguments:...
>>>-------------------------------------
>>>
>>>If you use RemoteBlast using the same settings, the version in the
>>>header looks like this:
>>>
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>
>>>I'm wondering if all the blast executables (blast and netblast) from
>>>NCBI have text output like v.2.2.12, while the wwwblast
>>>
>>>
>>outputs a new
>>
>>
>>>format (2.2.13). I'll ask blast-help at NCBI about this.
>>>
>>>
>>>
>>>
>>>
>>>>To clarify some stuff -
>>>>Chris I don't necessarily think the XML is best way forward
>>>>
>>>>
>>for BLAST
>>
>>
>>>>reports generated locally, it isn't as detailed as the Text
>>>>
>>>>
>>format and
>>
>>
>>>>it is what most people expect to be able to scroll through
>>>>
>>>>
>>and parse
>>
>>
>>>>-- it is also harder for the format to change dramatically
>>>>
>>>>
>>if you have
>>
>>
>>>>a static binary on your machine =). I think for
>>>>
>>>>
>>remoteblast the XML
>>
>>
>>>>format should be the way forward but I expect Bioperl to maintain
>>>>support of any plain text BLAST report format that people use on a
>>>>regular basis.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>Does XML lack some specific info that text output has?
>>>
>>>
>>Didn't know that.
>>I
>>
>>
>>>believe that XML should be default in RemoteBlast since it will not
>>>break, but I agree with you about text output. I also agree that it
>>>will need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>
>>>
>>>
>>>
>>>>-jason
>>>>
>>>>
>>>>
>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>My guess is you're running into text parsing problems in
>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version
>>>>>>(1.5.1) or
>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>>I think the first problem you ran into is solved in
>>>>>>
>>>>>>
>>bioperl 1.5.1,
>>
>>
>>>>>>the last problem (more recent, not related to the first) has been
>>>>>>fixed but hasn't been committed to bioperl-live yet. The fixed
>>>>>>SearchIO::blast is available in the link above, but
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>realize it hasn't
>>>>
>>>>
>>>>
>>>>
>>>>>>been committed yet and may change.
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf
>>>>>>>
>>>>>>>
>>Of Hubert
>>
>>
>>>>>>>Prielinger
>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>parsing Blast
>>>>
>>>>
>>>>
>>>>
>>>>>>>output
>>>>>>>
>>>>>>>Hi,
>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>
>>>>
>>>>
>>>>>>>is that a bug......
>>>>>>>
>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>>anything.....
>>>>>>>I'm using bioperl 1.4
>>>>>>>
>>>>>>>before, I have installed bioperl 1.4, it worked fine
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>parsing Blast
>>>>
>>>>
>>>>
>>>>
>>>>>>>Output (version 2.2.12), but I don't remember which
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>bioperl version
>>>>
>>>>
>>>>
>>>>
>>>>>>>I had installed
>>>>>>>
>>>>>>>thanks in advance
>>>>>>>
>>>>>>>Hubert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>>
>>>>--
>>>>Jason Stajich
>>>>Duke University
>>>>http://www.duke.edu/~jes12
>>>>
>>>>
>>>>
>>>>
>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>
>>>
>>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>
>
>
>
More information about the Bioperl-l
mailing list