[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
Phillip SanMiguel
pmiguel at purdue.edu
Sun Feb 12 20:05:47 UTC 2006
Roger,
Just a data point, but in case you were not already aware of it, the
characters W, K and R may be included in some DNA sequences. 'W' means
'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember
correctly. These are ambiguous bases, where a basecaller isn't sure, for
example, whether a particular peak is an A or a T. Although I see these
ambiguous bases less frequently these days, even common modern
basecallers (such as Applied Biosystems basecallers) can generally be
configured so they will generate them. Downstream applications may not
like them, however.
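For what it's worth, here is a minimal plain-Perl sketch of how those codes expand. The %IUPAC hash and the helper name iupac_to_pattern are my own illustration and are not part of Bioperl; the table follows the standard IUPAC nucleotide codes.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Turn a nucleotide string into a regex-style pattern, e.g. 'W' -> '[AT]'.
# Characters that are not in the table are passed through, regex-quoted.
sub iupac_to_pattern {
    my ($seq) = @_;
    return join '', map {
        my $bases = $IUPAC{ uc $_ };
        !defined $bases     ? quotemeta($_)
      : length($bases) == 1 ? $bases
      :                       "[$bases]";
    } split //, $seq;
}

print iupac_to_pattern('WWWKWRW'), "\n";   # prints [AT][AT][AT][GT][AT][AG][AT]
```

Read as DNA, the seven characters from the error message would be ambiguous at every position, which is the kind of input a downstream tool might balk at.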
I may be just stating the obvious, or this might be irrelevant to
the issue at hand. If so, my apologies.
Phillip
Roger Hall wrote:
> Guys - I'm looking at the error message:
>
> MSG: no data for midline Query 1 WWWKWRW 7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> This is my line of thought:
> 1. "no data for midline $_" is a unique message generated by blast.pm in one
> location only, at the point of (a) reading three lines, (b) dropping lines
> with spaces only, and (c) identifying the Query, Midline, and Match lines
> (0 <= $i < 3)
> 2. A regexp match must fail in order to reach that error message
> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression
> 4. It does anyway
> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast
> reports
>
> I suspect a newline/chomp/metacharacter issue. Not finding the string
> anywhere has me thoroughly confused - I asked Hubert for the additional
> file, assuming that I didn't have it.
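A quick way to test that hypothesis outside of Bioperl is a few lines of plain Perl. This is only a sketch: the two patterns and the helper parse_alignment_line are made up for illustration and are not the ones in blast.pm. It assumes the failure comes either from the ':' that NCBI dropped from the Query/Sbjct prefixes (as Jason describes further down) or from a carriage return that a plain chomp would leave behind.

```perl
use strict;
use warnings;

# Hypothetical patterns for an alignment line; NOT copied from blast.pm.
my $old_style = qr/^(Query|Sbjct):\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;   # requires ':'
my $tolerant  = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;  # ':' optional

# Return a hash ref of the parsed fields, or nothing if the line does not match.
sub parse_alignment_line {
    my ($line, $pattern) = @_;
    $line =~ s/[\r\n]+\z//;    # strip LF or CRLF endings, unlike a plain chomp
    return unless $line =~ $pattern;
    return { label => $1, start => $2, seq => $3, end => $4 };
}

# Try one line in the older format and one in the colon-less format.
for my $line ("Query: 1 WWWKWRW 7\n", "Query 1 WWWKWRW 7\r\n") {
    my $old = parse_alignment_line($line, $old_style) ? 'match' : 'no match';
    my $new = parse_alignment_line($line, $tolerant)  ? 'match' : 'no match';
    (my $shown = $line) =~ s/[\r\n]+\z//;
    printf "%-22s old-style: %-8s tolerant: %s\n", "'$shown'", $old, $new;
}
```

With these made-up patterns the colon-less line matches only the tolerant version, so a parser that still expects "Query:" would fall through to its error branch on otherwise valid input.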
>
> My next thought is to write a quick script to test perl behavior on "Fedora
> Core 4".
>
> Thoughts?
>
> Did I misread the issue entirely? :}
>
> Roger
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 09, 2006 10:16 AM
> To: 'Jason Stajich'; 'Hubert Prielinger'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> output
>
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Thursday, February 09, 2006 9:13 AM
>> To: Hubert Prielinger
>> Cc: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>> parsing Blast output
>>
>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>
>>> hi chris,
>>> thanks, I have upgraded to version 1.5.1 but it still isn't working.
>>> Do you have any other idea? The problem is that I have to parse
>>> a lot of text files...
>>> Or shall I look for another option to parse those files?
>>>
>>> regards
>>> Hubert
>>>
>> The code from Bioperl 1.5.1 works fine for me for blast
>> 2.2.13 reports but unless you post your blast report we can't
>> really determine the problem.
>>
>> If you are still getting the same error as the one below, I am not
>> convinced you have upgraded to 1.5.1, which includes a fix for
>> the fact that NCBI changed the HSP result format to remove
>> the ':' from the Query/Sbjct prefixes. We fixed this as soon
>> as it was apparent, sometime in September.
>>
>>
>>>>> MSG: no data for midline Query 1 WWWKWRW 7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> If you are just getting no results but also no warnings wrt
>> parsing, are you sure your logic is correct?
>>
>> If you remove your filters, do you see all the HSPs?
>>
>>
>> while (my $result = $search->next_result) {
>>     print $result->query_name, "\n";
>>     # iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>         print $hit->name, "\n";
>>         # iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                 $hsp->hit_string, "\n";
>>         }
>>     }
>> }
>>
>
> I tested some of the BLAST results that Hubert sent Roger and me with a
> similar script to the above. I removed the file parsing logic and it seemed
> to work just fine. It may very well be a logic issue or that he hasn't
> installed the latest fix.
>
> It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even
> though the returned output was from nr, the top of the blast output showed
> that it was v2.2.12:
>
> BLASTP 2.2.12 [Aug-07-2005]
>
> I double-checked my local version and it's definitely v.2.2.13:
> -------------------------------------
> C:\Perl\Scripts>blastcl3 -
>
> blastcl3 2.2.13 arguments:...
> -------------------------------------
>
> If you use RemoteBlast using the same settings, the version in the header
> looks like this:
>
> BLASTP 2.2.13 [Nov-27-2005]
>
> I'm wondering if all the blast executables (blast and netblast) from NCBI
> have text output like v.2.2.12, while the wwwblast outputs a new format
> (2.2.13). I'll ask blast-help at NCBI about this.
>
>
>> To clarify some stuff -
>> Chris, I don't necessarily think XML is the best way forward
>> for BLAST reports generated locally. It isn't as detailed as
>> the text format, and the text format is what most people expect
>> to be able to scroll through and parse. It is also harder for
>> the format to change dramatically if you have a static binary on
>> your machine =). I think for RemoteBlast the XML format
>> should be the way forward, but I expect Bioperl to maintain
>> support for any plain-text BLAST report format that people use
>> on a regular basis.
>>
>>
>
> Does XML lack some specific info that text output has? I didn't know that. I
> believe that XML should be the default in RemoteBlast since it will not break,
> but I agree with you about text output. I also agree that it will need
> somebody to maintain it constantly, much like RemoteBlast.
>
>
>> -jason
>>
>>> Chris Fields wrote:
>>>
>>>
>>>> My guess is you're running into text parsing problems in
>>>> Bio::SearchIO::blast. Upgrade to the latest developer version
>>>> (1.5.1) or
>>>> bioperl-live (CVS), then see the bug below.
>>>>
>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>> I think the first problem you ran into is solved in bioperl 1.5.1;
>>>> the last problem (more recent, not related to the first) has been
>>>> fixed but hasn't been committed to bioperl-live yet. The fixed
>>>> SearchIO::blast is available in the link above, but realize it
>>>> hasn't been committed yet and may change.
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>> Prielinger
>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>> To: bioperl-l at bioperl.org
>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>> output
>>>>>
>>>>> Hi,
>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>> Bio::SearchIO, I get the following error message:
>>>>>
>>>>> MSG: no data for midline Query 1 WWWKWRW 7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>> Is that a bug?
>>>>>
>>>>> If I want to parse Blast output (version 2.2.13), I don't get
>>>>> anything.
>>>>> I'm using bioperl 1.4.
>>>>>
>>>>> Before I installed bioperl 1.4, it worked fine parsing Blast
>>>>> output (version 2.2.12), but I don't remember which bioperl version
>>>>> I had installed.
>>>>>
>>>>> thanks in advance
>>>>>
>>>>> Hubert
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>