[Bioperl-l] Fwd: BLAST parsing broken

Chris Fields cjfields at illinois.edu
Sun May 9 21:30:52 UTC 2010


Then something is wrong, as current trunk is at r16969.  Where are you pulling your code from?  Our only working anon. server is the sync'ed github one.

chris

On May 9, 2010, at 4:15 PM, Razi Khaja wrote:

> Hi Chris,
> The patch is against the main trunk.  I checked out version 11326 of the
> repository today.
> Razi
> 
> 
> On Sun, May 9, 2010 at 4:43 PM, Chris Fields <cjfields at illinois.edu> wrote:
> 
>> If the patch is against main trunk it isn't a problem, otherwise the diff
>> should be vs. that code.
>> 
>> chris
>> 
>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>> 
>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
>>> Can someone advise an appropriate way to have this patch applied, given
>>> that it is an amendment to a previous patch?
>>> Thanks
>>> Razi
>>> 
>>> 
>>> ---------- Forwarded message ----------
>>> From: Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>
>>> Date: Wed, May 5, 2010 at 2:11 AM
>>> Subject: Re: [Bioperl-l] BLAST parsing broken
>>> To: Razi Khaja <razi.khaja at gmail.com>
>>> 
>>> 
>>> Hi Razi,
>>> 
>>> Thanks for trying to fix this.
>>> 
>>> I am attaching an example output file to this message. I just tested
>>> again that master from the git repository fails to get the query ID, but
>>> the previous version works.
>>> 
>>> bala ~/src/bioperl-live> git checkout master
>>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp output
>>> Switched to branch 'master'
>>> 
>>> When I started using the latest mpiBLAST code a few months ago I did
>>> compare the format 0 output from it to standard NCBI blast and they were
>>> identical.
>>> 
>>> 
>>> 
>>> 
>>> Also, I've noticed a discrepancy within bioperl blast parsing that
>>> I have not had time to work on. Would you be interested in having a look?
>>> 
>>> I am creating output from mpiBLAST in format 0 and then converting it
>>> into tab-delimited format 8. I am unable to get 100% similarity for all
>>> cases when I compare the conversion to the output straight from mpiBLAST
>>> in format 8. Sometimes the mismatch and gap values are off by one.
>>> 
>>> I am attaching a script that does the conversion. It is the same one I
>>> was using when I noticed the problem above. I was going to put the code
>>> into bioperl but that got delayed when I noticed the discrepancies.
>>> 
>>> 
>>> Cheers,
>>> 
>>> 
>>>  -Heikki
>>> 
>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>> cell: +966 545 595 849  office: +966 2 808 2429
>>> 
>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>>> 4700 King Abdullah University of Science and Technology (KAUST)
>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>>> 
>>> 
>>> 
>>> On 4 May 2010 20:55, Razi Khaja <razi.khaja at gmail.com> wrote:
>>> 
>>>> That is odd.  Heikki, do you have a blast output file that produces this
>>>> error?
>>>> Could you attach the file and either send to the list or myself (if the
>>>> list
>>>> does not accept attachments).
>>>> Thanks,
>>>> Razi
>>>> 
>>>> 
>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields <cjfields at illinois.edu>
>>>> wrote:
>>>> 
>>>>> Odd, I ran tests on that prior to commit.  I'll work on fixing that (in
>>>>> svn of course, until the migration is complete).
>>>>> 
>>>>> chris
>>>>> 
>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:
>>>>> 
>>>>>> Chris,
>>>>>> 
>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of
>>>>>> normal blast output.  $result->query_name now returns undef.
>>>>>> 
>>>>>> (Using the anonymous git now). This change still works:
>>>>>> 
>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
>>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
>>>>>> Date:   Sun Dec 20 04:39:58 2009 +0000
>>>>>> 
>>>>>> Robson's patch for buggy blastpgp output
>>>>>> 
>>>>>> But this does not:
>>>>>> 
>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544
>>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
>>>>>> Date:   Thu Apr 15 04:21:17 2010 +0000
>>>>>> 
>>>>>> [bug 3031]
>>>>>> 
>>>>>> patches for catching algorithm ref, courtesy Razi Khaja.
>>>>>> 
>>>>>> That makes it easy to find the diffs:
>>>>>> 
>>>>>> $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
>>>>>> index 378023a..6f7eeeb 100644
>>>>>> --- a/Bio/SearchIO/blast.pm
>>>>>> +++ b/Bio/SearchIO/blast.pm
>>>>>> @@ -209,6 +209,7 @@ BEGIN {
>>>>>> 
>>>>>>      'BlastOutput_program'             => 'RESULT-algorithm_name',
>>>>>>      'BlastOutput_version'             => 'RESULT-algorithm_version',
>>>>>> +        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
>>>>>>      'BlastOutput_query-def'           => 'RESULT-query_name',
>>>>>>      'BlastOutput_query-len'           => 'RESULT-query_length',
>>>>>>      'BlastOutput_query-acc'           => 'RESULT-query_accession',
>>>>>> @@ -504,6 +505,26 @@ sub next_result {
>>>>>>              }
>>>>>>          );
>>>>>>      }
>>>>>> +        # parse the BLAST algorithm reference
>>>>>> +        elsif(/^Reference:\s+(.*)$/) {
>>>>>> +            # want to preserve newlines for the BLAST algorithm reference
>>>>>> +            my $algorithm_reference = "$1\n";
>>>>>> +            $_ = $self->_readline;
>>>>>> +            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
>>>>>> +            # algorithm_reference, append it to what we parsed so far
>>>>>> +            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
>>>>>> +                $algorithm_reference .= "$_";
>>>>>> +                $_ = $self->_readline;
>>>>>> +            }
>>>>>> +            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
>>>>>> +            $self->_pushback($_);
>>>>>> +            $self->element(
>>>>>> +                {
>>>>>> +                    'Name' => 'BlastOutput_algorithm-reference',
>>>>>> +                    'Data' => $algorithm_reference
>>>>>> +                }
>>>>>> +            );
>>>>>> +        }
>>>>>>      # added Windows workaround for bug 1985
>>>>>>      elsif (/^(Searching|Results from round)/) {
>>>>>>          next unless $1 =~ /Results from round/;
>>>>>> 
>>>>>> 
>>>>>> I am not sure why reference parsing messes things up. Maybe it eats
>>>>>> too many lines from the result file.
>>>>>> 
>>>>>> Yours,
>>>>>> 
>>>>>> -Heikki
>>>>>> 
>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>>>> cell: +966 545 595 849  office: +966 2 808 2429
>>>>>> 
>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>>>>>> 4700 King Abdullah University of Science and Technology (KAUST)
>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> <mpiblast.out><blastparser028.pl
>>> <blast.pm.diff>
>> 
>> 
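Heikki's guess in the quoted thread, that the reference parsing "eats too many lines", can be explored outside BioPerl with a small standalone sketch. It reuses the three stop conditions from the quoted diff (an empty line, RID:, or Database:) but reads from a plain array instead of the parser's _readline/_pushback. The helper name consume_reference and both sample inputs are hypothetical. In particular, the second input, where no blank line separates the reference from the Query= line, is only an assumed layout and is not taken from the attached mpiBLAST file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the reference-parsing branch; not BioPerl code.
# Takes the remaining report lines (each ending in "\n") and returns the
# collected reference text followed by the lines left unconsumed.
sub consume_reference {
    my @lines = @_;
    my $first = shift @lines;
    return unless defined $first && $first =~ /^Reference:\s+(.*)$/;
    my $reference = "$1\n";
    # Same stop conditions as the quoted patch: empty line, RID:, Database:
    while (@lines
        && $lines[0] !~ /^$/
        && $lines[0] !~ /^RID:/
        && $lines[0] !~ /^Database:/)
    {
        $reference .= shift @lines;
    }
    return ($reference, @lines);
}

# Made-up inputs: one with a blank line after the reference, one without.
my @with_blank = ("Reference: citation line 1\n", "citation line 2\n",
                  "\n", "Query= seq1\n");
my @no_blank   = ("Reference: citation line 1\n", "citation line 2\n",
                  "Query= seq1\n");

for my $case (\@with_blank, \@no_blank) {
    my ($reference, @rest) = consume_reference(@$case);
    my $kept = grep { /^Query=/ } @rest;
    print $kept
        ? "Query= line left for the parser\n"
        : "Query= line swallowed by the reference\n";
}
```

Under these assumptions the Query= line survives only when a blank line, RID:, or Database: line comes before it. If the real report has such a layout, that would be consistent with $result->query_name coming back undef, but the sketch does not establish that this is what happens with the attached file.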