[Bioperl-l] Fwd: BLAST parsing broken

Chris Fields cjfields at illinois.edu
Mon May 10 00:39:33 UTC 2010


OK, that's fine.  The revision numbers are probably off because you are using svn against github (git has no incremental revision numbers, only SHA hashes).  I committed the patch to dev svn as r16970.

chris

On May 9, 2010, at 6:48 PM, Razi Khaja wrote:

> I checked out bioperl-live from github:
> svn checkout http://svn.github.com/bioperl/bioperl-live.git
> 
> I just checked it out again, a few seconds ago and by default I got revision
> 11326.
> Razi
> 
> 
> On Sun, May 9, 2010 at 5:30 PM, Chris Fields <cjfields at illinois.edu> wrote:
> 
>> Then something is wrong, as current trunk is at r16969.  Where are you
>> pulling your code from?  Our only working anon. server is the sync'ed github
>> one.
>> 
>> chris
>> 
>> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
>> 
>>> Hi Chris,
>>> The patch is against the main trunk.  I checked out version 11326 of the
>>> repository today.
>>> Razi
>>> 
>>> 
>>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>> 
>>>> If the patch is against main trunk it isn't a problem, otherwise the diff
>>>> should be vs. that code.
>>>> 
>>>> chris
>>>> 
>>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>>>> 
>>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
>>>>> Can someone advise an appropriate way to have this patch applied, given that
>>>>> it is an amendment to a previous patch?
>>>>> Thanks
>>>>> Razi
>>>>> 
>>>>> 
>>>>> ---------- Forwarded message ----------
>>>>> From: Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>
>>>>> Date: Wed, May 5, 2010 at 2:11 AM
>>>>> Subject: Re: [Bioperl-l] BLAST parsing broken
>>>>> To: Razi Khaja <razi.khaja at gmail.com>
>>>>> 
>>>>> 
>>>>> Hi Razi,
>>>>> 
>>>>> Thanks for trying to fix this.
>>>>> 
>>>>> I am attaching an example output file to this message. I just tested again
>>>>> that master from git repository fails to get query ID, but the previous
>>>>> version works.
>>>>> 
>>>>> bala ~/src/bioperl-live> git checkout master
>>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp output
>>>>> Switched to branch 'master'
>>>>> 
>>>>> When I started using the latest mpiBLAST code a few months ago I did compare
>>>>> the format 0 output from it to standard NCBI blast and they were identical.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Also, I've noticed a discrepancy within bioperl blast parsing that
>>>>> I have not had time to work on. Would you be interested in having a look?
>>>>> 
>>>>> I am creating output from mpiBLAST in 0 format and then converting it into
>>>>> tab-delimited 8 format. I am unable to get 100% similarity for all cases
>>>>> when I compare the conversion to the output straight from mpiBLAST in format
>>>>> 8. Sometimes the mismatch and gap values are off by one.
>>>>> 
>>>>> I am attaching a script that does the conversion. It is the same one I was
>>>>> using when I noticed the problem above. I was going to put the code into
>>>>> bioperl but that got delayed when I noticed the discrepancies.
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> 
>>>>> -Heikki
>>>>> 
>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>>> cell: +966 545 595 849  office: +966 2 808 2429
>>>>> 
>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>>>>> 4700 King Abdullah University of Science and Technology (KAUST)
>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>>>>> 
>>>>> 
>>>>> 
>>>>> On 4 May 2010 20:55, Razi Khaja <razi.khaja at gmail.com> wrote:
>>>>> 
>>>>>> That is odd.  Heikki, do you have a blast output file that produces this
>>>>>> error?
>>>>>> Could you attach the file and send it either to the list or to me (if the
>>>>>> list does not accept attachments)?
>>>>>> Thanks,
>>>>>> Razi
>>>>>> 
>>>>>> 
>>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields <cjfields at illinois.edu> wrote:
>>>>>> 
>>>>>>> Odd, I ran tests on that prior to commit.  I'll work on fixing that (in
>>>>>>> svn of course, until the migration is complete).
>>>>>>> 
>>>>>>> chris
>>>>>>> 
>>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:
>>>>>>> 
>>>>>>>> Chris,
>>>>>>>> 
>>>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of normal
>>>>>>>> blast output.  $result->query_name now returns undef.
>>>>>>>> 
>>>>>>>> (Using the anonymous git now). This change still works:
>>>>>>>> 
>>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
>>>>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
>>>>>>>> Date:   Sun Dec 20 04:39:58 2009 +0000
>>>>>>>> 
>>>>>>>> Robson's patch for buggy blastpgp output
>>>>>>>> 
>>>>>>>> But this does not:
>>>>>>>> 
>>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544
>>>>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
>>>>>>>> Date:   Thu Apr 15 04:21:17 2010 +0000
>>>>>>>> 
>>>>>>>> [bug 3031]
>>>>>>>> 
>>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja.
>>>>>>>> 
>>>>>>>> That makes it easy to find the diffs:
>>>>>>>> 
>>>>>>>> $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
>>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
>>>>>>>> index 378023a..6f7eeeb 100644
>>>>>>>> --- a/Bio/SearchIO/blast.pm
>>>>>>>> +++ b/Bio/SearchIO/blast.pm
>>>>>>>> @@ -209,6 +209,7 @@ BEGIN {
>>>>>>>> 
>>>>>>>>    'BlastOutput_program'             => 'RESULT-algorithm_name',
>>>>>>>>    'BlastOutput_version'             => 'RESULT-algorithm_version',
>>>>>>>> +        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
>>>>>>>>    'BlastOutput_query-def'           => 'RESULT-query_name',
>>>>>>>>    'BlastOutput_query-len'           => 'RESULT-query_length',
>>>>>>>>    'BlastOutput_query-acc'           => 'RESULT-query_accession',
>>>>>>>> @@ -504,6 +505,26 @@ sub next_result {
>>>>>>>>            }
>>>>>>>>        );
>>>>>>>>    }
>>>>>>>> +        # parse the BLAST algorithm reference
>>>>>>>> +        elsif(/^Reference:\s+(.*)$/) {
>>>>>>>> +            # want to preserve newlines for the BLAST algorithm reference
>>>>>>>> +            my $algorithm_reference = "$1\n";
>>>>>>>> +            $_ = $self->_readline;
>>>>>>>> +            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
>>>>>>>> +            # algorithm_reference, append it to what we parsed so far
>>>>>>>> +            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
>>>>>>>> +                $algorithm_reference .= "$_";
>>>>>>>> +                $_ = $self->_readline;
>>>>>>>> +            }
>>>>>>>> +            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
>>>>>>>> +            $self->_pushback($_);
>>>>>>>> +            $self->element(
>>>>>>>> +                {
>>>>>>>> +                    'Name' => 'BlastOutput_algorithm-reference',
>>>>>>>> +                    'Data' => $algorithm_reference
>>>>>>>> +                }
>>>>>>>> +            );
>>>>>>>> +        }
>>>>>>>>    # added Windows workaround for bug 1985
>>>>>>>>    elsif (/^(Searching|Results from round)/) {
>>>>>>>>        next unless $1 =~ /Results from round/;
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I am not sure why reference parsing messes things up. Maybe it eats too many
>>>>>>>> lines from the result file.
>>>>>>>> 
>>>>>>>> Yours,
>>>>>>>> 
>>>>>>>> -Heikki
>>>>>>>> 
>>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>>>>>> cell: +966 545 595 849  office: +966 2 808 2429
>>>>>>>> 
>>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST)
>>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>> 
>>>>>>> 
>>>>> <mpiblast.out> <blastparser028.pl> <blast.pm.diff>
>>>> 
>>>> 
>> 
>> 




