[Bioperl-l] Fwd: BLAST parsing broken

Razi Khaja razi.khaja at gmail.com
Sun May 9 19:23:47 UTC 2010


Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
Can someone advise an appropriate way to have this patch applied, given that
it is an amendment to a previous patch?
Thanks
Razi


---------- Forwarded message ----------
From: Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>
Date: Wed, May 5, 2010 at 2:11 AM
Subject: Re: [Bioperl-l] BLAST parsing broken
To: Razi Khaja <razi.khaja at gmail.com>


Hi Razi,

Thanks for trying to fix this.

I am attaching an example output file to this message. I just confirmed again
that master from the git repository fails to get the query ID, while the
previous version works.

  bala ~/src/bioperl-live> git checkout master
  Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp output
  Switched to branch 'master'

When I started using the latest mpiBLAST code a few months ago, I compared
its format 0 output to that of standard NCBI BLAST, and they were identical.
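
The reference-parsing loop in the diff quoted further down stops only at an
empty line, a line starting with RID:, or a line starting with Database:.
The following is a minimal sketch of that stop condition, written in Python
rather than the module's Perl. The function name and the sample lines are
hypothetical and are not taken from blast.pm or from the attached output
file. It only illustrates how a Query= line would be swallowed into the
reference text if none of the three stop markers came before it. Whether that
is what actually happens with the attached file is an assumption, not
something confirmed in this thread.

```python
import re

# Pattern for the first line of the block, as in the quoted diff.
FIRST = re.compile(r"^Reference:\s+(.*)$")
# Stop markers used by the quoted loop: empty line, "RID:", or "Database:".
STOP = re.compile(r"^(?:$|RID:|Database:)")


def collect_reference(lines, start):
    """Mimic the quoted loop on a list of lines without trailing newlines.

    lines[start] must be the 'Reference:' line. Returns a tuple of
    (reference_text, index_of_stop_line). The stop line is left unconsumed,
    which plays the role of the pushback call in the diff.
    """
    match = FIRST.match(lines[start])
    if match is None:
        raise ValueError("lines[start] is not a 'Reference:' line")
    collected = [match.group(1)]
    i = start + 1
    while i < len(lines) and not STOP.match(lines[i]):
        collected.append(lines[i])
        i += 1
    return "\n".join(collected) + "\n", i


# Hypothetical input with a blank line after the reference block.
with_blank = ["Reference: first line", "second line", "", "Query= seq1"]
# Hypothetical input where Query= follows the reference block directly.
without_blank = ["Reference: first line", "Query= seq1", "", "Database: nt"]
```

With with_blank the loop stops at the empty line and Query= seq1 is still
available to the rest of the parser. With without_blank the Query= line ends
up inside the reference text, so a later handler would never see it.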




Also, I've noticed a discrepancy within BioPerl's BLAST parsing that
I have not had time to work on. Would you be interested in having a look?

I am creating output from mpiBLAST in format 0 and then converting it into
tab-delimited format 8. When I compare the conversion to the format 8 output
straight from mpiBLAST, the two are not identical in all cases: sometimes the
mismatch and gap values are off by one.

I am attaching a script that does the conversion. It is the same one I was
using when I noticed the problem above. I was going to put the code into
bioperl but that got delayed when I noticed the discrepancies.
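
The comparison described above can be sketched roughly as follows. This is
not the attached script. It is a minimal Python sketch with hypothetical
names, and it assumes both inputs use the standard 12-column tabular layout
of format 8 (query id, subject id, % identity, alignment length, mismatches,
gap openings, query start and end, subject start and end, e-value, bit
score). It also assumes that a hit can be identified by its ids and
coordinates, which may not hold if the conversion changes the coordinates
as well.

```python
# Column labels are illustrative names for the 12 tabular fields.
COLUMNS = [
    "query", "subject", "pct_identity", "aln_length",
    "mismatches", "gap_openings",
    "q_start", "q_end", "s_start", "s_end",
    "evalue", "bit_score",
]
# Assumption: ids plus coordinates identify the same HSP in both files.
KEY = ("query", "subject", "q_start", "q_end", "s_start", "s_end")


def parse_tabular(lines):
    """Index tab-delimited rows by KEY; skip blank and '#' comment lines."""
    rows = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError("expected %d columns: %r" % (len(COLUMNS), line))
        row = dict(zip(COLUMNS, fields))
        rows[tuple(row[k] for k in KEY)] = row
    return rows


def compare(converted, direct, columns=("mismatches", "gap_openings")):
    """List (key, column, converted_value, direct_value) for each difference."""
    a = parse_tabular(converted)
    b = parse_tabular(direct)
    diffs = []
    for key in sorted(a.keys() & b.keys()):
        for col in columns:
            if a[key][col] != b[key][col]:
                diffs.append((key, col, a[key][col], b[key][col]))
    return diffs
```

Passing the two files as open file handles, for example
compare(open(converted_path), open(direct_path)), would list every hit whose
mismatch or gap-opening count differs, which narrows down the off-by-one
cases without reading the files by eye. Hits present in only one file are
ignored by this sketch.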


Cheers,


    -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia



On 4 May 2010 20:55, Razi Khaja <razi.khaja at gmail.com> wrote:

> That is odd.  Heikki, do you have a blast output file that produces this
> error?
> Could you attach the file and send it either to the list or to me (if the
> list does not accept attachments)?
> Thanks,
> Razi
>
>
> On Mon, May 3, 2010 at 8:08 AM, Chris Fields <cjfields at illinois.edu> wrote:
>
> > Odd, I ran tests on that prior to commit.  I'll work on fixing that (in svn
> > of course, until the migration is complete).
> >
> > chris
> >
> > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:
> >
> > > Chris,
> > >
> > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal
> > > blast output.  $result->query_name now returns undef.
> > >
> > > (Using the anonymous git now). This change still works:
> > >
> > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
> > > Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> > > Date:   Sun Dec 20 04:39:58 2009 +0000
> > >
> > >    Robson's patch for buggy blastpgp output
> > >
> > > But this does not:
> > >
> > > commit 9a89c3434597104dd50553e3562983d78d14a544
> > > Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> > > Date:   Thu Apr 15 04:21:17 2010 +0000
> > >
> > >    [bug 3031]
> > >
> > >    patches for catching algorithm ref, courtesy Razi Khaja.
> > >
> > > That makes it easy to find the diffs:
> > >
> > > $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
> > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
> > > index 378023a..6f7eeeb 100644
> > > --- a/Bio/SearchIO/blast.pm
> > > +++ b/Bio/SearchIO/blast.pm
> > > @@ -209,6 +209,7 @@ BEGIN {
> > >
> > >         'BlastOutput_program'             => 'RESULT-algorithm_name',
> > >         'BlastOutput_version'             => 'RESULT-algorithm_version',
> > > +        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
> > >         'BlastOutput_query-def'           => 'RESULT-query_name',
> > >         'BlastOutput_query-len'           => 'RESULT-query_length',
> > >         'BlastOutput_query-acc'           => 'RESULT-query_accession',
> > > @@ -504,6 +505,26 @@ sub next_result {
> > >                 }
> > >             );
> > >         }
> > > +        # parse the BLAST algorithm reference
> > > +        elsif(/^Reference:\s+(.*)$/) {
> > > +            # want to preserve newlines for the BLAST algorithm reference
> > > +            my $algorithm_reference = "$1\n";
> > > +            $_ = $self->_readline;
> > > +            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
> > > +            # algorithm_reference, append it to what we parsed so far
> > > +            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
> > > +                $algorithm_reference .= "$_";
> > > +                $_ = $self->_readline;
> > > +            }
> > > +            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
> > > +            $self->_pushback($_);
> > > +            $self->element(
> > > +                {
> > > +                    'Name' => 'BlastOutput_algorithm-reference',
> > > +                    'Data' => $algorithm_reference
> > > +                }
> > > +            );
> > > +        }
> > >         # added Windows workaround for bug 1985
> > >         elsif (/^(Searching|Results from round)/) {
> > >             next unless $1 =~ /Results from round/;
> > >
> > >
> > > I am not sure why reference parsing messes things up. Maybe it eats too many
> > > lines from the result file.
> > >
> > > Yours,
> > >
> > >    -Heikki
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpiblast.out
Type: application/octet-stream
Size: 34844 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100509/56eaa3a1/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastparser028.pl
Type: application/x-perl
Size: 2024 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100509/56eaa3a1/attachment.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast.pm.diff
Type: text/x-patch
Size: 994 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100509/56eaa3a1/attachment-0008.bin>

