[Bioperl-l] Fwd: BLAST parsing broken

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Tue May 11 05:43:42 UTC 2010


Thanks Razi and Chris,

Blast parsing works again beautifully.

    -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia



On 10 May 2010 03:39, Chris Fields <cjfields at illinois.edu> wrote:

> Ok, that's fine.  It may be something off with revision numbers when using
> svn with github (git doesn't have incremental revisions, but a SHA).
>  Committed the patch to dev svn, in r16970.
>
> chris
>
> On May 9, 2010, at 6:48 PM, Razi Khaja wrote:
>
> > I checked out bioperl-live from github:
> > svn checkout http://svn.github.com/bioperl/bioperl-live.git
> >
> > I just checked it out again a few seconds ago, and by default I got
> > revision 11326.
> > Razi
> >
> >
> > On Sun, May 9, 2010 at 5:30 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >
> >> Then something is wrong, as current trunk is at r16969.  Where are you
> >> pulling your code from?  Our only working anon. server is the sync'ed
> >> github one.
> >>
> >> chris
> >>
> >> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
> >>
> >>> Hi Chris,
> >>> The patch is against the main trunk.  I checked out version 11326 of the
> >>> repository today.
> >>> Razi
> >>>
> >>>
> >>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >>>
> >>>> If the patch is against main trunk it isn't a problem, otherwise the diff
> >>>> should be vs. that code.
> >>>>
> >>>> chris
> >>>>
> >>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
> >>>>
> >>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> >>>>> Can someone advise an appropriate way to have this patch applied, given
> >>>>> that it is an amendment to a previous patch?
> >>>>> Thanks
> >>>>> Razi
> >>>>>
> >>>>>
> >>>>> ---------- Forwarded message ----------
> >>>>> From: Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>
> >>>>> Date: Wed, May 5, 2010 at 2:11 AM
> >>>>> Subject: Re: [Bioperl-l] BLAST parsing broken
> >>>>> To: Razi Khaja <razi.khaja at gmail.com>
> >>>>>
> >>>>>
> >>>>> Hi Razi,
> >>>>>
> >>>>> Thanks for trying to fix this.
> >>>>>
> >>>>> I am attaching an example output file to this message. I just tested again
> >>>>> that master from the git repository fails to get the query ID, but the
> >>>>> previous version works.
> >>>>>
> >>>>> bala ~/src/bioperl-live> git checkout master
> >>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp output
> >>>>> Switched to branch 'master'
> >>>>>
> >>>>> When I started using the latest mpiBLAST code a few months ago I did compare
> >>>>> the format 0 output from it to standard NCBI blast and they were identical.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Also, I've noticed a discrepancy within bioperl blast parsing that
> >>>>> I have not had time to work on. Would you be interested in having a look?
> >>>>>
> >>>>> I am creating output from mpiBLAST in 0 format and then converting it into
> >>>>> tab-delimited 8 format. I am unable to get 100% similarity for all cases
> >>>>> when I compare the conversion to the output straight from mpiBLAST in format
> >>>>> 8. Sometimes the mismatch and gap values are off by one.
> >>>>>
> >>>>> I am attaching a script that does the conversion. It is the same one I was
> >>>>> using when I noticed the problem above. I was going to put the code into
> >>>>> bioperl but that got delayed when I noticed the discrepancies.
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>>
> >>>>> -Heikki
> >>>>>
> >>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> >>>>> cell: +966 545 595 849  office: +966 2 808 2429
> >>>>>
> >>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> >>>>> 4700 King Abdullah University of Science and Technology (KAUST)
> >>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 4 May 2010 20:55, Razi Khaja <razi.khaja at gmail.com> wrote:
> >>>>>
> >>>>>> That is odd.  Heikki, do you have a blast output file that produces this
> >>>>>> error?
> >>>>>> Could you attach the file and send it either to the list or to me (if the
> >>>>>> list does not accept attachments)?
> >>>>>> Thanks,
> >>>>>> Razi
> >>>>>>
> >>>>>>
> >>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields <cjfields at illinois.edu> wrote:
> >>>>>>
> >>>>>>> Odd, I ran tests on that prior to commit.  I'll work on fixing that (in
> >>>>>>> svn of course, until the migration is complete).
> >>>>>>>
> >>>>>>> chris
> >>>>>>>
> >>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:
> >>>>>>>
> >>>>>>>> Chris,
> >>>>>>>>
> >>>>>>>> The latest additions to Bio::SearchIO::blast.pm broke the parsing of
> >>>>>>>> normal blast output.  $result->query_name now returns undef.
> >>>>>>>>
> >>>>>>>> (Using the anonymous git now). This change still works:
> >>>>>>>>
> >>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
> >>>>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> >>>>>>>> Date:   Sun Dec 20 04:39:58 2009 +0000
> >>>>>>>>
> >>>>>>>> Robson's patch for buggy blastpgp output
> >>>>>>>>
> >>>>>>>> But this does not:
> >>>>>>>>
> >>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544
> >>>>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> >>>>>>>> Date:   Thu Apr 15 04:21:17 2010 +0000
> >>>>>>>>
> >>>>>>>> [bug 3031]
> >>>>>>>>
> >>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja.
> >>>>>>>>
> >>>>>>>> That makes it easy to find the diffs:
> >>>>>>>>
> >>>>>>>> $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
> >>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
> >>>>>>>> index 378023a..6f7eeeb 100644
> >>>>>>>> --- a/Bio/SearchIO/blast.pm
> >>>>>>>> +++ b/Bio/SearchIO/blast.pm
> >>>>>>>> @@ -209,6 +209,7 @@ BEGIN {
> >>>>>>>>
> >>>>>>>>    'BlastOutput_program'             => 'RESULT-algorithm_name',
> >>>>>>>>    'BlastOutput_version'             => 'RESULT-algorithm_version',
> >>>>>>>> +        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
> >>>>>>>>    'BlastOutput_query-def'           => 'RESULT-query_name',
> >>>>>>>>    'BlastOutput_query-len'           => 'RESULT-query_length',
> >>>>>>>>    'BlastOutput_query-acc'           => 'RESULT-query_accession',
> >>>>>>>> @@ -504,6 +505,26 @@ sub next_result {
> >>>>>>>>            }
> >>>>>>>>        );
> >>>>>>>>    }
> >>>>>>>> +        # parse the BLAST algorithm reference
> >>>>>>>> +        elsif(/^Reference:\s+(.*)$/) {
> >>>>>>>> +            # want to preserve newlines for the BLAST algorithm reference
> >>>>>>>> +            my $algorithm_reference = "$1\n";
> >>>>>>>> +            $_ = $self->_readline;
> >>>>>>>> +            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
> >>>>>>>> +            # algorithm_reference, append it to what we parsed so far
> >>>>>>>> +            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
> >>>>>>>> +                $algorithm_reference .= "$_";
> >>>>>>>> +                $_ = $self->_readline;
> >>>>>>>> +            }
> >>>>>>>> +            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
> >>>>>>>> +            $self->_pushback($_);
> >>>>>>>> +            $self->element(
> >>>>>>>> +                {
> >>>>>>>> +                    'Name' => 'BlastOutput_algorithm-reference',
> >>>>>>>> +                    'Data' => $algorithm_reference
> >>>>>>>> +                }
> >>>>>>>> +            );
> >>>>>>>> +        }
> >>>>>>>>    # added Windows workaround for bug 1985
> >>>>>>>>    elsif (/^(Searching|Results from round)/) {
> >>>>>>>>        next unless $1 =~ /Results from round/;
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I am not sure why reference parsing messes things up. Maybe it eats too
> >>>>>>>> many lines from the result file.
> >>>>>>>>
> >>>>>>>> Yours,
> >>>>>>>>
> >>>>>>>> -Heikki
> >>>>>>>>
> >>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> >>>>>>>> cell: +966 545 595 849  office: +966 2 808 2429
> >>>>>>>>
> >>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> >>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST)
> >>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioperl-l mailing list
> >>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>>
> >>>>> <mpiblast.out> <blastparser028.pl> <blast.pm.diff>
> >>>>
> >>>>
> >>
> >>
>
>
>
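
For anyone following up on the guess that the reference parsing eats too many
lines: the loop in the quoted diff starts at a line matching "Reference:" and
keeps reading until it sees an empty line, an "RID:" line, or a "Database:"
line. A rough way to see which lines of a given report that rule would consume
is the awk sketch below. It only mimics the stop conditions from the diff and
is not the BioPerl parser; the function name show_reference_block is made up
for illustration.

```shell
#!/bin/sh
# show_reference_block FILE
# Hypothetical helper (not part of BioPerl): print the lines that a
# "read from Reference: until blank line, RID:, or Database:" rule
# would consume from a BLAST report.
show_reference_block() {
    awk '
        # Inside the block: stop at the first terminator line.
        in_ref && (/^$/ || /^RID:/ || /^Database:/) { exit }
        # Inside the block: any other line is treated as part of the reference.
        in_ref { print; next }
        # The block starts at the first line beginning with "Reference:".
        /^Reference:[ \t]+/ { in_ref = 1; print }
    ' "$1"
}
```

Running it as `show_reference_block mpiblast.out` on a report that shows the
problem lists what the rule would swallow. If the output runs past the
citation text, the stop conditions are not matching for that report.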

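
On the separate format 0 to format 8 discrepancy mentioned in the quoted
thread: in the tab-delimited format 8 the fifth and sixth columns hold the
mismatch and gap-opening counts, so one way to list the off-by-one rows is to
compare just those two columns between the converted file and the native one.
The sketch below assumes both files contain the same hits in the same order
with 12 tab-delimited columns each; the function name diff_m8_counts and the
file names in the usage line are hypothetical.

```shell
#!/bin/sh
# diff_m8_counts CONVERTED NATIVE
# Hypothetical helper: report rows where the mismatch (column 5) or gap
# opening (column 6) counts differ between two tab-delimited files.
# Assumes both files list the same hits in the same order.
diff_m8_counts() {
    paste "$1" "$2" | awk -F '\t' '
        # After paste, columns 1-12 come from the first file and
        # columns 13-24 from the second.
        $5 != $17 || $6 != $18 {
            printf "line %d: %s %s: mismatches %s/%s, gaps %s/%s\n",
                NR, $1, $2, $5, $17, $6, $18
        }
    '
}
```

For example, `diff_m8_counts converted.m8 native.m8` prints one line per hit
whose counts disagree, with the converted value first and the native value
second.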

