[Bioperl-l] BLAST parsing broken

Chris Fields cjfields at illinois.edu
Mon May 3 12:08:01 UTC 2010


Odd, I ran tests on that prior to commit.  I'll work on fixing that (in svn of course, until the migration is complete).

chris

On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:

> Chris,
> 
> The latest additions to Bio::SearchIO::blast (Bio/SearchIO/blast.pm) broke
> the parsing of normal BLAST output: $result->query_name now returns undef.
> 
> (I am using the anonymous git repository now.) This commit still works:
> 
> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> Date:   Sun Dec 20 04:39:58 2009 +0000
> 
>    Robson's patch for buggy blastpgp output
> 
> But this does not:
> 
> commit 9a89c3434597104dd50553e3562983d78d14a544
> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> Date:   Thu Apr 15 04:21:17 2010 +0000
> 
>    [bug 3031]
> 
>    patches for catching algorithm ref, courtesy Razi Khaja.
> 
> That makes it easy to find the diffs:
> 
> $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 \
>     9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
> index 378023a..6f7eeeb 100644
> --- a/Bio/SearchIO/blast.pm
> +++ b/Bio/SearchIO/blast.pm
> @@ -209,6 +209,7 @@ BEGIN {
> 
>         'BlastOutput_program'             => 'RESULT-algorithm_name',
>         'BlastOutput_version'             => 'RESULT-algorithm_version',
> +        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
>         'BlastOutput_query-def'           => 'RESULT-query_name',
>         'BlastOutput_query-len'           => 'RESULT-query_length',
>         'BlastOutput_query-acc'           => 'RESULT-query_accession',
> @@ -504,6 +505,26 @@ sub next_result {
>                 }
>             );
>         }
> +        # parse the BLAST algorithm reference
> +        elsif(/^Reference:\s+(.*)$/) {
> +            # want to preserve newlines for the BLAST algorithm reference
> +            my $algorithm_reference = "$1\n";
> +            $_ = $self->_readline;
> +            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
> +            # algorithm_reference, append it to what we parsed so far
> +            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
> +                $algorithm_reference .= "$_";
> +                $_ = $self->_readline;
> +            }
> +            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
> +            $self->_pushback($_);
> +            $self->element(
> +                {
> +                    'Name' => 'BlastOutput_algorithm-reference',
> +                    'Data' => $algorithm_reference
> +                }
> +            );
> +        }
>         # added Windows workaround for bug 1985
>         elsif (/^(Searching|Results from round)/) {
>             next unless $1 =~ /Results from round/;
> 
> 
> I am not sure why reference parsing messes things up. Maybe it eats too many
> lines from the result file.
> 
> Yours,
> 
>    -Heikki
> 
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +966 545 595 849  office: +966 2 808 2429
> 
> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> 4700 King Abdullah University of Science and Technology (KAUST)
> Thuwal 23955-6900, Kingdom of Saudi Arabia
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
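
Heikki's guess that the reference parsing "eats too many lines" can be explored outside BioPerl with a minimal sketch. The LineReader class below is a hypothetical stand-in for the parser's line buffer (only the method names _readline and _pushback are taken from the diff), the two inputs are made up, and the loop adds a defined() guard that the patch does not have. It shows only when the read-until-terminator loop would swallow a "Query=" line. It does not claim to identify the actual cause of the breakage.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's line buffer; not part of BioPerl.
package LineReader;

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines] }, $class;
}

# Return the next line, or undef at end of input.
sub _readline {
    my ($self) = @_;
    return shift @{ $self->{lines} };
}

# Put a line back so that the next _readline call returns it again.
sub _pushback {
    my ($self, $line) = @_;
    unshift @{ $self->{lines} }, $line if defined $line;
}

package main;

# Same terminator logic as the patch: collect lines until an empty line,
# "RID:" or "Database:" is seen, then push that line back.
# The defined() check is an addition for this sketch.
sub read_reference {
    my ($reader, $first) = @_;
    my $reference = "$first\n";
    my $line      = $reader->_readline;
    while (defined $line
        && $line !~ /^$/
        && $line !~ /^RID:/
        && $line !~ /^Database:/)
    {
        $reference .= $line;
        $line = $reader->_readline;
    }
    $reader->_pushback($line);
    return $reference;
}

# Parse the reference block, then return the first non-empty line after it.
sub next_line_after_reference {
    my @lines  = @_;
    my $reader = LineReader->new(@lines);
    my $first  = $reader->_readline;
    my ($head) = $first =~ /^Reference:\s+(.*)$/;
    read_reference($reader, $head);
    my $line = $reader->_readline;
    $line = $reader->_readline while defined $line && $line =~ /^$/;
    return $line;
}

# Made-up input 1: an empty line separates the reference from "Query=".
my @with_blank = (
    "Reference: Altschul et al. (1997),\n",
    "Nucleic Acids Res. 25:3389-3402.\n",
    "\n",
    "Query= test_query\n",
    "Database: test_db\n",
);

# Made-up input 2: no empty line before "Query=".
my @without_blank = (
    "Reference: Altschul et al. (1997),\n",
    "Nucleic Acids Res. 25:3389-3402.\n",
    "Query= test_query\n",
    "\n",
    "Database: test_db\n",
);

print "with blank line:    ", next_line_after_reference(@with_blank);
print "without blank line: ", next_line_after_reference(@without_blank);
```

With these inputs the first call returns the "Query=" line, while the second returns the "Database:" line because "Query=" was appended to the reference text. Whether a real report triggers this path would need checking against an actual BLAST output file.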




