[Bioperl-l] BLAST Parsing

Jason Stajich jason@cgt.mc.duke.edu
Sat, 21 Sep 2002 09:21:53 -0400 (EDT)


Paul - some changes have made their way in since 1.02, so it would be better if
you could just send me the report that causes the problem.  The error
messages are a symptom that shouldn't be ignored or disabled: they mean the
parser is hitting something in a BLAST report that it wasn't expecting, and
the alignment line parsing is out of sync.
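To illustrate what "out of sync" means, here is a minimal sketch that assumes a simplified, line-by-line view of the parser. The hypothetical helper `classify_line` applies the same Query/Sbjct regex as the blast.pm snippet quoted in Paul's message. Any line that fails that regex is treated as the alignment midline, and that is where `Lambda     K      H` lands. `classify_line_guarded` adds an assumed end-of-alignment check. Both subs are illustrations, not BioPerl API and not the actual fix.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): classify one line the way the
# quoted blast.pm snippet does. Lines matching the Query/Sbjct regex are
# sequence lines; every other line falls through to the midline branch.
sub classify_line {
    my ($line) = @_;
    if ( $line =~ /^((Query|Sbjct):\s+(\d+)\s*)(\S+)\s+(\d+)/ ) {
        return $2;    # 'Query' or 'Sbjct'
    }
    return 'midline';
}

# Hypothetical guarded variant: assumes the "Lambda K H" statistics header
# marks the end of the alignment section, so it is never taken as a midline.
sub classify_line_guarded {
    my ($line) = @_;
    return 'end_of_alignment' if $line =~ /^\s*Lambda\s+K\s+H\b/;
    return classify_line($line);
}

# Lines taken from the report excerpt in this thread.
for my $line ( 'Sbjct: 564 cctggg 569', 'Lambda     K      H' ) {
    printf "%-24s plain: %-8s guarded: %s\n",
        $line, classify_line($line), classify_line_guarded($line);
}
```

The unguarded version classifies the statistics header as a midline, which matches the "no data for midline Lambda     K      H" exception. The real parser keeps more state than this sketch, so sending the full report is still the way to find the actual cause.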

-jason
On Fri, 20 Sep 2002, Paul Boutros wrote:

> Hi all,
>
> Another potential bug in BLAST parsing (SearchIO\blast.pm).
>
> My setup:
> BioPerl 1.02
> Perl 5.6.1 (ActiveState)
> WinXP SP1
>
> The parser doesn't seem to be recognizing one of the lines in my blast
> output file.  The error is:
>
> ------------- EXCEPTION  -------------
> MSG: no data for midline Lambda     K      H
> STACK Bio::SearchIO::blast::next_result
> C:/Perl/site/lib/Bio/SearchIO/blast.pm:567
> STACK toplevel parseb~1.pl:55
>
> --------------------------------------
>
> The offending part of the blast output file looks like this:
> =========================
> Sbjct: 564 cctggg 569
>
>
>
> Lambda     K      H
>     1.37    0.711     1.31
>
> Gapped
> Lambda     K      H
>     1.37    0.711     1.31
>
>
> Matrix: blastn matrix:1 -3
> ==========================
>
> BLAST parameters were:
> -p blastn
> -d est_others
> -e 0.001
> -v 10
> -b 10
> -l Rn_GI
>
> Minimal code is:
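> use strict;
> use warnings;
> use Bio::SearchIO;
>
> my $infile = $ARGV[0];
>
> my $searchio = Bio::SearchIO->new(
>     -format => 'blast',
>     -file   => $infile,
> );
>
> # Read every report; the exception above is thrown inside next_result()
> while (my $result = $searchio->next_result()) { }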
>
> The offending part of the blast.pm file looks like this:
> if( /^((Query|Sbjct):\s+(\d+)\s*)(\S+)\s+(\d+)/ ) {
>      $data{$2} = $4;
>      $len = length($1);
>      $self->{"\_$2"}->{'begin'} = $3 unless $self->{"_$2"}->{'be
>      $self->{"\_$2"}->{'end'} = $5;
> } else {
>      $self->throw("no data for midline $_")
>        unless (defined $_ && defined $len);
>      $data{'Mid'} = substr($_,$len);
> }
>
> removing the $self->throw and replacing the unless with:
> if (defined $_ && defined $len) {
>     $data{'Mid'} = substr($_,$len);
> }
>
> seems to parse correctly, but at the cost of an awful lot of warnings.
>
> I can pre-parse out the
> Lambda     K      H
> lines, but I'm not sure which one should be removed, or whether I will also
> need to remove the blank lines.
>
> Any ideas/comments/criticism welcome.
> Paul
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu