[Bioperl-l] Parsing a netblast file

Wes Barris wes.barris at csiro.au
Thu Jul 31 01:27:43 EDT 2003


Hi,

This sample BLAST parser works on stand-alone BLAST result files but
fails on netblast result files.

#!/usr/local/bin/perl -w
use strict;
use Bio::SearchIO;

# Expect exactly one argument: the BLAST report to parse.
if (@ARGV != 1) {
    print STDERR "Usage: parseblast.pl <blastfile>\n";
    exit 1;
}

my $in = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);

# Report every HSP longer than 13 positions with at least 45% identity.
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            next unless $hsp->length('total') > 13;
            next unless $hsp->percent_identity >= 45;
            print "Hit= ",        $hit->name,
                  ",Length=",     $hsp->length('total'),
                  ",Expect=",     $hsp->evalue,
                  ",Percent_id=", $hsp->percent_identity, "\n";
        }
    }
}

Through trial and error I have narrowed the problem down to the minus
sign in the database summary.  Here is the section in question from a netblast
result file:

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
or phase 0, 1 or 2 HTGS sequences)
            1,819,241 sequences; -24,217,474 total letters

I don't know why, but all netblast result files I have looked at show a
negative value for the total number of letters.  If I remove the '-' sign,
the blast result file parses just fine with the above script.
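
The manual edit described above could also be scripted. What follows is
only a minimal sketch of that workaround, not a fix for the parser itself.
The helper name strip_negative_db_size is hypothetical, and the pattern
assumes the summary line always has the "N sequences; -M total letters"
shape shown in the excerpt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop a leading '-' from the "total letters" count
# on a BLAST database summary line. Any other line is returned unchanged.
sub strip_negative_db_size {
    my ($line) = @_;
    $line =~ s/^(\s*[\d,]+\s+sequences;\s+)-(?=[\d,]+\s+total letters)/$1/;
    return $line;
}

# Demonstration on the database section quoted in this message.
my @report = (
    "Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,\n",
    "or phase 0, 1 or 2 HTGS sequences)\n",
    "            1,819,241 sequences; -24,217,474 total letters\n",
);
print strip_negative_db_size($_) for @report;
```

Running every line of a netblast report through this helper and writing the
result to a new file should give a file equivalent to the hand-edited one,
which can then be passed to the parser script as usual.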

Why does a netblast result file have a minus sign for the database size?
Why won't the parser work if there is a minus sign?
Is there a way to make the parser work despite the minus sign?
-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au



