[Bioperl-l] Parsing a netblast file
Wes Barris
wes.barris at csiro.au
Thu Jul 31 01:27:43 EDT 2003
Hi,
This sample blast parser works on stand-alone blast result files but
fails on netblast result files.
#!/usr/local/bin/perl -w
use strict;
use Bio::SearchIO;

# Expect exactly one argument: the BLAST report to parse.
if (@ARGV != 1) {
    die "Usage: parseblast.pl <blastfile>\n";
}

my $in = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # Report HSPs longer than 13 with at least 45% identity.
            if ($hsp->length('total') > 13 && $hsp->percent_identity >= 45) {
                print "Hit= ", $hit->name,
                      ",Length=", $hsp->length('total'),
                      ",Expect=", $hsp->evalue,
                      ",Percent_id=", $hsp->percent_identity, "\n";
            }
        }
    }
}
Through trial and error I have narrowed the problem down to the negative
sign in the database summary. Here is the section in question from a netblast
result file:
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
or phase 0, 1 or 2 HTGS sequences)
1,819,241 sequences; -24,217,474 total letters
I don't know why, but all netblast result files I have looked at show a
negative value for the total number of letters. If I remove the '-' sign,
the blast result file parses just fine with the above script.
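One possible explanation, which the post itself does not confirm: if the server keeps the letter count in a signed 32-bit integer, any total above 2,147,483,647 wraps around to a negative number. Under that assumption (and assuming it wrapped only once), adding 2**32 to the printed value recovers the real total. A small Perl sketch of the arithmetic; `unwrap_int32` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: the reported total is a signed
# 32-bit value that wrapped around exactly once.
sub unwrap_int32 {
    my ($reported) = @_;
    return $reported < 0 ? $reported + 2**32 : $reported;
}

# Strip the thousands separators from the value in the netblast header.
my $text = '-24,217,474';
(my $digits = $text) =~ tr/,//d;

my $letters = unwrap_int32($digits);
print "$letters\n";
```

For the header above this yields 4270749822. That figure is an inference from the overflow assumption, not something the report states.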
Why does a netblast result file have a minus sign for the database size?
Why won't the parser work if there is a minus sign?
Is there a way to make the parser work despite the minus sign?
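Since deleting the '-' by hand makes the file parse, the same edit can be done in code before the text reaches Bio::SearchIO. A minimal sketch in core Perl; `fix_total_letters` is a hypothetical helper, and the pattern assumes the database line has exactly the layout quoted above:

```perl
use strict;
use warnings;

# Hypothetical helper: drop a leading minus sign from the
# "N sequences; -M total letters" line of a netblast report.
sub fix_total_letters {
    my ($report) = @_;
    $report =~ s/(sequences;\s+)-([\d,]+\s+total letters)/$1$2/g;
    return $report;
}

my $report = <<'END';
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
or phase 0, 1 or 2 HTGS sequences)
1,819,241 sequences; -24,217,474 total letters
END

print fix_total_letters($report);
```

To parse the corrected text, one option (an assumption, not tested against a netblast report) is to open an in-memory filehandle on it with `open(my $fh, '<', \$fixed)` (Perl 5.8 or later) and pass it as `Bio::SearchIO->new(-format => 'blast', -fh => $fh)`. This only lets the parser accept the line; it does not recover the true letter count, whatever caused the minus sign.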
--
Wes Barris
E-Mail: Wes.Barris at csiro.au