[Bioperl-l] parsing blast results
Charles Hauser
chauser@duke.edu
11 Sep 2002 12:09:50 -0400
I would like to collect from a multiple blast report:
- the top 3 (lowest eval) blast hits per query IF the eval is < cutoff
(1e -10),
- their corresponding alignments
- assorted blast params
Using the script below, I found that the HSPs are returned in reverse
order compared to how they appear in the blast report. As a result I
save the 3 highest scoring HSPs (i.e. 1-3, below), not the lowest as
desired (i.e. 248-250, below).
Suggestions for gathering the best scoring HSPs?
Sample Data output:
1 6047.1 946 gi|18587823|ref|XP_083986.1| 623 1e-20 0.51
2 6047.1 946 gi|71537|pir||KRMS2 581 1e-20 0.43 0.52
3 6047.1 946 gi|4557705|ref|NP_000217.1| 622 1e-20 0.48
...
248 6047.1 946 gi|1346181|sp|P49311|GRP2_SINAL 169 1e-39 0.55
249 6047.1 946 gi|15239505|ref|NP_200911.1| 309 1e-41 0.57
250 6047.1 946 gi|544416|sp|Q05966|GR10_BRANA 169 4e-43 0.62
use Bio::SearchIO;
my $in = new Bio::SearchIO (-format => 'psiblast',
-file => $ARGV[0]);
#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
QUERY: while( my $result = $in->next_result ) {
printf STDERR "\nReport %d: $result\n", $in->report_count;
if( $result->hits ) {
my $c = 1;
HIT: while( my $hit = $result->next_hit ) {
next QUERY if( ($hit->significance > 1e-10) || ($c==3) ) ;
while (my $hsp = $hit->next_hsp) {
my $aln = $hsp->get_aln();
print join("\t",
$c,
$result->query_name(),
$result->query_length(),
$hit->hit_name(),
$hit->hit_length(),
$hit->expect(),
$hit->frac_identical(),
"\n"
);
# print $out $aln;
$c++;
next HIT;
}
}
}
}
Charles