[Bioperl-l] parsing blast results

Charles Hauser chauser@duke.edu
11 Sep 2002 12:09:50 -0400


I would like to collect from a multiple blast report:
	- the top 3 (lowest eval) blast hits per query IF the eval is < cutoff
(1e -10),
	- their corresponding alignments
	- assorted blast params

Using the script below, I found that the HSPs are returned in reverse
order compared to how they appear in the blast report.  As a result I
save the 3 highest scoring HSPs (i.e. 1-3, below), not the lowest as
desired (i.e. 248-250, below).

Suggestions for gathering the best scoring HSPs?


Sample Data output:

1	6047.1  946	gi|18587823|ref|XP_083986.1|	623	1e-20   0.51	
2	6047.1  946	gi|71537|pir||KRMS2	581	1e-20   0.43	0.52	
3	6047.1  946	gi|4557705|ref|NP_000217.1|	622	1e-20   0.48	
...

248     6047.1  946	gi|1346181|sp|P49311|GRP2_SINAL 169	1e-39   0.55	
249     6047.1  946	gi|15239505|ref|NP_200911.1|	309 	1e-41   0.57	
250     6047.1  946	gi|544416|sp|Q05966|GR10_BRANA  169	4e-43   0.62	



use Bio::SearchIO;
my $in = new Bio::SearchIO (-format => 'psiblast',
			    -file   => $ARGV[0]);

#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );

QUERY: while( my $result = $in->next_result ) {
	   printf STDERR "\nReport %d: $result\n", $in->report_count;  
	   if( $result->hits ) {
	       my $c = 1;
	 HIT:  while( my $hit = $result->next_hit ) {
	          next QUERY if( ($hit->significance > 1e-10) || ($c==3) ) ;
		  while (my $hsp = $hit->next_hsp) {
		      my $aln = $hsp->get_aln();
		      print join("\t",
			   $c,
			   $result->query_name(),
			   $result->query_length(),
			   $hit->hit_name(),
			   $hit->hit_length(),
			   $hit->expect(),
			   $hit->frac_identical(),
			   "\n"
			   );
#		      print $out $aln;
		      $c++;
		      next HIT;
		  }
	      }
	   }
       }



Charles