[Bioperl-l] Problems parsing blast reports
Lorenzo Carretero
lcpaulet at googlemail.com
Sat May 14 15:29:12 UTC 2011
Hi all,
I'm trying to parse blasttable '-m 8 format' reports from whole intra-genome
comparisons of all vs all to get the best non-self hit and diverged best
non-self hit as separate hashes (key=query=>value=non_self_hit). The
following script seems to run ok but it returns unexpected results (i.e. it
doesn't catch the best hit but a random hit apparently). I assume hits are
iterated over the while loop (while (my $hit = $result->next_hit) as
returned in the blast results (i.e. sorted by hits bit scores).
Any help would be much appreciated.
Cheers,
Lorenzo
my ( $filename, $format, $minimumbitsscore, $thresholdevalue,
> $minimumidentity, $maximumredundancy, $minimumalnlength ) = @_;
> my ( %besthits, %diverged, %redundant ,%genefusion ) = ();
> my ( $refbh, $refdiv, $cb, $cd, $refred, $reffus, $cr, $cf);
> my $total = 0;
> my $in = new Bio::SearchIO ( -file => $filename,
> -format => $format,
> #-verbose => -1,
> )
> or die "No $filename BLAST file with
> $format found";
> while( my $result = $in->next_result) {
> $total++;
> while (my $hit = $result->next_hit) {
> my $query = $result->query_name();
> my $hitname = $hit->name();
> my $bits = $hit->bits();
> my $evalue = $hit->significance();
> if ($query ne $hitname and $bits >= $minimumbitsscore and
> $evalue <= $thresholdevalue ) {
> $besthits{ $query } = $hitname;
> }
> elsif ( $bits <= $minimumbitsscore and $evalue >=
> $thresholdevalue){
> $diverged { $query } = $hitname;
> }
>
> # while( my $hsp = $hit->next_hsp ) {
> # my $querylen = $hsp->length( 'query' );
> # my $hitlen = $hsp->length( 'hit' );
> # my $alnlen = $hsp->length( 'total' );
> # my $identity = $hsp->percent_identity();
> # if ( $identity > $maximumredundancy ) {
> # $redundant{ $query } = $hitname;
> # }
> # elsif ( ($querylen <= ($alnlen / 2) ) || ($hitlen <=
> ($alnlen / 2) ) || ($alnlen <= $minimumalnlength) ) {
> # $genefusion{ $query } = $hitname;
> # }
> # elsif ( ($evalue >= $thresholdevalue) || ($identity <=
> $minimumidentity) || ($bits <= $minimumbitsscore) ) {
> # $diverged{ $query } = $hitname;
> # }
> # else {
> # $besthits{ $query } = $hitname;
> # }
> # }
> }
> }
> $refbh = \%besthits;
> $refdiv = \%diverged;
> $refred = \%redundant;
> $reffus = \%genefusion;
> $cb = keys ( %besthits );
> $cd = keys( %diverged );
> $cr = keys( %redundant );
> $cf = keys( %genefusion );
>
More information about the Bioperl-l
mailing list