[Bioperl-l] Total BLAST query/hit intervals present between all hsps

Brian Desany bdesany@bcm.tmc.edu
Tue, 16 Jul 2002 08:29:27 -0500


I was thinking about the post a couple of days ago about finding how much of
the query is present between all the hsps in a hit (or between all hits) for
a blast report.

Reading the docs I found that GenericHSP's inherit from Bio::RangeI (~5
classes removed).
The Bio::RangeI docs show a method called union, which can be called like
this:

            my ($start,$stop,$strand) = Bio::RangeI->union(@ranges);

So I tested it out like this:

while( my $result = $rc->next_result ) {
        my @hsps;
        while (my $hit = $result->next_hit) {
            while(my $hsp = $hit->next_hsp()) {
                    push @hsps, $hsp;
            }
        }
        if (@hsps) {
            my ($start,$stop,$strand) = Bio::RangeI->union(@hsps);
            my $length = $stop-$start+1;
            print ($stop-$start+1) . "/" . $read->query_length . "\n";
        }
}

I got a Bio::Root::RootI deprecation warning, plus the method is trying to
return an object (in constrast to the docs).

To fix this in  my script, I did (after a few trials and errors):

            my $range = Bio::SeqFeature::Generic->union(@hsps);
            my $match = $range->end - $range->start + 1;
            print "$match / " . $read->query_length . "\n";

I think "union" as a convenience method in the $result object would be more
straightforward (preferably returning the 3 numbers and not an object). I
don't know if that would break any design rules that bioperl is trying to
follow.

Do you think this is possible/advisable?

Also, it just so happens that in this example I'm trying to find the total
*query* length present in the hsps, but conceptually the same thing could be
done for *hit* length (which may have been what the original poster
intended). However, the union method in RangeI only operates on "feature1"
of a SimilarityPair (another member of the GenericHSP inheritance
hierarchy). So I would also suggest that the "union" method either has a
'feature' parameter (taking 'query' or 'hit' as it's value) or that
complementary "query_union" and "hit_union" methods be added (analogous to
the hstart and hend methods in the SimilarityPair).

Thanks-
-Brian.
--------------------------------------------
Brian A. Desany, Ph.D.
Dictyostelium Genome Sequencing Project
Baylor College of Medicine
bdesany@bcm.tmc.edu
(713)798-5639
http://www.dictygenome.org
--------------------------------------------