[Bioperl-l] Total BLAST query/hit intervals present between all hsps
Brian Desany
bdesany@bcm.tmc.edu
Tue, 16 Jul 2002 08:29:27 -0500
I was thinking about the post a couple of days ago about finding how much of
the query is present between all the hsps in a hit (or between all hits) for
a blast report.
Reading the docs I found that GenericHSP's inherit from Bio::RangeI (~5
classes removed).
The Bio::RangeI docs show a method called union, which can be called like
this:
my ($start,$stop,$strand) = Bio::RangeI->union(@ranges);
So I tested it out like this:
while( my $result = $rc->next_result ) {
my @hsps;
while (my $hit = $result->next_hit) {
while(my $hsp = $hit->next_hsp()) {
push @hsps, $hsp;
}
}
if (@hsps) {
my ($start,$stop,$strand) = Bio::RangeI->union(@hsps);
my $length = $stop-$start+1;
print ($stop-$start+1) . "/" . $read->query_length . "\n";
}
}
I got a Bio::Root::RootI deprecation warning, plus the method is trying to
return an object (in constrast to the docs).
To fix this in my script, I did (after a few trials and errors):
my $range = Bio::SeqFeature::Generic->union(@hsps);
my $match = $range->end - $range->start + 1;
print "$match / " . $read->query_length . "\n";
I think "union" as a convenience method in the $result object would be more
straightforward (preferably returning the 3 numbers and not an object). I
don't know if that would break any design rules that bioperl is trying to
follow.
Do you think this is possible/advisable?
Also, it just so happens that in this example I'm trying to find the total
*query* length present in the hsps, but conceptually the same thing could be
done for *hit* length (which may have been what the original poster
intended). However, the union method in RangeI only operates on "feature1"
of a SimilarityPair (another member of the GenericHSP inheritance
hierarchy). So I would also suggest that the "union" method either has a
'feature' parameter (taking 'query' or 'hit' as it's value) or that
complementary "query_union" and "hit_union" methods be added (analogous to
the hstart and hend methods in the SimilarityPair).
Thanks-
-Brian.
--------------------------------------------
Brian A. Desany, Ph.D.
Dictyostelium Genome Sequencing Project
Baylor College of Medicine
bdesany@bcm.tmc.edu
(713)798-5639
http://www.dictygenome.org
--------------------------------------------