[Bioperl-l] GenericHit->start/end needs tiled hsps?

Sendu Bala bix at sendu.me.uk
Fri Apr 13 08:30:50 UTC 2007


Hi all,

I want to double-check my thinking regarding 
Bio::Search::Hit::GenericHit->start() and end(). Right now the docs 
say that the HSPs of the hit object must be tiled before an answer can be 
produced, and the code is implemented that way 
(via Bio::Search::SearchUtils::tile_hsps($self)).

Yet as far as I can see, all you need to do is loop through all the HSPs 
and take the smallest start and the largest end, separately for the 
query and for the subject.
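
To make the idea concrete, here is a minimal sketch of that single pass. 
It is written in Python with plain records, not against the BioPerl API: 
the HSP record and the hit_extent() function are hypothetical names used 
only for illustration, and the sketch assumes each HSP already reports 
start <= end on both the query and the subject.

```python
from typing import Iterable, NamedTuple, Tuple


class HSP(NamedTuple):
    """Hypothetical stand-in for an HSP: coordinates only, start <= end."""
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int


def hit_extent(hsps: Iterable[HSP]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((query_start, query_end), (hit_start, hit_end)) over all HSPs.

    One pass, no tiling: track the smallest start and the largest end
    independently for the query and for the subject (hit).
    """
    iterator = iter(hsps)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("hit has no HSPs")

    q_start, q_end = first.query_start, first.query_end
    h_start, h_end = first.hit_start, first.hit_end

    for hsp in iterator:
        q_start = min(q_start, hsp.query_start)
        q_end = max(q_end, hsp.query_end)
        h_start = min(h_start, hsp.hit_start)
        h_end = max(h_end, hsp.hit_end)

    return (q_start, q_end), (h_start, h_end)


example_hsps = [
    HSP(10, 50, 100, 140),
    HSP(5, 20, 300, 320),
    HSP(40, 90, 120, 180),
]
query_range, hit_range = hit_extent(example_hsps)
```

The pass is linear in the number of HSPs and never compares HSPs with 
each other, which is why it stays cheap even for a hit with tens of 
thousands of them. It only gives the overall extent, though, and says 
nothing about gaps or overlaps between HSPs.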

This comes up because I have a BLAST report where a single hit contains 
over 80,000 HSPs, and the tiling takes over an hour (I gave up waiting, 
so I don't know how long it really takes). The simple loop through the 
HSPs takes seconds or less.

Now, in this situation the answer isn't especially useful (to me). An 
alternative fix would be to rewrite the tiling algorithm (again) to 
somehow make it hundreds of times faster, and then provide some way in 
start() and end() for the user to request the start and end of the best 
contig, or of another contig of their choice. Easier said than done, 
though!


What do people think?