[Bioperl-l] GenericHit->start/end needs tiled hsps?
Sendu Bala
bix at sendu.me.uk
Fri Apr 13 08:30:50 UTC 2007
Hi all,
I want to double-check my thinking regarding
Bio::Search::Hit::GenericHit->start() and end(). Right now the docs
claim that the hsps of the hit object must be tiled before the answer
can be produced, and the code is implemented that way (it calls
Bio::Search::SearchUtils::tile_hsps($self)).
Yet as far as I can see, all you need to do is loop through all hsps and
pick out the smallest start and largest end respectively in terms of
subject and query.
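
A minimal sketch of the loop described above, in plain Perl. The
hash-based hsps and the hit_range() helper are hypothetical stand-ins
for illustration, not BioPerl's API; with real objects the coordinates
would come from each hsp object rather than from a hash. The swap of
reversed coordinates is also an assumption, in case a start is ever
larger than its end.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hit's hsps: each one is a plain hash holding
# [start, end] pairs for the query and the subject (not BioPerl objects).
my @hsps = (
    { query => [ 120, 180 ], subject => [ 900, 960 ] },
    { query => [  15,  75 ], subject => [ 400, 460 ] },
    { query => [ 300, 350 ], subject => [  50, 100 ] },
);

# Single pass over the hsps, tracking the smallest start and the largest
# end seen so far. $seq_type is 'query' or 'subject'.
# Returns (undef, undef) when there are no hsps.
sub hit_range {
    my ($hsps, $seq_type) = @_;
    my ($min_start, $max_end);
    for my $hsp (@$hsps) {
        my ($start, $end) = @{ $hsp->{$seq_type} };
        # Assumption: normalise so that start <= end before comparing.
        ($start, $end) = ($end, $start) if $start > $end;
        $min_start = $start if !defined $min_start || $start < $min_start;
        $max_end   = $end   if !defined $max_end   || $end   > $max_end;
    }
    return ($min_start, $max_end);
}

my ($q_start, $q_end) = hit_range(\@hsps, 'query');
my ($s_start, $s_end) = hit_range(\@hsps, 'subject');
print "query: $q_start-$q_end, subject: $s_start-$s_end\n";
# prints "query: 15-350, subject: 50-960"
```

This is one comparison pair per hsp, so the cost grows linearly with
the number of hsps and no tiling is involved.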
This comes up because I have a blast report where a single hit contains
over 80000 hsps, and the tiling ran for over an hour before I gave up on
it, so I don't know how long it really takes. The simple loop through
the hsps takes seconds or less.
Admittedly, in this situation the overall start and end aren't
especially useful (to me). An alternative way of fixing the problem
would be to re-write the tiling
algorithm (again) to somehow make it hundreds of times faster, then
provide some way in start() and end() for the user to request the start
and end of the best contig, or other contig of choice. Easier said than
done though!
What do people think?