[Bioperl-l] SearchIO speed up
Chris Fields
cjfields at uiuc.edu
Fri Aug 18 02:12:52 UTC 2006
On Aug 17, 2006, at 4:53 PM, Sendu Bala wrote:
> Chris Fields wrote:
>> ...
>
> That is exactly what I did (on your suggestion). The problem that Hilmar
> points out is that HSPI should continue being a SimilarityPair in case
> anything checks that it is a SimilarityPair.
Okay, fine by me. It was merely a suggestion thrown out there.
Seemed like you were banging your head against the wall trying to
work this out.
What I intended was something that wouldn't dramatically change what
was returned from the methods (you would get SeqFeature::Similarity
objects back). Hilmar has a point, though; if checks are performed
to see if the HSP is-a SeqFeatureI then there will be problems (as
the failed tests probably show).
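
To make the is-a concern concrete, here is a minimal sketch, written in Python rather than Perl purely for brevity. The class names echo the BioPerl ones, but `InheritingHSP`, `DelegatingHSP`, and `caller_expects_pair` are hypothetical stand-ins, not the actual BioPerl classes or the change that was tried. In Perl the equivalent check would be something like `$hsp->isa('Bio::SeqFeature::SimilarityPair')`.

```python
class SeqFeatureI:
    """Stand-in for the Bio::SeqFeatureI interface (illustrative only)."""


class SimilarityPair(SeqFeatureI):
    """Stand-in for Bio::SeqFeature::SimilarityPair."""

    def __init__(self, query, hit):
        self.query = query
        self.hit = hit


class InheritingHSP(SimilarityPair):
    """HSP that is-a SimilarityPair: existing type checks keep passing."""


class DelegatingHSP:
    """Hypothetical HSP that only has-a SimilarityPair, built on demand."""

    def __init__(self, query, hit):
        self._query = query
        self._hit = hit
        self._pair = None  # created lazily to avoid the up-front cost

    def pair(self):
        if self._pair is None:
            self._pair = SimilarityPair(self._query, self._hit)
        return self._pair


def caller_expects_pair(hsp):
    """Mimics downstream code that checks the type before using the HSP."""
    return isinstance(hsp, SimilarityPair)
```

The delegating version can still hand back a feature object from its methods, but any caller that tests the HSP itself will see the check fail, which is the kind of breakage the failing tests would be reporting.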
> Would there be any problem with leaving HSPI as a SimilarityPair and
> having GenericHSP::new as:
> ...
> This gives a 1.43x speedup. (Simply overriding methods gives only a
> 1.14x speedup.)
I don't think it's really worth that much effort. There are other
ways to go about this, such as the pull parser you and Aaron
suggested, the hash-based approach, etc., which may be better. My
concern is maintaining the API in the current set of classes
unless (as pointed out, again, by Hilmar) there is a tremendous
advantage to making changes that break it. So far, sorry to say,
it's debatable whether a roughly 1.5-fold increase in speed that
comes with even small API changes is worth all the effort you are
putting into it. I don't think changing what's already present in
the current SearchIO modules will accomplish much.
That being said, the nice thing about SearchIO is that you could
introduce new SearchIO::* modules, using your own custom
handler/Search class combinations, to work alongside the current
ones; that way everybody has an option (the older, slower, more OO
ones vs. the new, faster hash-based ones). Users may then choose the
new API for its speed advantages. Make it easy for them to make the
right choice, i.e., Damian Conway's affordances.
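
A rough sketch of the side-by-side idea, again in Python for brevity. It assumes a format name selects the parser, much as Bio::SearchIO->new(-format => 'blast') loads Bio::SearchIO::blast; the `blast_fast` name, the `PARSERS` table, and the row layout are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """Object-style result: slower to build, richer interface."""
    evalue: float
    bits: float


def parse_oo(rows):
    """Existing-style backend: one object per HSP."""
    return [HSP(evalue, bits) for evalue, bits in rows]


def parse_hash(rows):
    """Hypothetical fast backend: plain dicts, no object construction."""
    return [{"evalue": evalue, "bits": bits} for evalue, bits in rows]


# Both backends are registered side by side; neither replaces the other.
PARSERS = {
    "blast": parse_oo,         # current API, unchanged
    "blast_fast": parse_hash,  # hypothetical new format name
}


def search_io(fmt, rows):
    """Dispatch on the format name, as a SearchIO-like factory might."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt}") from None
    return parser(rows)
```

Existing scripts keep asking for `blast` and see no change, while anyone who wants the speed opts in by name and accepts the different return type.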
You may not even have to use a handler, and you could even build your
own Search interface classes to tailor-fit your specific needs.
There's a lot of freedom there, which can be a dangerous thing.
Those SearchIO classes that get the most usage will likely lead,
eventually, to deprecation of the ones that are infrequently used or
maintained. This is the current idea behind Lincoln's
Bio::DB::SeqFeature, which I believe is intended to eventually
replace Bio::DB::GFF. Once everybody realizes that GFF3 works better
with Bio::DB::SeqFeature, Bio::DB::GFF will likely stop being
actively maintained and eventually be deprecated.
Remember, your SearchIO modifications do not have to be included in
this release of BioPerl, so don't rush them to make a release. We
could feasibly have 1-2 extra dev releases before v1.6, maybe more.
Rushing to make a release was one of the initial problems with
Bio::SeqFeatureI (I think) in the first 1.5 release. Please correct
me if I'm wrong there, Hilmar.
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign