[Bioperl-l] SearchIO speed up
Chris Fields
cjfields at uiuc.edu
Thu Aug 10 21:04:29 UTC 2006
...
> > You might use this same strategy have the handler return simple hashes
> > instead of objects,
>
> Yes, the main change I have made that provides the speed increase is to
> make the handler (SearchResultEventBuilder) return hashes instead of
> objects.
>
> It's a transparent change when combined with the lazy instantiation.
I agree, and it may be the best way to proceed initially. There are other ways
to optimize. I personally like Aaron's 'chunk' idea using nested parsers,
which should fly; I could envision a way to take advantage of that with
Perl6's regex objects.
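
To make the 'hashes plus lazy instantiation' idea concrete, here is a minimal
sketch in plain Perl. It is not the actual SearchResultEventBuilder code; the
Sketch::Hit and Sketch::Result packages and their methods are made-up names
for illustration only. The container keeps the raw hashes and only builds an
object the first time a caller asks for that hit.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a full hit object; not a BioPerl class.
package Sketch::Hit;
our $built = 0;    # counts how many objects were actually constructed
sub new {
    my ($class, %args) = @_;
    $built++;
    return bless {%args}, $class;
}
sub name  { $_[0]{name} }
sub score { $_[0]{score} }

# Hypothetical result container: stores plain hashes and turns one into
# an object only when next_hit reaches it.
package Sketch::Result;
sub new {
    my ($class, @hit_hashes) = @_;
    return bless { hits => [@hit_hashes], pos => 0 }, $class;
}
sub next_hit {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{hits} };
    my $slot = \$self->{hits}[ $self->{pos}++ ];
    # Swap the raw hash for an object on first access only.
    $$slot = Sketch::Hit->new( %{$$slot} ) if ref($$slot) eq 'HASH';
    return $$slot;
}

package main;
my $result = Sketch::Result->new(
    { name => 'hitA', score => 120 },    # made-up data
    { name => 'hitB', score => 80 },
);
my $first = $result->next_hit;    # only hitA has been instantiated so far
print $first->name, "\n";
```

The caller still sees objects with the usual accessors, which is why the
change can stay transparent; hits that are never requested never pay the
construction cost.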
> > Alternatively, create a new SearchIO class (call it fastblast; okay,
> > terrible name) that doesn't use a handler and just returns hashes. I
> > think Jason pointed out previously that the handler isn't required.
>
> But I didn't see any particular harm in keeping them. Not having a
> handler might shave a percent or two off run times, but you need to
> balance speed with power and flexibility. I don't know where that
> balance lies, hence my question to the community.
That depends on the person, so flexibility is probably the best way to go.
I'm like you in that I prefer using the various objects.
The cool thing about SearchIO is you could design a module to your liking.
The tools are there (SearchIO module, Generic* Search objects, the
handlers), you just have to know how they work together and where to
optimize. It's up to the user.
If someone wants a streamlined BLAST parser, they can build a specialized
SearchIO module that returns hashes straight out with no handler and no
internal caching (my fastblast suggestion). Or use a specialized handler to
dole out hashes (your method). Or use full-blown interleaved objects
(current implementation).
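
As a rough sketch of the first option (hashes straight out, no handler, no
caching), something like the following would do. To keep it short it assumes
the 12-column tab-delimited BLAST output rather than the full text report,
and it does not use the SearchIO interface at all; make_hsp_iterator and the
column names are my own hypothetical choices.

```perl
use strict;
use warnings;

# Column names for the 12-column tabular BLAST layout (assumed input format).
my @COLS = qw(query subject pct_identity length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: returns a closure that yields one hash ref per data
# line and an empty return at end of input. Nothing is cached between calls.
sub make_hsp_iterator {
    my ($fh) = @_;
    return sub {
        while ( defined( my $line = <$fh> ) ) {
            chomp $line;
            next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
            my @f = split /\t/, $line;
            next unless @f == @COLS;           # ignore malformed lines
            my %hsp;
            @hsp{@COLS} = @f;                  # hash slice: column name => value
            return \%hsp;
        }
        return;
    };
}

# Usage on an in-memory filehandle with made-up data.
my $text = join( "\t", qw(q1 s1 98.5 200 3 0 1 200 5 204 1e-100 380) ) . "\n";
open my $fh, '<', \$text or die "open failed: $!";
my $next = make_hsp_iterator($fh);
my @seen;
while ( my $hsp = $next->() ) {
    push @seen, "$hsp->{query}/$hsp->{subject}";
}
```

The trade-off is the one discussed above: this is about as lean as it gets,
but the caller gets bare hashes and none of the methods the Search* objects
provide.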
The learning curve is somewhat steep if, like me (a molecular microbiologist),
you don't have a strong computer science background. You have to grok
how the system works, how the Handler works, the various Search* objects
that are returned, how they are implemented, etc. But...
The system is flexible if you know how to use it.
Chris