[Bioperl-l] SearchIO speed up
aaron.j.mackey at gsk.com
Thu Aug 10 17:39:59 UTC 2006
> ...Except I need to know if the community considers the speed problem
> solved or not. More radical changes will make SearchIO even faster, eg.
> Chris Fields and Jason (if I interpret the Project priority list item
> correctly) have suggested an end to individual Hit and HSP objects,
> which become just data members of a Result-like object. Ideally I don't
> want to go down that route because we lose quite a bit of OO power;
As already mentioned, a lazy-evaluation approach would also work.
Jason and I did once talk about an entirely new parsing/object-building
framework, based on nested grammars. In essence, the "top-level" parser
simply "chunks" the input into blobs of (minimally parsed) text that
correspond to the top-level result object. Each chunk/blob is the input
to the next-level parser for Hits, which in turn produces chunks for HSPs.
Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are* the same
Generic*I-implementing objects we're already using. Thus, if HSPs are
never interrogated, they're never parsed; as soon as one is interrogated,
it gets parsed, and so on. In such an environment, you can imagine
flyweight objects that are built very quickly and easily (recall that many
previous analyses found BioPerl's speed problems to stem not so much from
parsing as from heavy-weight object creation).
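
To make the nested, parse-on-demand idea concrete, here is a minimal
sketch. It is written in Python purely for illustration; it is not BioPerl
code and not the parser described in this post. The toy report format
("Query:", "Hit:", "HSP:" lines), the class names LazyResult/LazyHit/LazyHSP,
and the PARSE_COUNTS counter are all assumptions invented for the example.

```python
# Toy format (hypothetical, not real BLAST/FASTA output):
#   "Query:" starts a result, "Hit:" starts a hit, "HSP:" is one HSP line.
PARSE_COUNTS = {"hit": 0, "hsp": 0}  # instrumentation for the demo only


class LazyHSP:
    """Holds the raw text of one HSP; fields are parsed on first access."""

    def __init__(self, text):
        self._text = text
        self._fields = None

    def _parse(self):
        if self._fields is None:
            PARSE_COUNTS["hsp"] += 1
            # "HSP: score=42 evalue=1e-5" -> {"score": "42", "evalue": "1e-5"}
            pairs = self._text.split()[1:]
            self._fields = dict(p.split("=", 1) for p in pairs)
        return self._fields

    @property
    def score(self):
        return float(self._parse()["score"])


class LazyHit:
    """Holds the raw text of one hit; name and HSP chunks are built lazily."""

    def __init__(self, text):
        self._text = text
        self._name = None
        self._hsps = None

    def _parse(self):
        if self._hsps is None:
            PARSE_COUNTS["hit"] += 1
            lines = self._text.splitlines()
            self._name = lines[0].split(":", 1)[1].strip()
            # Each HSP line is handed on unparsed to the next-level object.
            self._hsps = [LazyHSP(l) for l in lines[1:] if l.startswith("HSP:")]

    @property
    def name(self):
        self._parse()
        return self._name

    @property
    def hsps(self):
        self._parse()
        return self._hsps


class LazyResult:
    """Holds the raw text of one result; splits it into hit chunks on demand."""

    def __init__(self, text):
        self._text = text
        self._hits = None

    @property
    def hits(self):
        if self._hits is None:
            chunks = []
            for line in self._text.splitlines():
                if line.startswith("Hit:"):
                    chunks.append([line])
                elif chunks:
                    chunks[-1].append(line)
            self._hits = [LazyHit("\n".join(c)) for c in chunks]
        return self._hits


def chunk_results(report):
    """Top-level parser: yields one LazyResult per 'Query:' block."""
    block = []
    for line in report.splitlines():
        if line.startswith("Query:") and block:
            yield LazyResult("\n".join(block))
            block = []
        if line.strip():
            block.append(line)
    if block:
        yield LazyResult("\n".join(block))
```

Iterating over chunk_results(report) only splits the text; a hit's chunk is
parsed when its name or hsps is first requested, and an HSP's fields are
parsed when its score is first read. HSPs that are never interrogated are
never parsed.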
I happen to have such a nested parser lying around for
Bio::SearchIO::fasta.pm, but it also uses an Inline::C, yacc-generated C
parser backend (yet another experiment in trying to get SearchIO to run
faster), so it really isn't ready for prime time (being entirely untested,
and probably not even finished).
-Aaron