[Bioperl-l] SearchIO speed up
Ewan Birney
birney at ebi.ac.uk
Thu Aug 10 18:55:46 UTC 2006
On 10 Aug 2006, at 18:39, aaron.j.mackey at gsk.com wrote:
>> ...Except I need to know if the community considers the speed problem
>> solved or not. More radical changes will make SearchIO even faster,
>> e.g. Chris Fields and Jason (if I interpret the Project priority list
>> item correctly) have suggested an end to individual Hit and HSP
>> objects, which become just data members of a Result-like object.
>> Ideally I don't want to go down that route because we lose quite a
>> bit of OO power;
>
> As already mentioned, a lazy-evaluation approach would also work.
>
> Jason and I did once talk about an entirely new parsing/object-building
> framework, based on nested grammars; in essence, the "top-level" parser
> simply "chunks" the input into blobs of (minimally parsed) text that
> correspond to the top-level result object. Each chunk/blob is the input
> to the next-level parser for Hits, which in turn produces chunks for
> HSPs. Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are*
> the same Generic*I-implementing objects we're already using. Thus, if
> HSPs are never interrogated, they're never parsed; as soon as one is
> interrogated, it gets parsed, and so on. In such an environment, you can
> imagine flyweight objects that are built very quickly/easily (recall
> that many previous analyses found BioPerl speed problems to be related
> not so much to parsing as to heavy-weight object creation).
>
For people's interest, this is what the SwissKnife package does as well
for Swiss-Prot (which has a trivial top-level chunking strategy).
(ewan returns to his hectic life of too many balls in the air :)).
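
For anyone who wants to see the chunk-then-parse-on-demand idea in
concrete form, here is a minimal sketch. It is written in Python rather
than Perl, it is not BioPerl or SwissKnife code, and everything in it is
an assumption made for illustration: the toy report format
(RESULT/HIT/HSP lines), the class names LazyResult, LazyHit and LazyHSP,
and the PARSE_COUNT counter. Each level only splits its text into blobs
for the level below; an HSP's fields are parsed the first time they are
asked for and cached afterwards.

```python
import re
from functools import cached_property

# Hypothetical counter, only here to make the laziness observable.
PARSE_COUNT = {"hsp": 0}


def split_chunks(text, keyword):
    """Cut text into blobs, each starting at a line that begins with keyword."""
    starts = [m.start() for m in re.finditer(rf"^{keyword}\b", text, re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


class LazyHSP:
    def __init__(self, blob):
        self._blob = blob  # raw text; nothing is parsed at construction time

    @cached_property
    def fields(self):
        # Runs once, on first access; the result is cached on the instance.
        PARSE_COUNT["hsp"] += 1
        return dict(pair.split("=", 1) for pair in self._blob.split()[1:])

    @property
    def score(self):
        return float(self.fields["score"])


class LazyHit:
    def __init__(self, blob):
        self._blob = blob

    @cached_property
    def name(self):
        return self._blob.split(None, 2)[1]

    @cached_property
    def hsps(self):
        # Chunk only: each HSP object just holds its own blob of text.
        return [LazyHSP(b) for b in split_chunks(self._blob, "HSP")]


class LazyResult:
    def __init__(self, blob):
        self._blob = blob

    @cached_property
    def query(self):
        return self._blob.split(None, 2)[1]

    @cached_property
    def hits(self):
        return [LazyHit(b) for b in split_chunks(self._blob, "HIT")]


def parse_report(text):
    """Top-level parser: chunk the input into one blob per result, no more."""
    return [LazyResult(b) for b in split_chunks(text, "RESULT")]


# Made-up report text in the toy format assumed above.
REPORT = """\
RESULT query1
HIT subjA
HSP score=50 evalue=1e-5
HSP score=30 evalue=0.01
HIT subjB
HSP score=20 evalue=0.5
RESULT query2
HIT subjC
HSP score=10 evalue=1
"""

if __name__ == "__main__":
    results = parse_report(REPORT)
    names = [hit.name for hit in results[0].hits]
    print(names, "HSPs parsed so far:", PARSE_COUNT["hsp"])
    best = results[0].hits[0].hsps[0].score
    print(best, "HSPs parsed so far:", PARSE_COUNT["hsp"])
```

Listing the hit names touches no HSP text at all; only the one HSP whose
score is requested gets parsed. Whether this wins in practice depends on
how much of each report a typical script actually looks at.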