[Bioperl-l] SearchIO speed up

Ewan Birney birney at ebi.ac.uk
Thu Aug 10 18:55:46 UTC 2006


On 10 Aug 2006, at 18:39, aaron.j.mackey at gsk.com wrote:

>> ...Except I need to know if the community considers the speed problem
>> solved or not. More radical changes will make SearchIO even faster, e.g.
>> Chris Fields and Jason (if I interpret the Project priority list item
>> correctly) have suggested an end to individual Hit and HSP objects,
>> which become just data members of a Result-like object. Ideally I don't
>> want to go down that route because we lose quite a bit of OO power;
>
> As already mentioned, a lazy-evaluation approach would also work.
>
> Jason and I did once talk about an entirely new parsing/object-building
> framework, based on nested grammars; in essence, the "top-level" parser
> simply "chunks" the input into blobs of (minimally parsed) text that
> correspond to the top-level result object.  This chunk/blob is the input
> to the next-level parser for Hits, which in turn has chunks for HSPs.
> Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are* the same
> Generic*I-implementing objects we're already using.  Thus, if HSPs are
> never interrogated, they're never parsed; as soon as one is interrogated,
> it gets parsed, and so on.  In such an environment, you can imagine
> flyweight objects that are built very quickly/easily (recall that many
> previous analyses of BioPerl speed problems are not related to parsing so
> much as heavy-weight object creation).
>

For people's interest, this is what the SwissKnife package does as well
for Swiss-Prot (which has a trivial top-level chunking strategy).

(ewan returns to his hectic life of too many balls in the air :)).
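
The nested "chunk now, parse later" idea discussed above can be sketched
roughly as follows. This is only an illustration in Python, not BioPerl or
SwissKnife code: the RESULT/HIT/HSP line format, the class names
(LazyResult, LazyHit, LazyHSP), the split_records helper and the parsed
counter are all hypothetical, invented for this example. Each level keeps
its raw text and only splits or parses it the first time it is asked.

```python
from functools import cached_property  # Python 3.8+

# Hypothetical toy report format, invented for this sketch.
SAMPLE = """\
RESULT query1
HIT subjA
HSP score=52.0 evalue=1e-10
HSP score=31.5 evalue=0.002
HIT subjB
HSP score=20.1 evalue=0.5
RESULT query2
HIT subjC
HSP score=77.0 evalue=1e-30
"""


def split_records(text, marker):
    """Cut text into chunks, each starting at a line that begins with marker.

    Lines before the first marker line are ignored. No field parsing
    happens here; the chunks are returned as plain strings.
    """
    chunks, current = [], None
    for line in text.splitlines():
        if line.startswith(marker):
            if current is not None:
                chunks.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append("\n".join(current))
    return chunks


class LazyHSP:
    parsed = 0  # counts how many HSP chunks were actually parsed

    def __init__(self, text):
        self._text = text  # raw chunk, untouched until interrogated

    @cached_property
    def fields(self):
        LazyHSP.parsed += 1
        # Drop the leading "HSP" token, then split key=value pairs.
        return dict(tok.split("=", 1) for tok in self._text.split()[1:])

    @property
    def score(self):
        return float(self.fields["score"])


class LazyHit:
    def __init__(self, text):
        self._text = text

    @cached_property
    def name(self):
        return self._text.splitlines()[0].split(None, 1)[1]

    @cached_property
    def hsps(self):
        # Chunk into HSP blobs on first access; each blob stays unparsed.
        return [LazyHSP(c) for c in split_records(self._text, "HSP ")]


class LazyResult:
    def __init__(self, text):
        self._text = text

    @cached_property
    def name(self):
        return self._text.splitlines()[0].split(None, 1)[1]

    @cached_property
    def hits(self):
        return [LazyHit(c) for c in split_records(self._text, "HIT ")]


def parse_results(text):
    """Top-level parser: only chunks the input into per-result blobs."""
    return [LazyResult(c) for c in split_records(text, "RESULT ")]
```

With this sketch, parse_results(SAMPLE) returns two LazyResult objects
without looking inside any hit. Asking for results[0].hits[0].hsps chunks
that one hit into HSP blobs, and only reading .score on one of them parses
its fields (LazyHSP.parsed goes from 0 to 1, and stays at 1 on repeated
access because the parsed fields are cached).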




