[Bioperl-l] SearchIO speed up

Thu Aug 10 19:06:33 UTC 2006

aaron.j.mackey at gsk.com wrote:
>> ...Except I need to know if the community considers the speed problem 
>> solved or not. More radical changes will make SearchIO even faster, e.g. 
>> Chris Fields and Jason (if I interpret the Project priority list item 
>> correctly) have suggested an end to individual Hit and HSP objects, 
>> which become just data members of a Result-like object. Ideally I don't 
>> want to go down that route because we lose quite a bit of OO power;
> 
> As already mentioned, a lazy-evaluation approach would also work.
> 
> Jason and I did once talk about an entirely new parsing/object-building 
> framework, based on nested grammars; in essence, the "top-level" parser 
> simply "chunks" the input into blobs of (minimally parsed) text that 
> correspond to the top level result object.  This chunk/blob is the input 
> to the next-level parser for Hits, which in turn has chunks for HSPs. 
> Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are* the same 
> Generic*I-implementing objects we're already using.  Thus, if HSPs are 
> never interrogated, they're never parsed; as soon as one is interrogated, 
> it gets parsed, and so on.

As I understand your description, this is exactly what I do. My 'chunks' 
are the hashes that are normally used to create a new Hit/HSP object.

The initial parse of the data file produces a small number of objects 
(Results) that contain all the data: HSP data nested in Hit data nested 
in the Result objects. Only when you actually want to do something with a 
particular Hit or HSP is its hash turned into an object, so you can call 
its methods as normal.
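
To make the idea concrete, here is a minimal sketch of that kind of lazy 
object creation. It is written in Python purely for illustration and is 
not the actual SearchIO code: the class names, the dictionary keys 
("hits", "hsps", "name", "evalue") and the method names are all 
assumptions made up for this example.

```python
# Illustrative sketch only: hypothetical classes, not the BioPerl API.

class HSP:
    """Thin wrapper around the raw HSP data produced by the parser."""

    def __init__(self, data):
        self._data = data

    def evalue(self):
        return self._data["evalue"]


class Hit:
    """Holds raw hit data; HSP objects are built on first request."""

    def __init__(self, data):
        self._data = data
        self._hsps = None  # no HSP objects exist yet

    def name(self):
        return self._data["name"]

    def hsps(self):
        # Build the HSP objects from the nested raw data once, then reuse.
        if self._hsps is None:
            self._hsps = [HSP(d) for d in self._data["hsps"]]
        return self._hsps


class Result:
    """Created by the initial parse; holds all nested raw data."""

    def __init__(self, data):
        self._data = data
        self._hits = None  # no Hit objects exist yet

    def hits(self):
        # Build the Hit objects from the nested raw data once, then reuse.
        if self._hits is None:
            self._hits = [Hit(d) for d in self._data["hits"]]
        return self._hits


# Made-up data standing in for what an initial parse might collect.
raw_result = {
    "hits": [
        {"name": "hitA", "hsps": [{"evalue": 1e-30}, {"evalue": 2e-5}]},
        {"name": "hitB", "hsps": [{"evalue": 0.01}]},
    ]
}

result = Result(raw_result)          # cheap: only one object is created
first_hit = result.hits()[0]         # Hit objects are created here
best = first_hit.hsps()[0].evalue()  # HSPs of hitA only are created here
```

In this sketch the HSPs of hitB are never turned into objects unless 
someone asks for them, which is where the saving would come from.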

Or are you suggesting something that would be even better than that? If 
so, please elaborate! :)