[Bioperl-l] SearchIO speed up
Sendu Bala
bix at sendu.me.uk
Thu Aug 10 22:28:49 UTC 2006
aaron.j.mackey at gsk.com wrote:
>> As I understand your description, this is exactly what I do. My 'chunks'
>> are the hashes that are normally used to create a new Hit/HSP object.
>>
>> The initial parse of the data file results in a small number of objects
>> (Results) that contain all the data: HSP data nested in Hit data nested
>> in the Result objects. When you actually want to do something with a
>> certain hit or HSP it becomes an object, allowing you to call its
>> methods like normal.
>>
>> Or are you suggesting something that would be even better than that? If
>> so, please elucidate! :)
>
> So the only laziness you invoke is the object instantiation (but you've
> already done all the parsing).
>
> My proposal involves the "chunks" being unparsed, raw text "blobs", that
> are essentially blessed into a package that does the parsing only when
> necessary (and even then, might choose different parsing strategies, based
> on what's been asked for). Thus a potentially large amount of parsing and
> storage is skipped. Additionally, you now have the option of not even
> storing the blobs in memory, just file seek pointers (requiring temporary
> storage for streamed or piped data sources), and thus can process very
> large reports without consuming much memory (currently a problem).
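
The lazy-parsing idea in the quoted proposal can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not BioPerl code: the class name LazyHit, the "key = value" chunk format, and the parse_count counter are all hypothetical. The object keeps only the raw text and parses it the first time a field is requested.

```python
class LazyHit:
    """Holds the raw text of one chunk; parses it only on first access.

    The "key = value" line format is a made-up stand-in for a real
    report format, used only to keep the sketch short.
    """

    def __init__(self, raw_text):
        self._raw = raw_text
        self._fields = None    # filled in on demand
        self.parse_count = 0   # illustration only: counts real parses

    def _parse(self):
        # Parse once, then reuse the cached result.
        if self._fields is None:
            self.parse_count += 1
            fields = {}
            for line in self._raw.splitlines():
                if "=" in line:
                    key, _, value = line.partition("=")
                    fields[key.strip()] = value.strip()
            self._fields = fields
        return self._fields

    def get(self, name):
        return self._parse()[name]


hit = LazyHit("name = seqA\nscore = 42\n")
# Nothing has been parsed yet; the cost is paid on the first get().
score = hit.get("score")
```

Creating many such objects costs little more than storing their text; chunks that are never queried are never parsed.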
Thanks, I might try out something along those lines. The problem I see
is with piped input: I wouldn't want to require temporary storage,
because the user may deliberately be trying to gain speed by doing as
little disk I/O as possible. Then you'd have to special-case it: seek
pointers if we have a file on disk, chunks stored in memory if the input
is piped. Maybe that special case wouldn't be so bad.
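
A rough sketch of that special case, again in Python for illustration only and not taken from BioPerl: the class name ChunkIndex and the "//" delimiter line are assumptions. If the input stream is seekable (a file on disk), only (offset, length) pairs are kept; otherwise (a pipe) the chunk bytes themselves are kept in memory, so no temporary file is needed.

```python
import io


class ChunkIndex:
    """Indexes delimiter-separated chunks of a binary stream.

    Seekable input: store (offset, length) pairs and re-read on demand.
    Non-seekable input: store the chunk bytes in memory.
    The delimiter line is a hypothetical record separator.
    """

    def __init__(self, stream, delimiter=b"//\n"):
        self._stream = stream
        self.on_disk = stream.seekable()
        self._entries = []
        self._index(delimiter)

    def _index(self, delimiter):
        start = self._stream.tell() if self.on_disk else None
        length, kept = 0, []
        while True:
            line = self._stream.readline()
            if line and line != delimiter:
                length += len(line)
                if not self.on_disk:
                    kept.append(line)   # pipes: we must keep the data
                continue
            # Reached a delimiter or end of input: close the current chunk.
            if length:
                if self.on_disk:
                    self._entries.append((start, length))
                else:
                    self._entries.append(b"".join(kept))
            if not line:
                break
            length, kept = 0, []
            if self.on_disk:
                start = self._stream.tell()

    def __len__(self):
        return len(self._entries)

    def fetch(self, i):
        """Return the raw bytes of chunk i, whichever way it was stored."""
        entry = self._entries[i]
        if self.on_disk:
            offset, length = entry
            self._stream.seek(offset)
            return self._stream.read(length)
        return entry


report = io.BytesIO(b"hit one\nscore 10\n//\nhit two\nscore 20\n//\n")
index = ChunkIndex(report)
second = index.fetch(1)
```

Callers see the same fetch() interface either way, so the special-casing stays confined to one place. The streams are read in binary mode because tell() is not reliable while iterating over text-mode files.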