[Bioperl-l] SearchIO speed up
Sendu Bala
bix at sendu.me.uk
Sun Aug 20 21:56:28 UTC 2006
Sendu Bala wrote:
> Chris Fields wrote:
>> ...
>>> My proposal involves the "chunks" being unparsed, raw text "blobs", that
>>> are essentially blessed into a package that does the parsing only when
>>> necessary (and even then, might choose different parsing strategies, based
>>> on what's been asked for). Thus a potentially large amount of parsing and
>>> storage is skipped. Additionally, you now have the option of not even
>>> storing the blobs in memory, just file seek pointers (requiring temp.
>>> storage for streaming pipe data sources), and thus can process very large
>>> reports without consuming memory (currently a problem).
>> Using file pointers is a great touch. Sendu has a slight aversion to temp
>> files but he has already indicated other ways around this.
>
> I'm in the midst of implementing an 'Aaron'-style pull-parser which I
> have called PullParserI.
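A minimal Python sketch of the storage idea in the quoted proposal (the BioPerl code itself is Perl). Each report is kept only as a byte offset and length into a seekable file, and is read back and parsed only when asked for. A non-seekable pipe is first spooled to a temporary file. The names `LazyChunk`, `make_seekable` and `index_reports`, the assumption that every report ends with a `//` line, and the line-splitting stand-in for real parsing are all hypothetical, chosen for illustration only.

```python
import io
import shutil
import tempfile


class LazyChunk:
    """One report, stored only as (offset, length) into a seekable file."""

    def __init__(self, handle, offset, length):
        self._handle = handle
        self._offset = offset
        self._length = length
        self._parsed = None  # filled on first call to parsed()

    def raw(self):
        # Re-read the blob from the file instead of holding it in memory.
        self._handle.seek(self._offset)
        return self._handle.read(self._length)

    def parsed(self):
        # Parse only on demand, then cache; splitlines() stands in for a real parser.
        if self._parsed is None:
            self._parsed = self.raw().splitlines()
        return self._parsed


def make_seekable(stream):
    """Return a seekable binary handle, spooling non-seekable pipes to a temp file."""
    if stream.seekable():
        return stream
    spool = tempfile.TemporaryFile()  # binary mode (w+b) by default
    shutil.copyfileobj(stream, spool)
    spool.seek(0)
    return spool


def index_reports(stream, separator=b"//\n"):
    """Scan once, recording where each report starts and ends.

    Assumes each report is terminated by `separator`; trailing text after
    the last separator is ignored.
    """
    handle = make_seekable(stream)
    chunks = []
    start = handle.tell()
    for line in iter(handle.readline, b""):
        if line == separator:
            end = handle.tell()
            chunks.append(LazyChunk(handle, start, end - start))
            start = end
    return chunks


if __name__ == "__main__":
    demo = io.BytesIO(b"Query: a\nhit1\n//\nQuery: b\nhit2\n//\n")
    for chunk in index_reports(demo):
        print(chunk.parsed()[0])
```

Until a chunk is parsed, only its two integers stay in memory, so a very large report file costs little memory; the trade-off is a seek and read each time `raw()` is called, plus temporary disk space when the input is a pipe.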
I've now committed this to bioperl-live. It is Bio::PullParserI and the
first thing to implement it is my new hmmer parser,
Bio::SearchIO::hmmer_pull (for want of a better name). The API here
isn't set in stone, so certainly I'd encourage suggestions for improvement.
I've made a start on a BLASTN parser so we can see a more familiar speed
comparison, but it's not ready yet. Meanwhile, see the thread 'New hmmpfam
parser'.
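The pull-parsing idea from the quoted proposal (parse only when a value is asked for, and pick a parsing strategy per requested value) can be sketched language-neutrally in Python. This is not the Bio::PullParserI or Bio::SearchIO::hmmer_pull API: the class `PullChunk`, its `fetch` method, and the `Query:`/`Hit:` line format are hypothetical.

```python
import re


class PullChunk:
    """Hypothetical pull-style wrapper over the raw text of one report.

    Each field is parsed on first request by its own strategy and cached;
    fields nobody asks for are never parsed.
    """

    def __init__(self, text):
        self._text = text
        self._cache = {}

    def fetch(self, name):
        if name not in self._cache:
            parser = getattr(self, "_parse_" + name, None)
            if parser is None:
                raise KeyError(f"unknown field: {name}")
            self._cache[name] = parser()
        return self._cache[name]

    def _parse_query_name(self):
        # Cheap strategy: stop at the first matching line.
        for line in self._text.splitlines():
            if line.startswith("Query:"):
                return line.partition(":")[2].strip()
        return None

    def _parse_hits(self):
        # Expensive strategy: walk every line and collect (name, score) pairs.
        hits = []
        for line in self._text.splitlines():
            m = re.match(r"Hit:\s+(\S+)\s+(\S+)", line)
            if m:
                hits.append((m.group(1), float(m.group(2))))
        return hits
```

Asking only for `query_name` never touches the hit lines, which is where the speed-up over parsing everything up front would come from.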