[Bioperl-l] SearchIO speed up
Sendu Bala
bix at sendu.me.uk
Mon Aug 14 10:02:30 UTC 2006
Chris Fields wrote:
> ...
>> My proposal involves the "chunks" being unparsed, raw text "blobs", that
>> are essentially blessed into a package that does the parsing only when
>> necessary (and even then, might choose different parsing strategies, based
>> on what's been asked for). Thus a potentially large amount of parsing and
>> storage is skipped. Additionally, you now have the option of not even
>> storing the blobs in memory, just file seek pointers (requiring temp.
>> storage for streaming pipe data sources), and thus can process very large
>> reports without consuming memory (currently a problem).
>
> Using file pointers is a great touch. Sendu has a slight aversion to temp
> files but he has already indicated other ways around this.
I'm in the midst of implementing an 'Aaron'-style pull-parser which I
have called PullParserI. My current solution for piped input is:
'... The other thing you will need to decide when making a chunk is how
to handle piped input. A PullParser needs seekable data to parse, so if
your data is piped in and unseekable, you must decide between creating a
temp file or reading the input into memory, which will be done before
the chunk becomes usable and you can begin any parsing.'
I don't think it's really possible to avoid this initial 'read everything
in first' step, unless anyone has any bright ideas?
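
To make the trade-off concrete, here is a rough sketch of the two options
described above (temp file or memory), together with a chunk that holds only
seek offsets and parses on demand. It is written in Python purely as
illustration and is not the PullParserI code: make_seekable, Chunk and the
toy 'key=value' record format are all hypothetical names and assumptions,
not part of BioPerl.

```python
import io
import tempfile


def make_seekable(stream, use_temp_file=True):
    """Return a seekable binary handle holding the same data as `stream`.

    Hypothetical helper: a seekable input is returned untouched; an
    unseekable one (e.g. a pipe) is read in full before any parsing starts.
    """
    if stream.seekable():
        return stream
    if use_temp_file:
        # Spool to an anonymous temp file so memory use stays bounded.
        spool = tempfile.TemporaryFile()
        for block in iter(lambda: stream.read(65536), b""):
            spool.write(block)
        spool.seek(0)
        return spool
    # Otherwise hold the whole input in memory.
    return io.BytesIO(stream.read())


class Chunk:
    """Hypothetical lazy chunk: stores offsets only, parses when asked."""

    def __init__(self, handle, start, end):
        self.handle, self.start, self.end = handle, start, end
        self._fields = None  # filled in on first field() call

    def raw(self):
        # Fetch the unparsed text for this chunk from the seekable handle.
        self.handle.seek(self.start)
        return self.handle.read(self.end - self.start)

    def field(self, name):
        # Parse the toy 'key=value' lines once, and only if requested.
        if self._fields is None:
            self._fields = dict(
                line.split(b"=", 1)
                for line in self.raw().splitlines()
                if b"=" in line
            )
        return self._fields.get(name)
```

In this sketch the cost of the 'read everything in first' step is paid only
inside make_seekable, and only for unseekable input; a Chunk built on the
returned handle keeps two integers until something calls field().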