[Bioperl-l] SearchIO speed up

Mon Aug 14 10:02:30 UTC 2006

Chris Fields wrote:
> ...
>> My proposal involves the "chunks" being unparsed, raw text "blobs", that
>> are essentially blessed into a package that does the parsing only when
>> necessary (and even then, might choose different parsing strategies, based
>> on what's been asked for).  Thus a potentially large amount of parsing and
>> storage is skipped.  Additionally, you now have the option of not even
>> storing the blobs in memory, just file seek pointers (requiring temp.
>> storage for streaming pipe data sources), and thus can process very large
>> reports without consuming memory (currently a problem).
> 
> Using file pointers is a great touch.  Sendu has a slight aversion to temp
> files but he has already indicated other ways around this.

I'm in the midst of implementing an 'Aaron'-style pull-parser which I 
have called PullParserI. My current solution for piped input is:

'... The other thing you will need to decide when making a chunk is how 
to handle piped input. A PullParser needs seekable data to parse, so if 
your data is piped in and unseekable, you must decide between creating a 
temp file or reading the input into memory, which will be done before 
the chunk becomes usable and you can begin any parsing.'
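
To make the two options concrete, here is a minimal sketch of that "make it seekable first" step. It is written in Python rather than Perl purely for illustration, and make_seekable, use_temp_file and block_size are hypothetical names, not part of PullParserI or any BioPerl module. It assumes the input is a binary stream that is read to the end before any parsing starts.

```python
import io
import tempfile


def make_seekable(stream, use_temp_file=True, block_size=64 * 1024):
    """Return a seekable binary file object holding the data from `stream`.

    Hypothetical helper: if `stream` is already seekable it is returned
    unchanged; otherwise it is drained into either an anonymous temp file
    (low memory use) or an in-memory buffer (no disk access).
    """
    if stream.seekable():
        return stream

    # Choose the backing store: temp file on disk, or bytes held in memory.
    buffer = tempfile.TemporaryFile() if use_temp_file else io.BytesIO()

    # Copy in fixed-size blocks so a large pipe is never held in memory
    # twice. The whole pipe is still consumed before we return.
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buffer.write(block)

    buffer.seek(0)  # rewind so the caller can start seeking from the top
    return buffer
```

Either way the caller gets something it can seek around in, so chunks can be stored as offsets rather than text; the only difference is whether the copy lives on disk or in memory.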

I don't think it's really possible to avoid this initial 'read everything 
in first' step, unless anyone has any bright ideas?