[Bioperl-l] SearchIO speed up
aaron.j.mackey at gsk.com
Mon Aug 14 12:33:30 UTC 2006
A "pull parser" need not read everything (i.e. the entire file) into
memory, just the current/next chunk, right?
It was the current "push parser" architecture that had me thinking about
file pointers: if we're forced to make an initial pass through the entire
file to build up all the top-level objects before being able to access the
first one (as the current SearchIO does), then it would be advantageous to
minimize the memory impact of all those top-level objects with file
pointers rather than in-memory blobs.
But in a "pull" architecture, that consideration is no longer so
important.
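
To make the distinction concrete, here is a minimal sketch of the "pull" idea. It is written in Python rather than Perl purely for brevity, and it is not BioPerl code: the names LazyChunk and pull_chunks, the "//" record delimiter, and the key=value record layout are all assumptions made up for illustration. The scan reads only as far as the caller has pulled, and each chunk keeps just a pair of file offsets until someone asks for a field.

```python
import io


class LazyChunk:
    """One top-level record, held as a (start, end) byte range, not as text."""

    def __init__(self, handle, start, end):
        self._handle = handle
        self.start = start
        self.end = end
        self._fields = None  # parsed lazily, on first call to field()

    def raw(self):
        # Seek back and read the bytes only when asked, then restore the
        # stream position so an in-progress scan is not disturbed.
        pos = self._handle.tell()
        self._handle.seek(self.start)
        data = self._handle.read(self.end - self.start)
        self._handle.seek(pos)
        return data

    def field(self, name):
        # Hypothetical record layout: one "key=value" pair per line.
        if self._fields is None:
            self._fields = {}
            for line in self.raw().decode().splitlines():
                key, sep, value = line.partition("=")
                if sep:
                    self._fields[key.strip()] = value.strip()
        return self._fields.get(name)


def pull_chunks(handle, delimiter=b"//\n"):
    """Yield one LazyChunk per record, reading no further than the caller pulls."""
    start = handle.tell()
    while True:
        line = handle.readline()
        if not line:
            # End of input: emit a trailing record if there is one.
            end = handle.tell()
            if end > start:
                yield LazyChunk(handle, start, end)
            return
        if line == delimiter:
            yield LazyChunk(handle, start, handle.tell() - len(line))
            start = handle.tell()
```

Asking for the first chunk leaves the rest of the file unread, which is the property I mean. Note that the handle has to be seekable for raw() to work, so this sketch does nothing for piped input.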
Please forgive me if I've misunderstood what you're describing below.
-Aaron
bioperl-l-bounces at lists.open-bio.org wrote on 08/14/2006 06:02:30 AM:
> Chris Fields wrote:
> > ...
> >> My proposal involves the "chunks" being unparsed, raw text "blobs", that
> >> are essentially blessed into a package that does the parsing only when
> >> necessary (and even then, might choose different parsing strategies, based
> >> on what's been asked for). Thus a potentially large amount of parsing and
> >> storage is skipped. Additionally, you now have the option of not even
> >> storing the blobs in memory, just file seek pointers (requiring temp.
> >> storage for streaming pipe data sources), and thus can process very large
> >> reports without consuming memory (currently a problem).
> >
> > Using file pointers is a great touch. Sendu has a slight aversion to temp
> > files but he has already indicated other ways around this.
>
> I'm in the midst of implementing an 'Aaron'-style pull-parser which I
> have called PullParserI. My current solution for piped input is:
>
> '... The other thing you will need to decide when making a chunk is how
> to handle piped input. A PullParser needs seekable data to parse, so if
> your data is piped in and unseekable, you must decide between creating a
> temp file or reading the input into memory, which will be done before
> the chunk becomes usable and you can begin any parsing.'
>
> I don't think it's really possible to avoid this initial 'read everything
> in first' step, unless anyone has any bright ideas?