[Bioperl-l] SearchIO speed up

Fri Aug 11 01:32:01 UTC 2006

I took a quick gander at the SwissKnife code; very nice, but quite long:

http://swissknife.sourceforge.net/docs/

Perl 6 uses parsing expression grammars and rules, so you could build up your
own custom grammars for parsing files.  That would come in very handy here.
I don't know how much of this is implemented or available in Pugs yet, but I
may give it a try sometime.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> Sent: Thursday, August 10, 2006 6:46 PM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] SearchIO speed up
> 
> > So the only laziness you invoke is the object instantiation (but you've
> > already done all the parsing).
> >
> > My proposal involves the "chunks" being unparsed, raw text "blobs" that
> > are essentially blessed into a package that does the parsing only when
> > necessary (and even then, might choose different parsing strategies
> > based on what's been asked for).  Thus a potentially large amount of
> > parsing and storage is skipped.  Additionally, you now have the option
> > of not even storing the blobs in memory, just file seek pointers
> > (requiring temporary storage for streaming pipe data sources), and thus
> > can process very large reports without consuming memory (currently a
> > problem).
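
The lazy "blob" idea quoted above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not BioPerl code: it is written
in Python for brevity, the names LazyChunk and index_chunks are hypothetical,
and the toy "key = value" record format with a "//" delimiter line stands in
for a real report format. Each chunk stores only a file offset and a length;
the text is read and parsed the first time a field is requested.

```python
import io


class LazyChunk:
    """One unparsed section of a report, remembered only as a byte range.

    Hypothetical class for illustration; nothing here is BioPerl API.
    """

    def __init__(self, handle, start, length):
        self._handle = handle
        self._start = start
        self._length = length
        self._fields = None  # stays None until someone asks for a field

    def raw(self):
        # Re-read the blob from the file instead of keeping it in memory.
        self._handle.seek(self._start)
        return self._handle.read(self._length)

    def fields(self):
        # Parse on first access only, then reuse the cached result.
        if self._fields is None:
            self._fields = {}
            for line in self.raw().decode("ascii").splitlines():
                key, sep, value = line.partition("=")
                if sep:
                    self._fields[key.strip()] = value.strip()
        return self._fields


def index_chunks(handle, delimiter=b"//\n"):
    """Scan a seekable binary handle once, recording where each chunk lives.

    The delimiter line is an assumption of this toy format.
    """
    chunks = []
    start = handle.tell()
    while True:
        line = handle.readline()
        if not line or line == delimiter:
            end = handle.tell() - len(line)
            if end > start:
                chunks.append(LazyChunk(handle, start, end - start))
            if not line:
                break
            start = handle.tell()
    return chunks


if __name__ == "__main__":
    report = io.BytesIO(b"name = hit1\nscore = 42\n//\nname = hit2\nscore = 7\n//\n")
    chunks = index_chunks(report)
    # Only the second chunk is ever read back and parsed here.
    print(chunks[1].fields()["name"])
```

The index pass still has to read the whole file once to find the chunk
boundaries, but it keeps only two integers per chunk. A streaming source such
as a pipe is not seekable, which is why the proposal mentions spooling it to
temporary storage first.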
> 
> This approach is an excellent one, but not all file formats lend
> themselves to it. BLAST results have a semantically hierarchical layout,
> and the BLAST XML report syntax matches that layout, so the approach is
> well suited. Traditional BLAST reports are pretty similar too, i.e. most
> of the data for a low-level object is encapsulated within a certain part
> of the input file.
>
> However, this may not be true for other formats, perhaps HMMER reports,
> where "HSP"-related info may be spread across multiple sections of the
> file.
>
> But of course, this doesn't prevent us using the approach where suitable,
> and using the "slow" method otherwise.
> 
> --
> Torsten Seemann
> Victorian Bioinformatics Consortium, Monash University, Australia
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l