[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Thu Aug 10 21:11:00 UTC 2006


And he comes down from the mount and speaks to the masses...then disappears
back into the mist...

Kidding aside, this strategy may be something to think about for other
parsers in Bioperl (such as SeqIO).  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Ewan Birney
> Sent: Thursday, August 10, 2006 1:56 PM
> To: aaron.j.mackey at gsk.com
> Cc: bioperl-l at lists.open-bio.org; Sendu Bala
> Subject: Re: [Bioperl-l] SearchIO speed up
> 
> 
> On 10 Aug 2006, at 18:39, aaron.j.mackey at gsk.com wrote:
> 
> >> ...Except I need to know if the community considers the speed problem
> >> solved or not. More radical changes will make SearchIO even faster,
> >> e.g. Chris Fields and Jason (if I interpret the Project priority list
> >> item correctly) have suggested an end to individual Hit and HSP
> >> objects, which become just data members of a Result-like object.
> >> Ideally I don't want to go down that route because we lose quite a
> >> bit of OO power;
> >
> > As already mentioned, a lazy-evaluation approach would also work.
> >
> > Jason and I did once talk about an entirely new parsing/object-building
> > framework, based on nested grammars; in essence, the "top-level" parser
> > simply "chunks" the input into blobs of (minimally parsed) text that
> > correspond to the top-level result object.  This chunk/blob is the input
> > to the next-level parser for Hits, which in turn produces chunks for
> > HSPs.  Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are*
> > the same Generic*I-implementing objects we're already using.  Thus, if
> > HSPs are never interrogated, they're never parsed; as soon as one is
> > interrogated, it gets parsed, and so on.  In such an environment, you can
> > imagine flyweight objects that are built very quickly/easily (recall that
> > many previous analyses of BioPerl speed problems point not so much to
> > parsing as to heavy-weight object creation).
> >
> 
> for people's interest, this is what the SwissKnife package does as well
> for swissprot (which has a trivial top-level chunking strategy)
> 
> (ewan returns to his hectic life of too many balls in the air :)).
> 
> 
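
The chunk-then-parse-on-demand idea described in the quoted message can be
illustrated with a small sketch. This is only a toy, written in Python rather
than Perl for brevity; it is not BioPerl or SwissKnife code. The three-keyword
report format (RESULT / HIT / HSP) and every name in it (split_chunks, Result,
Hit, HSP, parse_results, HSP_PARSE_COUNT) are hypothetical, invented for the
illustration. Each level only splits its raw text into chunks for the next
level, and an HSP's fields are parsed the first time they are asked for.

```python
from functools import cached_property  # Python 3.8+

# Hypothetical toy report format, invented purely for this sketch.
SAMPLE = """\
RESULT query1
HIT hitA
HSP 10 50 98.5
HSP 60 90 75.0
HIT hitB
HSP 1 20 40.0
RESULT query2
HIT hitC
HSP 5 15 12.0
"""

# Instrumentation only: counts how many HSP chunks were actually parsed.
HSP_PARSE_COUNT = {"n": 0}


def split_chunks(text, keyword):
    """Cut text into chunks, each starting at a line beginning with keyword."""
    chunks, current = [], None
    for line in text.splitlines():
        if line.startswith(keyword + " "):
            if current is not None:
                chunks.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append("\n".join(current))
    return chunks


class HSP:
    def __init__(self, raw):
        self._raw = raw  # unparsed text; nothing else is done up front

    @cached_property
    def fields(self):
        # Runs once, on first access; the result is cached on the instance.
        HSP_PARSE_COUNT["n"] += 1
        _, start, end, score = self._raw.split()
        return {"start": int(start), "end": int(end), "score": float(score)}

    @property
    def score(self):
        return self.fields["score"]


class Hit:
    def __init__(self, raw):
        self._raw = raw

    @cached_property
    def name(self):
        return self._raw.splitlines()[0].split(maxsplit=1)[1]

    @cached_property
    def hsps(self):
        # Only chunking happens here; each HSP stays unparsed until queried.
        return [HSP(chunk) for chunk in split_chunks(self._raw, "HSP")]


class Result:
    def __init__(self, raw):
        self._raw = raw

    @cached_property
    def query(self):
        return self._raw.splitlines()[0].split(maxsplit=1)[1]

    @cached_property
    def hits(self):
        return [Hit(chunk) for chunk in split_chunks(self._raw, "HIT")]


def parse_results(text):
    """Top-level parser: chunk the input into Result objects, nothing more."""
    return [Result(chunk) for chunk in split_chunks(text, "RESULT")]
```

Calling parse_results(SAMPLE) only performs the top-level chunking. Reading
results[0].hits[0].hsps[0].score parses exactly one HSP chunk, and the other
HSPs in the report remain raw text unless something asks for them.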