[Bioperl-l] Query Unigene title from input a ACC number / BioPerl Object Creation

Jamie Hatfield (AGCoL) jamie at genome.arizona.edu
Tue Mar 25 10:02:41 EST 2003


The blessed hash was actually something I was planning on trying, I just
wasn't sure if that was a "sanctioned" method of speeding up my code.  I
haven't been around to hear all the discussion re:
speed/flexibility/solid object model, so I didn't realize this topic was
becoming a dead horse.  :-)
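
Just so I'm sure I understand the suggestion, here is a minimal sketch
of the two routes as I read them.  My::Root and My::Clone are made-up
classes for illustration, not BioPerl modules, and the -name argument
is likewise only an assumption:

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; not BioPerl modules.
package My::Root;
sub new {
    my ($class, %args) = @_;
    return bless {}, $class;
}

package My::Clone;
our @ISA = ('My::Root');
sub new {
    my ($class, %args) = @_;
    # Each new() call goes up the inheritance chain via SUPER.
    my $self = $class->SUPER::new(%args);
    $self->name($args{-name}) if exists $args{-name};
    return $self;
}
sub name {
    my $self = shift;
    $self->{_name} = shift if @_;
    return $self->{_name};
}

package main;

# Usual route: the full constructor chain runs for every object.
my $slow = My::Clone->new(-name => 'clone1');

# Shortcut: bless a hash directly, then push the data in via accessors.
my $fast = bless {}, 'My::Clone';
$fast->name('clone2');

print $slow->name, " ", $fast->name, "\n";
```

The catch, as far as I can tell, is that the shortcut skips whatever
setup the parent constructors do, so it is only safe when the accessors
alone leave the object in a valid state.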

Another area I was curious about:  my fpc module ISA MapIO, so when
reading in new lines, it uses the _readline method.  Are there any
plans to buffer this, or do we assume that the OS/hardware does a good
enough job as is?  Also, what was the motivation for abstracting this
away?  I ask because, as I understand it, you're saying that there is a
significant performance hit in Perl when calling methods (more so, I
assume, than in other programming languages).

More than half of the time spent reading in an FPC file ends up in
the _readline method, but it really doesn't take that long to read the
file in if you do it yourself with open, <>, close and such.  I'm just
trying to find a good way to keep within the object model, but still
make this a usable object.
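
To make the comparison concrete, this is roughly the kind of test I
have in mind.  It is only a sketch: My::IO is a made-up stand-in with a
bare-bones _readline, not the real BioPerl code, and the data is an
in-memory string rather than an actual FPC file:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for an IO base class; not the real BioPerl code.
package My::IO;
sub new {
    my ($class, $fh) = @_;
    return bless { _fh => $fh }, $class;
}
sub _readline {
    my $self = shift;
    my $fh   = $self->{_fh};
    return scalar <$fh>;    # one line per method call
}

package main;

# Made-up input: 10,000 short lines held in memory.
my $data = join '', map { "line $_\n" } 1 .. 10_000;

# Read every line with a plain open/<>/close loop.
sub count_direct {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

# Read every line through one method call per line.
sub count_via_method {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my $io = My::IO->new($fh);
    my $n  = 0;
    $n++ while defined $io->_readline;
    close $fh;
    return $n;
}

# Run each version for at least one CPU second and print a comparison.
cmpthese(-1, { direct => \&count_direct, method => \&count_via_method });
```

This only measures the cost of one extra method call per line; a real
_readline may do more work than that, so the numbers would be a lower
bound at best.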

I really am not trying to be argumentative/critical.  Just trying to
make it good and make it fast.

Is there a developer paper/primer that I should read that has a lot of
this discussion in it?

Thanks for your help and advice.

> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.mc.duke.edu] 
> 
> On Tue, 25 Mar 2003, Jamie Hatfield (AGCoL) wrote:
> 
> > Maybe it's just me, but I've never been too pleased with BioPerl's
> > ability to handle large amounts of data like these unigene clusters.
> > You all might remember I recently proposed a FPC module for reading
> > in FPC data files.  Well, that is still in progress, but it is DOG
> > slow, and the only reason I can seem to make out of it is that
> > object creation is a bear.
> >
> > I would really like some input myself, from the BioPerl experts
> > about what I can do to speed up the creation of say . . . 100k
> > objects?  :-)
> >
> You have to take a different approach then.  We've gone back and
> forth on this a lot with respect to speed, flexibility, and a solid
> object model.  Apparently Perl doesn't make it easy to have all three.
> 
> You can get around some of the problems if, instead of building
> things with new, you bless a hash and then call some methods to push
> the data in.  This prevents the walk-up-the-tree for inheritance that
> happens on every new() call, which is the main bottleneck.  We do this
> with features and locations in the genbank parser right now to get a
> modest performance gain.  It is still an area that we are trying to
> rethink and improve.
> 
> I think we also want to move more into the realm of event-based
> parsing, which would allow you to attach a listener that would only
> catch certain events and perhaps wouldn't need to actually create
> objects for certain quick and dirty tasks.  But the framework for this
> needs to be laid out pretty explicitly to make it really work.
> 
> I believe Ensembl hit this perf problem and went with a simpler
> object initialization scheme to buy them the performance they needed.
> It means that you have to code up more things when you inherit from an
> object (and have to remember to update all child classes whenever a
> parent class changes) but you get some performance increase.
> 
> -jason
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 