[Bioperl-l] Query Unigene title from input a ACC number / BioPerl
Object Creation
Jamie Hatfield (AGCoL)
jamie at genome.arizona.edu
Tue Mar 25 10:02:41 EST 2003
The blessed hash was actually something I was planning on trying, I just
wasn't sure if that was a "sanctioned" method of speeding up my code. I
haven't been around to hear all the discussion re:
speed/flexibility/solid object model, so I didn't realize this topic was
becoming a dead horse. :-)
Another area I was curious about: my fpc module ISA MapIO, so when
reading in lines, it uses the _readline function. Are there any
plans to buffer this, or do we assume that the OS/hardware does a good
enough job as is? Also, what was the motivation for abstracting this
away? I ask because, as I understand it, you're saying that method calls
carry a significant performance hit in Perl (more so, I assume, than in
other programming languages).
More than half of the time spent reading in an fpc file ends up in
the _readline method, yet it really doesn't take that long to read the
file in if you do it yourself with open, <>, close and such. I'm just
trying to find a good way to stay within the object model, but still
make this a usable object.
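
To make the comparison concrete, here is a minimal sketch of the two ways of reading lines. My::LineReader is a made-up stand-in for an IO base class (it is not BioPerl's actual _readline code), and the data is generated in memory, so the timings only hint at the per-line cost of a method call and say nothing about a real fpc file.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in for an IO base class; NOT BioPerl's real code.
package My::LineReader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, pushback => [] }, $class;
}

# One method call per line, checking a small pushback buffer each time.
sub _readline {
    my ($self) = @_;
    return shift @{ $self->{pushback} } if @{ $self->{pushback} };
    my $fh = $self->{fh};
    return scalar <$fh>;
}

package main;

# Made-up input: 20,000 short lines held in memory.
my $data = join '', map { "line $_\n" } 1 .. 20_000;

# Read every line through the method.
sub count_via_method {
    open my $fh, '<', \$data or die "cannot open in-memory data: $!";
    my $reader = My::LineReader->new($fh);
    my $n = 0;
    $n++ while defined( $reader->_readline );
    close $fh;
    return $n;
}

# Read every line straight from the filehandle.
sub count_direct {
    open my $fh, '<', \$data or die "cannot open in-memory data: $!";
    my $n = 0;
    $n++ while defined( my $line = <$fh> );
    close $fh;
    return $n;
}

# Prints timings for both approaches; the numbers depend on the machine.
timethese(10, {
    method => \&count_via_method,
    direct => \&count_direct,
});
```

Both subs see the same lines, so any difference in the timings comes from the extra call and buffer check per line rather than from the I/O itself.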
I really am not trying to be argumentative/critical. Just trying to
make it good and make it fast.
Is there a developer paper/primer that I should read that has a lot of
this discussion in it?
Thanks for your help and advice.
> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.mc.duke.edu]
>
> On Tue, 25 Mar 2003, Jamie Hatfield (AGCoL) wrote:
>
> > Maybe it's just me, but I've never been too pleased with BioPerl's
> > ability to handle large amounts of data like these unigene clusters.
> > You all might remember I recently proposed a FPC module for reading in
> > FPC data files. Well, that is still in progress, but it is DOG slow,
> > and the only reason I can seem to make out of it is that object creation
> > is a bear.
> >
> > I would really like some input myself, from the BioPerl experts about
> > what I can do to speed up the creation of say . . . 100k objects? :-)
> >
> You have to take a different approach then. We've gone back and forth on
> this a lot wrt speed and flexibility and a solid object model.
> Apparently Perl doesn't make it easy to have all three.
>
> You can get around some of the problems if, instead of building things with
> new, you bless a hash and then call some methods to push the data in.
> This prevents the walk-up-the-tree for inheritance that happens on every
> new() call, which is the main bottleneck. We do this with features and
> locations in the genbank parser right now to get a modest performance
> gain. It is still an area that we are trying to rethink and improve.
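
As a rough sketch of the blessed-hash shortcut described above: the classes My::Root, My::Feature and My::Clone are made up for illustration and are not BioPerl classes, and the constructors are far simpler than real ones. The point is only the shape of the two code paths.

```perl
use strict;
use warnings;

# Hypothetical three-level hierarchy; none of these are BioPerl classes.
package My::Root;
sub new {
    my ($class, %args) = @_;
    return bless {}, $class;
}

package My::Feature;
our @ISA = ('My::Root');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);    # walks up to My::Root::new
    $self->start( $args{-start} ) if defined $args{-start};
    $self->end( $args{-end} )     if defined $args{-end};
    return $self;
}
sub start { my $self = shift; $self->{start} = shift if @_; return $self->{start} }
sub end   { my $self = shift; $self->{end}   = shift if @_; return $self->{end} }

package My::Clone;
our @ISA = ('My::Feature');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);    # walks up to My::Feature::new
    $self->name( $args{-name} ) if defined $args{-name};
    return $self;
}
sub name { my $self = shift; $self->{name} = shift if @_; return $self->{name} }

package main;

# Conventional path: each object passes through three new() calls.
my $slow = My::Clone->new( -name => 'c1', -start => 1, -end => 100 );

# Shortcut: bless an empty hash into the class, then push the data in.
# This skips whatever defaults or checks the constructors would apply.
my $fast = bless {}, 'My::Clone';
$fast->name('c1');
$fast->start(1);
$fast->end(100);

printf "%s %d-%d (%s)\n", $_->name, $_->start, $_->end, ref $_ for $slow, $fast;
```

The two objects end up with the same class and the same data. The trade-off is that anything a constructor would normally set up has to be done by hand when new() is bypassed.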
>
> I think we also want to move more into the realm of event-based parsing,
> which would allow you to attach a listener that would only catch certain
> events and perhaps wouldn't need to actually create objects for certain
> quick and dirty tasks. But the framework for this needs to be laid out
> pretty explicitly to make it really work.
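
One way to picture the listener idea above is the sketch below. Everything in it is an assumption made for illustration: parse_events is a hypothetical function, the "KEY: value" input format and the records are invented (this is not the real UniGene file format), and no BioPerl event framework is implied.

```perl
use strict;
use warnings;

# Hypothetical event-emitting parser for an invented "KEY: value" format.
# Each recognised line becomes an event named after its lowercased key.
sub parse_events {
    my ($fh, $listener) = @_;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        next unless $line =~ /^(\w+):\s*(.*)$/;
        my ($event, $value) = ( lc($1), $2 );
        my $handler = $listener->{$event} or next;   # nobody listening: skip
        $handler->($value);
    }
    return;
}

# Invented sample records, for illustration only.
my $data = <<'END_DATA';
ID: Hs.1
TITLE: first example cluster
SEQUENCE: ACC=AA000001
ID: Hs.2
TITLE: second example cluster
SEQUENCE: ACC=AA000002
END_DATA

open my $fh, '<', \$data or die "cannot open in-memory data: $!";

# The listener only cares about titles, so no per-record objects are built.
my @titles;
parse_events( $fh, { title => sub { push @titles, $_[0] } } );
close $fh;

print "$_\n" for @titles;
# prints:
# first example cluster
# second example cluster
```

A listener that did need full objects could register more handlers and build them itself, which is why the set of events would have to be spelled out explicitly for this to work in practice.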
>
> I believe Ensembl hit this perf problem and went with a simpler object
> initialization scheme to buy them the performance they needed. It means
> that you have to code up more things when you inherit from an object
> (and have to remember to update all child classes whenever a parent
> class changes) but you get some performance increase.
>
> -jason
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>