[Bioperl-l] GenBankParser comparison to bioperl parser
Lincoln Stein
lstein@cshl.org
Fri, 13 Sep 2002 12:30:38 -0400
I second Aaron's opinion. It's the *method* calls that hurt. If you are
willing to throw away subclassability and do the inner loop parts
using direct function calls, then it's a performance win.
Lincoln
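
To make the method-call vs. function-call point concrete, here is a minimal sketch using the core Benchmark module. The packages My::Base, My::Middle and My::Leaf and the function seq_len_direct are hypothetical names made up for illustration; they are not BioPerl classes. The method is inherited through two levels of @ISA, while the function does the same work with no dispatch. Perl caches method lookups, so the gap measured this way is only the per-call dispatch overhead, and the numbers will vary with Perl version and machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical three-level hierarchy (illustrative names only).
package My::Base;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
# Accessor-style method that is resolved through @ISA when called on a subclass.
sub seq_len {
    my ($self) = @_;
    return length $self->{seq};
}

package My::Middle;
our @ISA = ('My::Base');

package My::Leaf;
our @ISA = ('My::Middle');

package main;

# Plain function doing the same work: no method resolution, no subclassability.
sub seq_len_direct {
    return length $_[0]{seq};
}

my $obj = My::Leaf->new(seq => 'MKTAYIAKQR');

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    method   => sub { my $n = $obj->seq_len },
    function => sub { my $n = seq_len_direct($obj) },
});
```

The trade-off is the one described above: seq_len_direct cannot be overridden by a subclass, so this only makes sense in inner loops where the object's class is known.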
On Thursday 12 September 2002 01:41 pm, Aaron J Mackey wrote:
> [ trimmed the reply-to lines a bit ... ]
>
> On Thu, 12 Sep 2002, Hilmar Lapp wrote:
> > I'm sure that some of the parsing logic can be substantially improved
> > both in readability and speed, but honestly I'd be very surprised if
> > even the ultimately best regexp combined with the ultimately best
> > parsing logic can speed up the whole thing by a factor of more than
> > 2-3 fold. It's the object tree construction that costs you the order
> > of magnitude.
>
> Yes (see the pICalculator thread for some simple benchmarking of
> SeqIO::fasta vs. pure-Perl raw parsing. Summary: 24 seconds vs. 0.5
> seconds to read a 25,000-sequence protein database).
>
> I don't believe it's object *construction* (i.e. malloc-ing new memory) so
> much as all the function calls that are happening. Having a pool of
> objects is not going to help this at all (in fact, Perl is already keeping
> pools of SVs around for you to use, so you're just duplicating the effort
> if you go that route). I repeat: look at the function calls, and all the
> @ISA tree-walking ...
>
> -Aaron
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================