[Bioperl-l] bioperl-db

Heikki Lehvaslaiho heikki@ebi.ac.uk
Wed, 13 Mar 2002 17:12:21 +0000


Aaron J Mackey wrote:
> 
> On Wed, 13 Mar 2002, Ewan Birney wrote:
> 
> > Heikki is, I believe, going to look into profiling Parse::RecDescent,
> > which may be faster (we effectively have a hand-coded LLR parser in
> > embl/genbank and I can't believe we've done a good job).
> 
> I went down this path about a year ago when Jason asked me to think about
> rewriting the FTHelper code under Parse::RecDescent.  The "naive" parser
> one first writes is full of logical specification that is very easy for a
> human to read, and make changes to, but in terms of the state machine that
> gets built, a bit redundant.  In other words, it's slow as all hell.
> "Tuning" the grammar by optimizing away lots of unnecessary states helps
> speed considerably, but it all ends up turning into something a lot less
> readable (and looking more and more like the original mass of regexps).

I am afraid I have to agree. I've put together an EMBL parser which reads
in an entry and creates a Bio::RichSeq object using >95% of the information
in it. It is easy to maintain, but its speed sucks:

When reading in 1000 EMBL entries, there is an order-of-magnitude difference
in speed:

Parse::RecDescent   75.85user 0.18system 1:16.12elapsed 99%CPU
Bio::SeqIO           7.41user 0.02system 0:07.44elapsed 99%CPU
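
As a rough check of the figures above, the slowdown can be worked out from
the user-CPU times. This is a small Python sketch, not part of the original
measurement; the helper names `user_seconds` and `slowdown` are made up for
illustration, and it only assumes the time(1) output format shown above.

```python
import re

# The two timing lines quoted above, as printed by time(1).
TIMINGS = {
    "Parse::RecDescent": "75.85user 0.18system 1:16.12elapsed 99%CPU",
    "Bio::SeqIO": "7.41user 0.02system 0:07.44elapsed 99%CPU",
}


def user_seconds(line):
    """Return the user-CPU seconds from a time(1) summary line (hypothetical helper)."""
    match = re.match(r"([\d.]+)user", line)
    if match is None:
        raise ValueError(f"no user time found in: {line!r}")
    return float(match.group(1))


def slowdown(slow_line, fast_line):
    """Ratio of user-CPU time between the slower and the faster run."""
    return user_seconds(slow_line) / user_seconds(fast_line)


ratio = slowdown(TIMINGS["Parse::RecDescent"], TIMINGS["Bio::SeqIO"])
print(f"Parse::RecDescent is about {ratio:.1f}x slower than Bio::SeqIO")
```

With 1000 entries per run, this works out to roughly 76 ms per entry for
Parse::RecDescent against roughly 7 ms per entry for Bio::SeqIO.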

Precompiling the parser (into an ASCII class file) helps only with startup
time, not with running speed.


Damian Conway promises that speed issues will be handled in v 2.0 (see the
man page), but I have no idea when it will be out.


I still like the idea of using Parse::RecDescent, but it is not (yet) a
viable option for Bio::SeqIO.
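
For contrast, the hand-coded, line-oriented style that the current
embl/genbank code follows can be sketched roughly as below. This is a Python
illustration only, not the Bio::SeqIO implementation: the function name
`read_embl_entries` and the sample entry are invented, and the sketch assumes
nothing beyond the usual EMBL flat-file layout (a two-letter line code, data
from column 6, indented sequence lines, `//` as the entry terminator).

```python
from collections import defaultdict

# A made-up entry, for illustration only.
SAMPLE = """\
ID   TEST0001; SV 1; linear; mRNA; STD; PLN; 12 BP.
XX
AC   TEST0001;
XX
DE   Made-up entry for illustration
XX
SQ   Sequence 12 BP;
     acgtacgtac gt                                                            12
//
"""


def read_embl_entries(lines):
    """Yield one dict per entry, mapping line code -> list of line contents.

    Hypothetical helper: sequence residues are collected under the key "seq".
    """
    entry = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of entry: hand it back and start a fresh one.
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif line.startswith("  "):
            # Sequence line: keep the letters, drop spaces and the trailing count.
            entry["seq"].append("".join(ch for ch in line if ch.isalpha()))
        elif len(line) >= 2 and not line.startswith("XX"):
            # Ordinary line: two-letter code, data starting at column 6.
            entry[line[:2]].append(line[5:])


entries = list(read_embl_entries(SAMPLE.splitlines()))
print(entries[0]["AC"], "".join(entries[0]["seq"]))
```

A single pass with cheap string tests per line is what makes this style
fast; it is also why it tends to grow into a mass of special cases.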


	-Heikki






> I briefly looked into writing a traditional C-based, compiled yacc grammar
> (which could then be bootstrapped in via Inline::C), but as always, my
> thesis committee somehow thinks that real experiments are more important
> than open source software :(
> 
> -Aaron
> 
> --
>  Aaron J Mackey
>  Pearson Laboratory
>  University of Virginia
>  (434) 924-2821
>  amackey@virginia.edu

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________