[Bioperl-l] GenBankParser comparison to bioperl parser

Lincoln Stein lstein@cshl.org
Fri, 13 Sep 2002 12:21:46 -0400


Well, I can take some consolation in having been a few minutes faster than
Bio::SeqIO.

I wonder how much of John's remarkable performance is due to lazy parsing?  I 
think we'll be studying his code for some time to come.
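
For readers unfamiliar with the term, here is a minimal sketch of what "lazy parsing" can mean for a GenBank flat file. It is not John's code, which does not appear in this thread, and it is written in Python rather than Perl purely for illustration. The names `LazyRecord` and `read_records` are hypothetical. The only format facts assumed are that an entry begins with a LOCUS line, ends with a line containing only `//`, and carries its sequence after an ORIGIN line.

```python
from functools import cached_property


class LazyRecord:
    """Keeps the raw text of one entry; each field is parsed on first access."""

    def __init__(self, raw):
        self.raw = raw

    @cached_property
    def locus(self):
        # The first line looks like "LOCUS  NAME  ..."; the name is the second token.
        first_line = self.raw.split("\n", 1)[0]
        return first_line.split()[1]

    @cached_property
    def sequence(self):
        # Take the lines after ORIGIN and drop the leading position number on each.
        _, _, body = self.raw.partition("\nORIGIN")
        lines = body.split("\n")[1:]  # skip the remainder of the ORIGIN line itself
        return "".join("".join(line.split()[1:]) for line in lines)


def read_records(handle):
    """Yield one LazyRecord per entry; only the '//' terminator is inspected here."""
    chunk = []
    for line in handle:
        if line.rstrip() == "//":
            if chunk:
                yield LazyRecord("".join(chunk))
            chunk = []
        else:
            chunk.append(line)
```

Splitting on the terminator is cheap, so a caller that only needs locus names never pays for parsing features or sequence data. Whether that accounts for the timings quoted below is exactly the open question.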

Lincoln

On Thursday 12 September 2002 02:22 pm, John Kloss wrote:
> You guys all get up so early.  It's almost like you're in a different
> time zone :)
>
> First: if you guys could take off the gishlab cc, I'd appreciate it.
> The guys in my lab know I've been working on this for a long time and I
> just thought they'd like to know what some results were, that's why I
> cc'd them before.  I don't think they wanted all of last night's
> discussion.  My fault.  I should've just walked over to their desks and
> said "Hey, look what I did".
>
> Second: I'm happy to maintain my own code.  I wasn't looking to hand it
> off.  I use this parser every day so it has to work.  If you all think it
> would be a nice addition to a bioperl-util or contrib directory, that's
> fine with me.  I'll field bug and usage issues.
>
> Third: If you'd like to use the code as an underlying base for the Bio::SeqIO
> genbank format, that's fine with me, too.  I'll still maintain it and
> I'd be willing to at least try and integrate it into the bioperl
> framework.  I actually like building parsers so it doesn't seem much of
> a burden to me.
>
> And here are the results of GenBankParser against Lincoln Stein's
> Boulder::Genbank:
>
> GenBankParser
>
> real    1m0.093s
> user    0m55.430s
> sys     0m6.820s
>
> Boulder::Genbank
>
> real    13m21.597s
> user    12m56.850s
> sys     0m27.180s
>
> Note, the times are slightly faster because I had to gunzip the
> gbbct1.seq file first and tear out the first 11 lines of the form
>
> GBBCT1.SEQ           Genetic Sequence Data Bank
>                            August 15 2002
>
>                 NCBI-GenBank Flat File Release 131.0
>
>                         Bacterial Sequences (Part 1)
>
>    19841 loci,   103009067 bases, from    19841 reported sequences
>
> because Lincoln's parser was dying on that with a substr out of range at
> Boulder::Genbank.pm line 853.  After I removed the GenBank header cruft,
> it worked fine.
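
The manual header removal described above could be automated with a small pre-filter. The following is only a sketch in Python, not a patch to Boulder::Genbank or part of GenBankParser. The function name `skip_release_header` is hypothetical, and it assumes that the first real entry is the first line beginning with `LOCUS`.

```python
import itertools


def skip_release_header(lines):
    """Drop every line before the first one that starts with 'LOCUS'."""
    return itertools.dropwhile(lambda line: not line.startswith("LOCUS"), lines)
```

Because `dropwhile` stops testing after the first match, the remaining lines pass through untouched, so the filter can sit in front of any record-oriented parser that expects the stream to begin at an entry.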
>
> Lincoln, I've idolized you since I first learned how to code Perl, so
> this is a really big moment for me :)
>
> 	John Kloss.
>
> -----Original Message-----
> From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]
> On Behalf Of Ewan Birney
> Sent: Thursday, September 12, 2002 6:26 AM
> To: Lincoln Stein
> Cc: Elia Stupka; Ian Korf; John Kloss; bioperl-l@bioperl.org;
> gishlab@species.wustl.edu
> Subject: Re: [Bioperl-l] GenBankParser comparison to bioperl parser
>
> On Thu, 12 Sep 2002, Lincoln Stein wrote:
> > A separate repository is also fine with me, but I prefer Bioperl-contrib,
> > because it should not just be for utility code, and nicely echoes the
> > "contrib" directory of the X Windows Consortium code distribution.
> >
> > I'll put Boulder into a Bioperl-contrib if there is one.
>
> Deal. John --- sounds good to you?
>
> > Lincoln
> >
> > On Thursday 12 September 2002 7:20 am, Elia Stupka wrote:
> > > > I like the bioperl-util repository - "for professionals only" I think it
> > > > keeps diversity without freaking newbies out and we can trade code.
> > >
> > > Rightie-ho, easiest way out, for some silly reason I thought nobody would
> > > like that...
> > >
> > > Elia
> > >
> > > ********************************
> > > * http://www.fugu-sg.org/~elia *
> > > * tel:    +65 6874 1467        *
> > > * mobile: +65 9030 7613        *
> > > * fax:    +65 6779 1117        *
> > > ********************************
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================