[Bioperl-l] Comparing DB_FILE and SDBM

Jason Stajich jason at cgt.duhs.duke.edu
Fri Aug 13 13:59:09 EDT 2004


You could just write a little script that times how long it takes to read
through the whole primate division without doing any DB stuff: just see
how long it takes to go through the whole file, and maybe count how many
records there are in it.
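
Something along these lines would do as a rough sketch. It is written in
Python here rather than Perl, and the function name scan_genbank is made up
for illustration. It assumes the input is an uncompressed GenBank flat file,
where each record ends with a line containing only "//".

```python
import sys
import time


def scan_genbank(path):
    """Read a GenBank flat file once; return (records, bytes, seconds).

    Assumption: every record ends with a terminator line that is just "//".
    No indexing or parsing is done, so the time is a pure read-through cost.
    """
    records = 0
    n_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        for line in handle:
            n_bytes += len(line)
            if line.rstrip() == b"//":
                records += 1
    elapsed = time.perf_counter() - start
    return records, n_bytes, elapsed


if __name__ == "__main__":
    # Usage: python scan_genbank.py file1.seq file2.seq ...
    for path in sys.argv[1:]:
        records, n_bytes, elapsed = scan_genbank(path)
        print(f"{path}: {records} records, {n_bytes} bytes, {elapsed:.2f} s")
```

Run it twice on the same file if you can: the second pass may be served
from the OS file cache, so the first number is the more honest one.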

If you time this you'll get a baseline for how long the indexing should
take. The files from the primate division are pretty large, so I don't know
whether the sheer file size is the problem or the memory/disk requirements
once the index file gets really big.

If you run top while you are doing the indexing, are you seeing mostly perl
CPU or system CPU? Is it disk I/O that is killing you, or is it running
full tilt in perl?
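
If watching top is awkward, the same question can be asked from inside a
script. A minimal sketch, again in Python rather than Perl, with a
hypothetical helper name cpu_breakdown: it runs a function and reports wall
clock, user CPU and system CPU seconds for the current process.

```python
import os
import time


def cpu_breakdown(func, *args):
    """Run func(*args); return (result, timings in seconds).

    timings has "wall", "user" and "system" for this process only
    (child processes are not included).
    """
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    timings = {
        "wall": wall,
        "user": cpu_after.user - cpu_before.user,
        "system": cpu_after.system - cpu_before.system,
    }
    return result, timings


if __name__ == "__main__":
    # Stand-in workload; replace with the indexing or read-through step.
    total, timings = cpu_breakdown(sum, range(5_000_000))
    for name, seconds in timings.items():
        print(f"{name:>6}: {seconds:.2f} s")
```

As a rule of thumb, user time close to wall time suggests the interpreter
itself is the bottleneck, a large system share suggests kernel work such as
I/O calls, and wall time far above user plus system suggests the process is
mostly waiting, for example on disk or swap.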

-jason

On Fri, 13 Aug 2004, Mike Muratet wrote:

>
>
> On Thu, 12 Aug 2004, Josh Lauricha wrote:
>
> > On Thu 08/12/04 13:44, Mike Muratet wrote:
> > > Greetings
> > >
> > > I did a comparison myself of Bio::Index::GenBank between DB_FILE and SDBM
> > > on the latest version of the files from the Genbank primate division using
> > > a Compaq with 376K of memory and a 2.4GHz Pentium 4 Xeon. I used the
> >
> > Wow, now that's a machine in desperate need of a memory upgrade.
> >
>
> Well, yes. But it's all I've got.
>
> >
> > How loaded was that machine? (I'm assuming just the tests.)
> >
>
> I had some X sessions going to other (larger ;-) ) machines, but nothing
> intensive.
>
> > >
> > > A negligible difference. Has anyone tried to compare the libraries (or
> > > knows where someone has?)
> >
> >
> > I have a feeling that perl yields its timeslice for whatever reason when
> > switching from a library to perl code, making libraries that do a lot of
> > function calls (such as XML parsers and DBs) very slow in terms of
> > latency.
> >
> > Can anyone confirm or deny this?
> >
>
> If anyone has any experience(s) to relate regarding the indexing of big
> portions of Genbank, I'd like to hear how they did it. Should it really
> take days?
>
> Thanks
>
> Mike
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

