[Biopython-dev] Storing Bio.SeqIO.index() offsets in SQLite

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Tue Jun 8 11:00:44 UTC 2010


On Tue, Jun 8, 2010 at 5:35 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Mon, Jun 7, 2010 at 10:10 PM, Kevin Jacobs wrote:
> > On Mon, Jun 7, 2010 at 2:23 PM, Peter wrote:
> >>
> >> Having now tried using this on some files with tens of millions of
> >> records, tuning how we use SQLite is going to be important.
> >>
> > Wouldn't a Berkeley database be much much faster for constructing
> > simple key to offset mappings?
>
> Maybe - now that I've done the refactoring on Bio.SeqIO.index() to
> allow two back ends (python dict or SQLite) trying a third (BDB) is
> much easier. Did you know BDB was used in the old OBDA index
> files? However, Python 2.6 deprecated bsddb (the Python Interface
> to Berkeley DB library) and Python is pushing people to SQLite3
> instead.
>
>
Hi Peter,

I am aware that SQLite is taking over the job of serving as the default
embedded database for Python and am in vigorous agreement with that trend.
I use SQLite for a wide range of tasks and am extremely happy with it for
most applications.  Unfortunately, for pure key-value mapping tasks, I've
found SQLite to be 4-10x slower than a well-tuned BDB tree, even with
batched updates and using the most aggressive SQLite performance pragmas. My
results may not be typical, but I thought I'd raise the issue given the
magnitude of the performance difference.
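
For reference, here is a minimal sketch of the kind of SQLite setup meant by
"batched updates" and "aggressive pragmas" above. It is an illustration only,
not the code behind the timings mentioned: the table name, column names, batch
size, and the particular pragma values are all assumptions, and
build_offset_index() and lookup() are hypothetical helpers rather than part of
Bio.SeqIO.

```python
import sqlite3


def build_offset_index(db_path, records, batch_size=100000):
    """Store (key, offset) pairs in SQLite, committing once per batch.

    Hypothetical helper for illustration; not part of Biopython.
    """
    con = sqlite3.connect(db_path)
    # Assumed pragma choices: trade durability for insert speed, which is
    # tolerable for an index that can always be rebuilt from the sequence file.
    con.execute("PRAGMA synchronous = OFF")
    con.execute("PRAGMA journal_mode = MEMORY")
    con.execute(
        "CREATE TABLE IF NOT EXISTS offsets "
        "(record_key TEXT PRIMARY KEY, file_offset INTEGER)"
    )
    batch = []
    for key, offset in records:
        batch.append((key, offset))
        if len(batch) >= batch_size:
            # One transaction per batch instead of one per row.
            con.executemany("INSERT INTO offsets VALUES (?, ?)", batch)
            con.commit()
            batch = []
    if batch:
        # Flush whatever is left after the last full batch.
        con.executemany("INSERT INTO offsets VALUES (?, ?)", batch)
        con.commit()
    return con


def lookup(con, key):
    """Return the stored file offset for key, or None if it is absent."""
    row = con.execute(
        "SELECT file_offset FROM offsets WHERE record_key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]
```

Calling build_offset_index(":memory:", pairs) with an iterable of
(key, offset) tuples gives a connection that lookup() can query. A comparable
BDB test would insert the same pairs into a B-tree keyed on the record
identifier.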

Best regards,
-Kevin


