[Biopython-dev] Storing Bio.SeqIO.index() offsets in SQLite

Peter biopython at maubp.freeserve.co.uk
Wed Jun 9 14:55:23 UTC 2010


On Wed, Jun 9, 2010 at 9:55 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Having had a quick look, they are using SQLite3 in much the
> same way as I was initially. They create the index before loading
> (rather than after loading) and they use a single insert per
> offset (rather than using a batch in a transaction or the
> executemany method). I'm pretty sure from my experiments
> those changes would speed up screed's loading time a lot
> (probably in line with the speed-up I achieved).
>
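
For anyone following along, the two changes amount to something like
the rough sketch below. This is not screed's actual code, nor the
code on my branch: the function name, table name and column names
are made up for illustration, and it only assumes the standard
library sqlite3 module.

```python
import sqlite3


def load_offsets(con, offsets):
    """Bulk-load (key, file_offset) pairs, then build the index.

    `con` is an open sqlite3 connection and `offsets` is any iterable
    of (record_id, file_offset) tuples. The table, column and index
    names are hypothetical, chosen only for this example.
    """
    con.execute("CREATE TABLE offset_data (key TEXT, file_offset INTEGER)")
    # One executemany call inside a single transaction, rather than
    # one INSERT (and one commit) per offset.
    with con:
        con.executemany(
            "INSERT INTO offset_data (key, file_offset) VALUES (?, ?)",
            offsets,
        )
    # Create the index only after the rows are loaded, so SQLite
    # builds it once instead of updating it on every insert.
    with con:
        con.execute("CREATE UNIQUE INDEX key_index ON offset_data (key)")


def get_offset(con, key):
    """Return the stored file offset for `key`, or None if absent."""
    row = con.execute(
        "SELECT file_offset FROM offset_data WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]
```

Since executemany accepts any iterable, the offsets can come straight
from a generator that scans the sequence file, so they never need to
be held in memory all at once.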

Do you fancy trying this version of screed? It seems much
faster on medium-sized FASTQ files:

http://github.com/peterjc/screed/tree/sqlite-tweaks

I'm still running a few tests myself, but will pass this on to
the screed team unless I find some regressions.

Peter


