[Biopython-dev] Storing Bio.SeqIO.index() offsets in SQLite

Brent Pedersen bpederse at gmail.com
Wed Jun 9 15:56:27 UTC 2010


On Wed, Jun 9, 2010 at 7:55 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Jun 9, 2010 at 9:55 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>
>> Having had a quick look, they are using SQLite3 in much the
>> same way as I was initially. They create the index before loading
>> (rather than after loading) and they use a single insert per
>> offset (rather than using a batch in a transaction or the
>> executemany method). I'm pretty sure from my experiments
>> those changes would speed up screed's loading time a lot
>> (probably in line with the speedup I achieved).
>>
>
> Do you fancy trying this version of screed? It seems much
> faster on medium-sized FASTQ files:
>
> http://github.com/peterjc/screed/tree/sqlite-tweaks
>
> I'm still running a few tests myself, but will pass this on to
> the screed team unless I find some regressions.
>
> Peter
>

Not much difference.

screed
------
create: 666.381
search: 51.839
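For reference, here is a minimal sketch of the loading pattern described in the quoted message. It uses Python's standard `sqlite3` module, inserts all offsets with `executemany` inside a single transaction, and creates the lookup index only after the data is loaded. The table, column, and function names (`offset_data`, `record_key`, `file_offset`, `build_offset_index`, `lookup_offset`) are hypothetical and are not screed's or Biopython's actual schema or API. The input is assumed to be an iterable of (record id, byte offset) pairs that was already obtained by scanning the sequence file.

```python
import sqlite3


def build_offset_index(records, db_path=":memory:"):
    """Load (record_key, file_offset) pairs into SQLite (hypothetical schema).

    records: any iterable of (str, int) tuples, e.g. a generator that
    scans a FASTQ file and yields each record id with its byte offset.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE offset_data (record_key TEXT, file_offset INTEGER)")
    # One transaction plus executemany, rather than a separate INSERT
    # statement (and implicit commit) per offset. The "with" block
    # commits when it exits normally.
    with con:
        con.executemany(
            "INSERT INTO offset_data (record_key, file_offset) VALUES (?, ?)",
            records,
        )
    # Build the index only after all rows are loaded, so SQLite does not
    # have to update it on every single insert. A UNIQUE index raises
    # sqlite3.IntegrityError here if the file contained duplicate keys.
    con.execute("CREATE UNIQUE INDEX key_index ON offset_data (record_key)")
    con.commit()
    return con


def lookup_offset(con, key):
    """Return the stored offset for key, or None if the key is absent."""
    row = con.execute(
        "SELECT file_offset FROM offset_data WHERE record_key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    # Toy data standing in for offsets found while scanning a file.
    pairs = ((f"read{i}", i * 120) for i in range(1000))
    con = build_offset_index(pairs)
    print(lookup_offset(con, "read10"))
```

Passing a generator to `executemany` avoids building the full list of offsets in memory. Whether this yields the same speedup on screed's data depends on file size and disk, which is what the timings above are measuring.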


