[Open-bio-l] OBDA redux?

Thu Nov 17 14:39:49 UTC 2011

On Thu, Nov 17, 2011 at 2:13 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 16, 2011, at 2:19 PM, Jason Stajich wrote:
>
>> Not to overly advocate for NOSQL, as I think for our purposes the jury
>> is still out. So I think it is worth benchmarking: NOSQL and SQL-based
>> systems will have different overheads.
>>
>> I know that when I have tried to store 100M to 500M records in SQLite,
>> the performance degrades, whereas I was able to store that range of keys
>> in a NOSQL db without problem.
>
> +1.  This will only get worse: with the projections for upcoming HiSeq
> upgrades, it is possible that 1-2 channel runs would hit that limit.
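As a starting point for that kind of benchmarking, here is a minimal sketch of a scaling harness that times SQLite inserts at increasing record counts, with a PRIMARY KEY index maintained on every insert. It assumes Python's standard sqlite3 module; the table name `offsets`, its columns, and the synthetic `read_<i>` keys are hypothetical stand-ins for a real sequence-file index. For a realistic profile you would pass an on-disk path rather than `:memory:` and push the sizes towards 100M to 500M.

```python
import sqlite3
import time


def time_sqlite_inserts(n_records: int, path: str = ":memory:",
                        batch_size: int = 100_000) -> float:
    """Return seconds taken to insert n_records (key, offset) pairs.

    The hypothetical 'offsets' table has a PRIMARY KEY on record_id, so
    SQLite updates that index on every insert (the suspected slowdown).
    """
    con = sqlite3.connect(path)
    con.execute("DROP TABLE IF EXISTS offsets")
    con.execute(
        "CREATE TABLE offsets (record_id TEXT PRIMARY KEY, file_offset INTEGER)"
    )
    start = time.perf_counter()
    # Insert in batches so the generator never materialises all rows at once.
    for lo in range(0, n_records, batch_size):
        hi = min(lo + batch_size, n_records)
        con.executemany(
            "INSERT INTO offsets VALUES (?, ?)",
            ((f"read_{i}", i * 250) for i in range(lo, hi)),
        )
    con.commit()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed


if __name__ == "__main__":
    # Small sizes for a quick run; watch whether records/s falls as n grows.
    for n in (10_000, 100_000, 500_000):
        secs = time_sqlite_inserts(n)
        print(f"{n:>10,} records: {secs:.2f} s ({n / secs:,.0f} records/s)")
```

A falling records/s figure as n grows would be consistent with the degradation described above; the same harness shape could wrap a NOSQL store's put call for a side-by-side comparison.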

That's a useful scale to cover in profiling, then: 100M to 500M
records. Jason, do you have any more details about the slowdown
you found with SQLite? For this use case we want to write the index
once and read it many times. I found it is quicker to populate the
offset table first and create the index afterwards. Perhaps you were
seeing the index being updated while adding records?
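The two load orders can be compared with a minimal sketch like the one below, assuming Python's standard sqlite3 module. The table `offset_data`, its columns `record_id` and `file_offset`, and the index name `key_index` are illustrative assumptions, not a fixed schema. With `index_first=True` the unique index exists before the inserts and is updated row by row; with `index_first=False` all rows go in first and the index is built once, which suits the write-once, read-many pattern.

```python
import sqlite3
import time


def build_offset_index(records, index_first: bool, path: str = ":memory:"):
    """Load (record_id, file_offset) pairs into a hypothetical table.

    index_first=True  -> create the unique index, then insert rows
                         (the index is maintained on every insert).
    index_first=False -> insert all rows, then build the index once.
    Returns (open connection, seconds taken).
    """
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE offset_data (record_id TEXT, file_offset INTEGER)")
    create_index = "CREATE UNIQUE INDEX key_index ON offset_data(record_id)"
    start = time.perf_counter()
    if index_first:
        con.execute(create_index)
    con.executemany("INSERT INTO offset_data VALUES (?, ?)", records)
    if not index_first:
        con.execute(create_index)
    con.commit()
    return con, time.perf_counter() - start


if __name__ == "__main__":
    n = 500_000
    for index_first in (True, False):
        recs = ((f"read_{i}", i * 250) for i in range(n))
        con, secs = build_offset_index(recs, index_first)
        # Read side: a lookup by key uses the unique index either way.
        row = con.execute(
            "SELECT file_offset FROM offset_data WHERE record_id = ?",
            ("read_42",),
        ).fetchone()
        con.close()
        print(f"index_first={index_first}: {secs:.2f} s, read_42 -> {row}")
```

Both orders end with the same table and index, so lookups behave identically; only the load time differs. Building the index after the bulk load also means duplicate keys are only reported at the end, when `CREATE UNIQUE INDEX` fails.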

Peter