[Open-bio-l] OBDA redux?

Raoul Bonnal bonnal at ingm.org
Fri Nov 18 09:35:56 UTC 2011


Dear all, 
Would it be possible to have a test dataset and a clear list of
requirements and functionalities? Not a huge document, just a few
points for benchmarking.



On 17/11/11 18.11, "Pjotr Prins" <pjotr.public41 at thebird.nl> wrote:

> On Thu, Nov 17, 2011 at 02:39:49PM +0000, Peter Cock wrote:
>>> +1.  This will only get worse, with the projections for upcoming HiSeq
>>> upgrades, it is possible 1-2 channel runs would hit that limit.
>> 
>> That's a useful scale to aim to cover in profiling then, 100M to 500M
>> records. Jason, do you have any more details about the slowdown
>> you found with SQLite? For this use case we want to write the index
>> once, and read it many times. I found it is quicker to populate the
>> offset table before creating the index - perhaps you were seeing the
>> index being updated while adding records?
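
The load order described above (fill the offset table first, build the
index afterwards) can be sketched with Python's standard sqlite3 module.
This is only an illustration: the table name, column names and helper
functions are assumptions, not the schema of any existing OBDA or Bio*
implementation.

```python
import sqlite3


def build_offset_index(records, db_path=":memory:"):
    """Bulk-load (record_id, file_offset) pairs, then index them once.

    Table and column names here are hypothetical.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE offsets (record_id TEXT, file_offset INTEGER)"
    )
    # Insert everything in a single transaction, with no index yet,
    # so SQLite does not have to update an index on every insert.
    with con:
        con.executemany(
            "INSERT INTO offsets (record_id, file_offset) VALUES (?, ?)",
            records,
        )
    # Build the index once, after all rows are in place.
    con.execute("CREATE UNIQUE INDEX idx_record_id ON offsets (record_id)")
    con.commit()
    return con


def lookup(con, record_id):
    """Return the stored file offset for record_id, or None if absent."""
    row = con.execute(
        "SELECT file_offset FROM offsets WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    return row[0] if row else None
```

For the write-once, read-many case, build_offset_index() would run a
single time per sequence file and lookup() would serve all later reads.
Whether this ordering holds up at 100M to 500M records is exactly what
the profiling would need to show.
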
> 
> I have also found that hammering SQLite quickly deteriorates
> performance. Rather too quickly in fact. Don't forget that SQL is
> inherently slower than 'simple' indexers. Also SQLite is a convenience
> library, rather than a library designed for optimized performance. We
> used to run Sleepycat/BDB for that reason, now it is Tokyo/Kyoto
> Cabinet. 
> 
> In the (rather) near future we will be looking at parallel feeds from
> multiple machines, to keep it somewhat interesting. Hadoop has
> indexing support. In fact, Hadoop should be ideal for indexed sequence
> information, though I have not used it. Still, when the time comes, I
> am kinda interested in parallelized NoSQL solutions for scaling up.
> Hadoop kills me because of its complexity. I hate complexity (one
> reason I have tried to avoid SQL servers).
> 
> BTW 500M records take significant RAM for an in-memory index. Quite a
> number of solutions, to retain their performance, have to have the
> indexes in memory. 500M records now will grow to 500G records soon.
> Just a thing to keep in mind. I would opt for a non-RAM solution.
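
To put a rough number on the RAM point, here is a back-of-the-envelope
estimate. The per-entry sizes (a 20-byte record identifier plus an 8-byte
file offset, no container overhead) are assumptions made for
illustration only; real in-memory structures usually cost more per entry.

```python
def index_ram_bytes(n_records, key_bytes=20, offset_bytes=8,
                    overhead_bytes=0):
    """Estimate the RAM needed to hold one index entry per record.

    All per-entry sizes are illustrative assumptions, not measurements.
    """
    return n_records * (key_bytes + offset_bytes + overhead_bytes)


GB = 10 ** 9
for n in (500 * 10 ** 6, 500 * 10 ** 9):
    print(f"{n:,d} records: about {index_ram_bytes(n) / GB:,.0f} GB")
```

Under these assumptions 500M records need about 14 GB for the bare
entries, and 500G records about 14 TB, which is why an on-disk index
looks like the safer target.
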
> 
> Pj.