[Open-bio-l] OBDA redux?

Mon Nov 14 18:47:10 UTC 2011

On Nov 14, 2011, at 12:14 PM, Peter Cock wrote:

> Hi Chris,
> 
> [Did you mean to CC BioPerl-l? Should I have?]
> 
> On Mon, Nov 14, 2011 at 5:59 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:
>> 
>>> So, Chris and I seem in general agreement that an OBDA v2
>>> using SQLite but based on essentially the same approach as
>>> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
>>> mapping record identifiers to file offsets in the original sequence
>>> files.
>> 
>> The worry I have is adhering to a specific backend (e.g. SQLite).
>> The reason I say this is b/c BDB in its time was the go-to way
>> of storing simple index data, but that is no longer feasible for
>> very large data sets.  Who's to say something similar won't
>> happen to SQLite, or that it is the best option available?
> 
> Right now I would think SQLite is one of the best (if not the
> best) option. If supporting the old back ends is important for
> cross-project compatibility, I'm willing to have another go
> at using BDB in Biopython, but had limited success last
> time I tried.

No, I agree re: SQLite; at the moment it's probably the best option (fast, widely adopted, etc.), though Jason mentioned that (Tokyo|Kyoto)Cabinet also worked very well.  I would rather we not paint ourselves into a corner if the 'nice-and-shiny' next thing down the road performs better and gains wide adoption.
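
One way to stay out of that corner, sketched in Python, is to keep the lookup interface separate from the storage engine. Everything named here is hypothetical: the `OffsetStore` class, its `put`/`get` methods, and the table and column names are invented for illustration and are not part of any OBDA specification or existing Bio* API. The dict-backed class merely stands in for some other key-value store.

```python
import sqlite3
from abc import ABC, abstractmethod


class OffsetStore(ABC):
    """Hypothetical backend-neutral interface: record id -> (offset, length)."""

    @abstractmethod
    def put(self, record_id, offset, length):
        """Store the byte offset and byte length of one record."""

    @abstractmethod
    def get(self, record_id):
        """Return (offset, length); raise KeyError if the id is unknown."""


class SqliteOffsetStore(OffsetStore):
    """Default backend in this sketch: a single SQLite table."""

    def __init__(self, path=":memory:"):
        self._con = sqlite3.connect(path)
        self._con.execute(
            "CREATE TABLE IF NOT EXISTS offsets ("
            "record_id TEXT PRIMARY KEY, "
            "byte_offset INTEGER NOT NULL, "
            "byte_length INTEGER NOT NULL)"
        )

    def put(self, record_id, offset, length):
        with self._con:  # commits on success, rolls back on error
            self._con.execute(
                "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                (record_id, offset, length),
            )

    def get(self, record_id):
        row = self._con.execute(
            "SELECT byte_offset, byte_length FROM offsets WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        if row is None:
            raise KeyError(record_id)
        return tuple(row)


class DictOffsetStore(OffsetStore):
    """Stand-in for a non-SQL key-value backend (in memory only)."""

    def __init__(self):
        self._data = {}

    def put(self, record_id, offset, length):
        self._data[record_id] = (offset, length)

    def get(self, record_id):
        return self._data[record_id]
```

Code written against `OffsetStore` would not need to change if the default backend were swapped later; only the concrete class would.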

>> Maybe we should focus on the data storage schema, as
>> simple as it may be, then indicate the default backend
>> must be SQLite but others are allowed (maybe with a
>> mention that SQLite can be replaced by alternatives in
>> the future if needed).
> 
> It would make sense to talk about an SQL schema if
> the "other options" would also be SQL based. But they
> might not be... but certainly we should keep potential
> alternative back ends in mind.

It's probably necessary to allow for both possibilities (SQL and other).  For instance, a move to SQLite will necessitate describing the table data with SQL anyway.
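
To make that concrete, here is a minimal sketch of what "describing the table data with SQL" could look like for a single FASTA file, using Python's standard `sqlite3` module. The table name `record_offsets`, its columns, and the helper functions `build_index` and `fetch_raw` are all assumptions made up for this example; this is not the OBDA v1 layout nor the schema Biopython actually uses.

```python
import sqlite3

# Hypothetical schema: one row per record, keyed by identifier.
SCHEMA = """
CREATE TABLE IF NOT EXISTS record_offsets (
    record_id   TEXT PRIMARY KEY,
    byte_offset INTEGER NOT NULL,
    byte_length INTEGER NOT NULL
);
"""


def build_index(seq_path, db_path):
    """Scan a FASTA file once and store (id, offset, length) for each record."""
    rows = []
    record_id, start, pos = None, 0, 0
    with open(seq_path, "rb") as handle:  # binary mode: offsets are in bytes
        for line in handle:
            if line.startswith(b">"):
                if record_id is not None:
                    rows.append((record_id, start, pos - start))
                # The identifier is the first word after ">".
                record_id = line[1:].split(None, 1)[0].decode("ascii")
                start = pos
            pos += len(line)
    if record_id is not None:
        rows.append((record_id, start, pos - start))

    con = sqlite3.connect(db_path)
    try:
        con.executescript(SCHEMA)
        with con:  # single transaction for all inserts
            con.executemany(
                "INSERT OR REPLACE INTO record_offsets VALUES (?, ?, ?)", rows
            )
    finally:
        con.close()


def fetch_raw(seq_path, db_path, record_id):
    """Return the raw bytes of one record by seeking to its stored offset."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT byte_offset, byte_length FROM record_offsets "
            "WHERE record_id = ?",
            (record_id,),
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError(record_id)
    offset, length = row
    with open(seq_path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

A non-SQL backend would need the same three fields per record, so a spec could state the logical content (identifier, offset, length) first and give the SQL as one binding of it.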

>>> I hope to get BioRuby on board; they already have OBDA
>>> v1 support, so that shouldn't be too hard.
>>> 
>>> Right now I don't recall if BioJava has/had OBDA v1 support,
>>> and if they did if it was affected in their recent move to BioJava
>>> v3 (I understand from their mailing list that some older lower
>>> priority functionality has not all been ported yet).
>> 
>> I wouldn't be surprised at that, OBDA kind of lingered for a
>> while, and I'm not sure how widely adopted it became
>> (maybe others can shed light on that?)
> 
> Well, OBDA went beyond just indexing flat files - it also
> tried to standardise things like remote database access.
> I don't think we ever really had that side working in
> Biopython, so I am less familiar with it. I know EMBOSS
> has something fairly extensive for online databases,
> but have not checked if it uses the OBDA style or their
> own.

Right, but I wonder if that may have been one problem with the original OBDA specification: it was perhaps overly ambitious out of the gate.

> For now I was only planning to tackle indexing sequence
> files in this "OBDA redux".

That's a good, simpler start; the rest (remote access) can follow naturally once that is in place.

>>> Also EMBOSS are likely to be interested, certainly Peter Rice
>>> was interested in the SQLite indexing we're already using in
>>> Biopython for sequence files (i.e. what is effectively the
>>> prototype for OBDA v2).
>>> 
>>> Note that in addition to simple indexing of text files, we are
>>> already using the same simple offset + length approach for
>>> indexing binary files (e.g. SFF).
>> 
>> I think that's the general idea; that is how all BioPerl data
>> was indexed, previously with the Bio::Index modules and with
>> the OBDA implementations as well.
> 
> Good.
> 
>>> On the immediate practical side, I think I can edit the
>>> current OBDA website of http://obda.open-bio.org/
>>> via /home/websites/obda.open-bio.org/html on the
>>> server.
>> 
>> See below w/ regards to my thoughts on the wiki.
>> 
>>> We need to work out where the current OBDA indexing
>>> specification lives (CVS or SVN?) and perhaps move
>>> that to github. We may need a general OBF organisation
>>> account on GitHub for this and any other cross-project
>>> repositories.
>> 
>> +1 to a move to github, but maybe this belongs in an
>> OBF-specific organization.
> 
> Yes, definitely under an OBF github account (not under
> Biopython, BioPerl, etc).
> 
>> And maybe we should take advantage of the simple
>> wiki or project homepage that GitHub offers and move
>> everything (docs) there.
> 
> That could work. We'd have to go through all the old
> documentation and relocate it, then we could make the
> obda.open-bio.org domain point at the github pages.

Yes, I think that's the idea.  

>>> I see there is already an OBDA project on RedMine,
>>> (Chris can you add me to that please?)
>>> https://redmine.open-bio.org/projects/obda
>>> 
>>> Peter
>> 
>> Done (last night actually, but I didn't have time to respond
>> immediately).
>> 
>> chris
> 
> Thanks,
> 
> Peter

np.

-c