[Biojava-l] SQL-backed persistent Biojava sequence/feature objects

Ewan Birney birney@ebi.ac.uk
Mon, 30 Jul 2001 13:48:57 +0100 (BST)


On Mon, 30 Jul 2001, Thomas Down wrote:

> 
> I've just been taking a look at the bioperl-db schema, and
> it's certainly worth a good look by anyone who's interested
> in this project.
> 
> I would say that it's quite strongly tied to BioPerl though,
> or at least the BioPerl way of looking at things.  We should
> look quite carefully at what the requirements are for persistence
> in BioJava.  For instance, a Java-centric schema could get
> away with tricks like serializing any datatypes it didn't
> explicitly understand (I'm thinking particularly of
> Annotation-bundle data here).  That sort of thing could probably
> be piggy-backed onto the BPDB schema as an `optional extra'
> without too much trouble.

There are similar things on the Bioperl side as well ;) I am all for a
series of DDL files of increasing complexity, so that the schema can grow
around a stable core, and I am not against "shove it in" methods of raw
language serialisation and/or hacky tag-value sets (XML-style) for the
more "just store this object" type problems. Of course, Biojava and
Bioperl won't interoperate on these data types, but each project will be
able to represent all its bells and whistles while still sharing a
common core.
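
As a rough illustration of the "shove it in" approach on the Java side,
here is a minimal sketch. It assumes a value column that holds a type tag
plus raw bytes; plain strings stay readable from any language, and anything
else falls back to standard Java serialisation. The class name
AnnotationValueCodec and the tag characters are made up for this example
and are not part of BioJava or bioperl-db.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a BioJava or bioperl-db API.
final class AnnotationValueCodec {
    static final char STRING = 'S';          // language-neutral UTF-8 text
    static final char JAVA_SERIALIZED = 'J'; // opaque to non-Java readers

    /** A type tag plus the bytes that would go into a BLOB-style column. */
    static final class Encoded {
        final char tag;
        final byte[] bytes;
        Encoded(char tag, byte[] bytes) { this.tag = tag; this.bytes = bytes; }
    }

    static Encoded encode(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("null values are not supported");
        }
        if (value instanceof String) {
            return new Encoded(STRING, ((String) value).getBytes(StandardCharsets.UTF_8));
        }
        if (!(value instanceof Serializable)) {
            throw new IllegalArgumentException("cannot store " + value.getClass().getName());
        }
        // Fallback: the schema does not understand this type, so serialise it whole.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Encoded(JAVA_SERIALIZED, buffer.toByteArray());
    }

    static Object decode(Encoded encoded) {
        if (encoded.tag == STRING) {
            return new String(encoded.bytes, StandardCharsets.UTF_8);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(encoded.bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of stored value not on classpath", e);
        }
    }
}
```

A Bioperl reader of the same table could still use every row tagged 'S'
and simply skip the rows tagged 'J', which is the "common core plus
extras" split described above.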

> 
> A rather bigger problem is hierarchical features, which I'd
> say were quite important if we're aiming for `persistent
> BioJava' rather than a more general database system.  This
> definitely does mean a new schema.  And quite possibly
> stored procedures on the server (or something similar) to
> keep the performance good -- at least given my past experiences
> with hierarchical data in SQL.
> 

Bioperl has a half-used hierarchical scheme for features which is now in
limbo due to our split locations. In other words, if Biojava wanted to add
hierarchy into the schema I would be fine to see that happen and could
provide mappings to Bioperl.
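
One common way to keep hierarchical data cheap to read, without stored
procedures, is to give each feature row a parent pointer, fetch all rows
for a sequence in a single query, and assemble the tree on the client.
The sketch below shows only the assembly step. It assumes a hypothetical
table feature(id, parent_id, type) with parent_id NULL for top-level
features; the Row and Node classes are invented for this example and are
not the BioJava Feature API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only.
final class FeatureTreeSketch {
    /** One row of the assumed feature(id, parent_id, type) table. */
    static final class Row {
        final int id;
        final Integer parentId; // null for a top-level feature
        final String type;
        Row(int id, Integer parentId, String type) {
            this.id = id;
            this.parentId = parentId;
            this.type = type;
        }
    }

    static final class Node {
        final Row row;
        final List<Node> children = new ArrayList<>();
        Node(Row row) { this.row = row; }
    }

    /** Builds the hierarchy from rows that may arrive in any order. */
    static List<Node> buildTree(List<Row> rows) {
        // First pass: index every feature by id.
        Map<Integer, Node> byId = new LinkedHashMap<>();
        for (Row row : rows) {
            byId.put(row.id, new Node(row));
        }
        // Second pass: attach each feature to its parent, or record it as a root.
        List<Node> roots = new ArrayList<>();
        for (Node node : byId.values()) {
            Integer parentId = node.row.parentId;
            if (parentId == null) {
                roots.add(node);
                continue;
            }
            Node parent = byId.get(parentId);
            if (parent == null) {
                throw new IllegalArgumentException(
                    "feature " + node.row.id + " refers to missing parent " + parentId);
            }
            parent.children.add(node);
        }
        return roots;
    }
}
```

This trades one query per level of the hierarchy for one query per
sequence plus a linear pass in memory; whether that is fast enough for
large sequences is exactly the kind of thing that would need measuring.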

(cue rambling discussion about whether complex hierarchies of features are
a good thing or not, or whether they should be represented as separate
objects - take, for example, "intron features", which one can derive from
exon features and which one therefore doesn't want to duplicate inside the
data storage but does want to expose programmatically. Aaaaaah. The sweet
smell of a complex design decision for us to chew over)
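
To make the intron example concrete: if only exons are stored, introns can
be computed on demand as the gaps between consecutive exons. The sketch
below assumes all exons belong to one transcript and that coordinates are
inclusive at both ends; the IntronSketch and Range names are invented for
this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration only.
final class IntronSketch {
    /** A range on the sequence, inclusive at both ends (an assumption). */
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must not exceed end");
            }
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the gaps between the given exons, ordered by position. */
    static List<Range> deriveIntrons(List<Range> exons) {
        List<Range> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((Range r) -> r.start));

        List<Range> introns = new ArrayList<>();
        if (sorted.isEmpty()) {
            return introns;
        }
        // Track the furthest position covered so far, so that overlapping
        // or nested exons do not produce bogus introns.
        int covered = sorted.get(0).end;
        for (int i = 1; i < sorted.size(); i++) {
            Range exon = sorted.get(i);
            if (exon.start > covered + 1) {
                introns.add(new Range(covered + 1, exon.start - 1));
            }
            covered = Math.max(covered, exon.end);
        }
        return introns;
    }
}
```

With exons at 100-200, 300-400 and 500-600 this yields introns at 201-299
and 401-499, so nothing about introns needs to be written to the database.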


> Anyway, sorry for rambling on.  I think the point I'd like to
> make is that there are two slightly different problems here:
> 
>   - A general, lightweight, database mechanism which can
>     be shared between different projects.  BPDB looks like
>     a reasonable schema for this sort of thing.
> 

Sure

>   - A system tuned for a particular object model, trying to
>     get as close to that model as possible.  This should give
>     `persistent objects' which behave extremely closely to,
>     for example, the normal in-memory BioJava Sequences.


I am going to stick my neck out and say that, with a minimum amount of
give and take, Bioperl and Biojava can map to the same relational data
model for both their object models, and furthermore that this is a "good
thing" to keep the two projects from drifting.


I'm happy to be flexible here. After all, there is more than one way to do
it! Let's see how far we can get before we have to get the boxing gloves
out....


> 
> It's worth being clear about which of these is being addressed
> before making too many commitments here.
> 

Shall we see how far a common schema can take us? I won't force it if it
won't go, but it is worth making the effort to stay on the same rough data
model - I think it would benefit both communities.





ewan




-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------