[Biojava-l] BioSQL observations

Thomas Down td2@sanger.ac.uk
Thu, 14 Mar 2002 14:44:59 +0000


On Thu, Mar 14, 2002 at 01:24:43PM +0000, Matthew Pocock wrote:
> Keith James wrote:
> 
> > On the schema itself, was there any discussion on being able to track
> > changes to Features? I'm thinking of situations where the a number of
> > people are applying annotation to a genome and want to know "when was
> > this Feature last modified?"
> 
> If editing is needed, then you realy need to maintain version 
> information to allow rollbacks, resolution of clashes between different 
> edits, keeping views current and the rest. My first stab would be on the 
> unique key for features being an (ID,version) tuple where versions are 
> monotopicaly increasing integers, or a timestamp. (Do we need branching 
> ala CVS?) Things should link through tables by ID, but the fetch query 
> should usualy select id,max(version) - we would need nested queries to 
> do this efficiently. Alternatively, you could have each new feature 
> (including all features made by editing old features) have unique IDs, 
> and store the edit history in a seperate edits table

I think I'd rather see the `separate edit table' approach.  If
BioSQL's really going to catch on, I think we need to do as much
as possible to keep the core schema stable, and general-purpose.
But adding extra schema `modules' which work alongside the core
is fine.

> (things like, 
> feature A was modified by user X to become feature B). Would these 
> aproaches kill performance? Is this another case where the data model 
> needs to be specified by something more loosely bound than object 
> models, adaptor code or table definitions? My brain is melting.

It shouldn't hit write performance too much.  And reading
out an edit history should be reasonably easy, too.  The
only thing which would be a killer would be constructing
a view of the data at some point in the past.  That's something
which is pretty hard with all relational databases [1].

Keith: to get back to your original question...  is it true
`rollback' you're looking for, or would you just be happy being
able to answer `who's been editing this region recently' type
questions?

    Thomas.


[1] Older versions of Postgres had a feature called `time travel',
   which did actually allow you to perform a query on a database at
   some (abitrary) time in the past.  It was removed some time ago,
   though (although some of the infrastructure lives on in the
   Postgres commit-rollback mechanism).