[BioPython] [bip] [OT] Revision control and databases
Kevin Teague
kteague at bcgsc.ca
Fri Oct 24 18:32:41 UTC 2008
>
>> I think this is a big issue for bioinformatics. How is it possible
>> that nobody has never tried to implement such a functionality for
>> databases
>
> Databases (DBMS, to be picky) are a general-purpose solution for many
> different kinds of problem. Revision control is an inhomogeneous
> problem with no optimal solution that can be implemented in many ways
> and not only using DBMS. There are plenty of revision control examples
> implemented in databases, and the examples that first come to mind in
> Python for me are content management systems such as Zope and Plone. I
> think that BASE implements one, but it's a long time since I looked at
> it.
The default file storage for Zope Object Database (ZODB) appends all
new database writes, keeping older transactions on disk (similar to
the way PostgreSQL works). Back in the day (circa 2000) Zope 2 exposed
this database-level feature at the application level in the Zope
Management Interface (ZMI). So you could see all past writes to the
database, and try and revert back to an older one if desired (using
the "undo" tab of the ZMI).
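
To make the append-only idea concrete, here is a toy sketch in plain Python. It is not the ZODB FileStorage API; the class and method names (AppendOnlyStore, write, read, history, undo) are made up for illustration. The point is only that nothing is overwritten, so "undo" is just another append of an older value.

```python
class AppendOnlyStore:
    """Toy append-only store (hypothetical, not ZODB): writes never overwrite."""

    def __init__(self):
        self._log = []  # (key, value) records, oldest first

    def write(self, key, value):
        self._log.append((key, value))

    def read(self, key):
        # The newest record for a key is its current state.
        for k, v in reversed(self._log):
            if k == key:
                return v
        raise KeyError(key)

    def history(self, key):
        # Every value the key has ever had, oldest first.
        return [v for k, v in self._log if k == key]

    def undo(self, key):
        # Reverting appends the previous value as a new record.
        past = self.history(key)
        if len(past) < 2:
            raise ValueError("nothing to undo for %r" % key)
        self.write(key, past[-2])


store = AppendOnlyStore()
store.write("doc", "draft 1")
store.write("doc", "draft 2")
store.undo("doc")
print(store.read("doc"))     # draft 1
print(store.history("doc"))  # ['draft 1', 'draft 2', 'draft 1']
```

Note that the log only ever grows, which is exactly where the disk space problem comes from.
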
One problem with this approach was that using sysadmin tools on the
database could break application behaviour. For example, let's say you
had a "Document" object and a "Page Counter" object. You would want to
be able to view older versions of Documents, but only care about the
current state of the Page Counters. However, if your Page Counters were
changing like crazy, taking up tonnes of disk space and generally
slowing down queries against the history of the database, there was no
way to say "delete all outdated ephemeral Page Counter versions, but
keep Document-related transactions" (especially since a Page Counter
change and a Document change were often committed in the same
transaction).
ZWiki exposed older revisions using this feature, and the accepted
practice was to put each wiki into its own database so that other
forms of database maintenance didn't accidentally blow away your wiki
history ... it wasn't so pretty :P
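
The Page Counter problem can be sketched with a toy transaction log. This is an illustration only, with made-up names (log, prunable) and made-up data, and it assumes history can only be dropped one whole transaction at a time:

```python
# Each transaction is one indivisible record that may touch several objects.
log = [
    {"tid": 1, "changes": {"document": "v1", "page_counter": 1}},
    {"tid": 2, "changes": {"page_counter": 2}},
    {"tid": 3, "changes": {"document": "v2", "page_counter": 3}},
    {"tid": 4, "changes": {"page_counter": 4}},
]


def prunable(log, ephemeral):
    """Return the tids of transactions that touch only ephemeral objects."""
    return [t["tid"] for t in log if set(t["changes"]) <= ephemeral]


print(prunable(log, {"page_counter"}))  # [2, 4]
```

Transactions 1 and 3 have to stay because they carry Document history, so the old counter values inside them stay too.
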
You also had problems reverting to just one specific revision. For
example, if you were in Revision 3 and you had changes in Revision 1
that you wanted to go back to, but you'd made changes in Revision 2
that referenced Revision 1, then you first had to step back to
Revision 2 before you could revert to Revision 1, even though
Revision 2 also contained a bunch of changes that you didn't want to
revert and would then need to re-apply manually. Ugh!
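
A rough sketch of that dependency, assuming that "referenced" means a later transaction touched some of the same objects. The names here (transactions, blockers, undo, UndoError) are hypothetical and this is not how ZODB itself is written:

```python
class UndoError(Exception):
    pass


# (transaction id, set of objects it modified), oldest first
transactions = [
    (1, {"page_a"}),
    (2, {"page_a", "page_b"}),
    (3, {"page_c"}),
]


def blockers(transactions, tid):
    """Later transactions that touched any object modified by `tid`."""
    touched = dict(transactions)[tid]
    return [t for t, objs in transactions if t > tid and objs & touched]


def undo(transactions, tid):
    """Return the log without `tid`, refusing if a later change depends on it."""
    blocking = blockers(transactions, tid)
    if blocking:
        raise UndoError("undo %s before undoing %s" % (blocking, tid))
    return [(t, objs) for t, objs in transactions if t != tid]


print(blockers(transactions, 1))  # [2]
remaining = undo(undo(transactions, 2), 1)
print([t for t, _ in remaining])  # [3]
```

Undoing transaction 2 to get at transaction 1 also throws away its unrelated change to page_b, which then has to be redone by hand.
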
Zope 2 also had a Version object: you could poke a button in the UI to
start a new "transaction" and then start making changes to code and
content in the database. This was just implemented as a long-running
transaction; from start to commit, a transaction could sometimes last
for a whole month :). The problem was that when you finally wanted to
commit the transaction to roll out new features on a web site, if there
were any conflicts with changes that had happened in the meantime, you
were hosed and would end up copying those changes into a new
transaction based off the latest database version and committing that.
It wasn't pretty :(
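
Here is a toy model of why a month-long transaction goes stale. It is not the Zope 2 Version implementation; Database, Version and ConflictError are illustrative names, and the conflict rule assumed here is simply "the object was committed by someone else after the version started":

```python
class ConflictError(Exception):
    pass


class Database:
    def __init__(self):
        self.data = {}    # key -> current value
        self.serial = {}  # key -> number of commits to that key

    def commit(self, changes):
        for key, value in changes.items():
            self.data[key] = value
            self.serial[key] = self.serial.get(key, 0) + 1


class Version:
    """Toy long-running transaction: buffers writes until commit."""

    def __init__(self, db):
        self.db = db
        self.base = dict(db.serial)  # serials as they were at the start
        self.changes = {}

    def write(self, key, value):
        self.changes[key] = value

    def commit(self):
        # Any key committed elsewhere since the start is a conflict.
        stale = [k for k in self.changes
                 if self.db.serial.get(k, 0) != self.base.get(k, 0)]
        if stale:
            raise ConflictError("changed since version started: %s"
                                % sorted(stale))
        self.db.commit(self.changes)


db = Database()
db.commit({"index_html": "old"})

version = Version(db)
version.write("index_html", "redesign")
db.commit({"index_html": "hotfix"})  # someone edits the live site meanwhile

try:
    version.commit()
except ConflictError as exc:
    print(exc)

# The workaround: redo the changes in a fresh version based on the latest state.
retry = Version(db)
retry.write("index_html", "redesign")
retry.commit()
```

The longer the version stays open, the more likely some object in it has been committed elsewhere in the meantime.
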
It has long since been acknowledged by Zope developers that exposing
database level features at the application level is a Bad Thing(TM)!
Today there is a whole plethora of products for Zope that do some form
of versioning, but they are all implemented at the application level.
There are so many of them because there are many ways to do
versioning, and the choice of how versions are managed is really best
left up to the specific application. Some of these products provide
reasonable APIs for implementing specific versioning within a specific
platform, e.g. Plone has a package called plone.app.iterate with APIs
that use standard versioning terminology (checkin, checkout, working
copy), for example:

from zope.interface import Interface


class ICheckinCheckoutTool(Interface):
    # Interface methods are declared without "self".

    def allowCheckin(content):
        """
        denotes whether a checkin operation can be performed on the
        content.
        """

    def allowCheckout(content):
        """
        denotes whether a checkout operation can be performed on the
        content.
        """

    def allowCancelCheckout(content):
        """
        denotes whether a cancel checkout operation can be performed
        on the content.
        """

    def checkin(content, checkin_message):
        """
        check the working copy in, this will merge the working copy
        with the baseline
        """

    def checkout(container, content):
        """
        """

    def cancelCheckout(content):
        """
        """
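
To show how an application might use an API shaped like this, here is a toy in-memory stand-in. It is not the plone.app.iterate implementation: ToyCheckinCheckoutTool is a made-up class, content is just a dict, the container is just a list, and "merge" is assumed to be a plain overwrite of the baseline.

```python
import copy


class ToyCheckinCheckoutTool:
    """Hypothetical in-memory stand-in, not plone.app.iterate itself."""

    def __init__(self):
        self._open = []     # (baseline, working copy, container) per checkout
        self.messages = []  # checkin messages, oldest first

    def _find(self, working_copy):
        for entry in self._open:
            if entry[1] is working_copy:
                return entry
        return None

    def allowCheckout(self, content):
        # Only one open checkout per baseline, and none of a working copy.
        return all(b is not content and w is not content
                   for b, w, _ in self._open)

    def allowCheckin(self, content):
        return self._find(content) is not None

    def allowCancelCheckout(self, content):
        return self._find(content) is not None

    def checkout(self, container, content):
        if not self.allowCheckout(content):
            raise ValueError("content cannot be checked out")
        working_copy = copy.deepcopy(content)
        container.append(working_copy)
        self._open.append((content, working_copy, container))
        return working_copy

    def checkin(self, content, checkin_message):
        entry = self._find(content)
        if entry is None:
            raise ValueError("content is not a working copy")
        baseline = entry[0]
        baseline.clear()
        baseline.update(content)  # "merge" here is a plain overwrite
        self.messages.append(checkin_message)
        self._discard(entry)
        return baseline

    def cancelCheckout(self, content):
        entry = self._find(content)
        if entry is None:
            raise ValueError("content is not a working copy")
        self._discard(entry)
        return entry[0]

    def _discard(self, entry):
        _, working_copy, container = entry
        # Remove by identity so an equal-looking object is never dropped.
        container[:] = [c for c in container if c is not working_copy]
        self._open = [e for e in self._open if e is not entry]


tool = ToyCheckinCheckoutTool()
page = {"title": "Home", "body": "old text"}
drafts = []

draft = tool.checkout(drafts, page)
draft["body"] = "new text"
tool.checkin(draft, "Rewrote the body")
print(page["body"])  # new text
```

The versioning policy (one working copy per baseline, overwrite on checkin) lives entirely in application code here, and the database underneath never needs to know about it.
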