[DAS2] writeback via diffs

Andrew Dalke dalke at dalkescientific.com
Thu Feb 9 15:53:38 UTC 2006

Summary: We've been talking about the "update via a delta" model
as an alternative to the "lots of changes to the server" model.
Deltas mean the heavy work is done in the client (or middleware),
vs. the server.

We've been looking at the writeback spec.  It doesn't handle
the case of a complex feature with a parent/part relationship.

In the current scheme that's done as:
   - get the write lock
   - POST the new feature (parent)
   - POST the new feature (child)
   - commit on the lock

What URL does the parent record use to point to the child?
Does the database defer referential integrity checks until
the commit on the lock?  Is this a case where the POST for
that feature returns an UPDATELIST document for every unknown/
placeholder identifier in the record?  Probably.

Another solution is to ask the server "give me two identifiers
which can be used for features".  (NOTE: the client must do this
whether identifiers are URLs or 'short ids'; otherwise it might
guess an identifier already in use and overwrite an existing
feature.)  Cute. But no real takers here.

BTW, does the full DAS query system support searches of the
modified version of the server?  How does the server know that
the search request comes from a client working in an editable

In talking about it we've been working on an idea we all
talked about last year: submitting a delta to the server
and moving the heavy work into the client.

That is, after the client is done locally it sends a
document which looks like

   <DELETE id="http://www/das/type/T12345" />
   <DELETE id="http://www/das/feature/exon/1" />
   <DELETE id="http://www/das/feature/exon/2" />
   <DELETE id="http://www/das/feature/contig/Ctg9" />
     <!-- this modifies an existing type -->
     <TYPE id="http://www/das/type/DEADBEEF">
       ... updated type information here ...
       <PROP key="name" value="Pa Cartwright" />
     </TYPE>
     <!-- this creates a new type -->
     <TYPE id="XXXXXXXXXX">  <!-- see below for id discussion -->
       ...
     </TYPE>
     <!-- this updates an existing feature -->
     <FEATURE id="http://www/das/feature/F9415" ... >
       ...
     </FEATURE>
     <!-- this creates a new feature -->
     <FEATURE id="YYYYYY" type_id="http://www/das/type/T12345">
       ...
     </FEATURE>

There are several things to note:
   - the <DELETE> elements, to remove existing types and features
   - the types and features are in the normal formats.
   - there is no way to update part of a record; the record
       is always sent in full
   - new identifiers are still a problem
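
As a rough illustration of how a client might assemble such a
document, here is a minimal Python sketch.  The "DELTA" wrapper
element and the build_delta helper are assumptions made for the
example (the proposal above doesn't name a root element), and the
record contents are just placeholders in the spirit of the sample.

```python
import xml.etree.ElementTree as ET

def build_delta(deleted_ids, records):
    """Assemble a delta: DELETE elements first, then full records.

    'DELTA' is a hypothetical wrapper element, not part of any spec.
    """
    root = ET.Element("DELTA")
    for ident in deleted_ids:
        ET.SubElement(root, "DELETE", {"id": ident})
    for record in records:
        # Records are always sent in full; there are no partial updates.
        root.append(record)
    return root

# An updated type, sent as a complete record.
updated_type = ET.Element("TYPE", {"id": "http://www/das/type/DEADBEEF"})
ET.SubElement(updated_type, "PROP", {"key": "name", "value": "Pa Cartwright"})

# A new feature; "YYYYYY" stands in for a not-yet-assigned identifier.
new_feature = ET.Element(
    "FEATURE", {"id": "YYYYYY", "type_id": "http://www/das/type/T12345"})

delta = build_delta(
    ["http://www/das/feature/exon/1", "http://www/das/feature/exon/2"],
    [updated_type, new_feature],
)
delta_xml = ET.tostring(delta, encoding="unicode")
```

The client only needs to remember which records it deleted and
which it touched; everything else stays out of the upload.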

The use model for this is as follows, based on Otter.

   - get the SOURCES document, which will have

<CAPABILITY type="locks" url="http://www/../get_lock_info.pl" />

<CAPABILITY type="writeback" url="http://../post_updated_delta.py" />

   - get an exclusive write lock on a region
       - POST to the locks URL (and GET gets a list of the locks?)

       - only one region locked at a time (current spec allows the
          full query language; is that needed?)

       - user is authenticated via HTTP-level authentication
           (Q: allow https for any of this?)

       - optional timeout time in request; server may give shorter
           or longer timeout

       - user is allowed to edit all features in the given region

   - get all the features in that region  (because there may have
       been a commit before the write lock)

   - work with the data on the local copy of the server data

   - push the big red "COMMIT" button

   - client POSTs the delta to the server
       - user authentication again
       - also sends a lock-id or a nonce so the server can
           double-check that there wasn't some other change

   - server checks payload for referential integrity
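
The locking part of that use model can be sketched with a toy
in-memory server.  This is only an illustration under stated
assumptions: ToyLockServer, its method names, and the use of a
random hex string as the lock-id are all made up for the example,
and there is no HTTP or authentication here at all.

```python
import time
import uuid

class LockError(Exception):
    pass

class ToyLockServer:
    """In-memory stand-in for the lock and writeback URLs (hypothetical)."""

    def __init__(self, max_timeout=3600):
        self.max_timeout = max_timeout
        self.locks = {}  # lock_id -> (user, segment, start, end, expires_at)

    def acquire(self, user, segment, start, end, timeout=600, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        for (_, seg, s, e, _) in self.locks.values():
            # Exclusive lock: refuse any overlapping region.
            if seg == segment and s <= end and start <= e:
                raise LockError("region already locked")
        # This toy server only ever shortens the requested timeout.
        granted = min(timeout, self.max_timeout)
        lock_id = uuid.uuid4().hex
        self.locks[lock_id] = (user, segment, start, end, now + granted)
        return lock_id, granted

    def commit(self, user, lock_id, delta, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        lock = self.locks.get(lock_id)
        if lock is None or lock[0] != user:
            raise LockError("no valid lock for this user")
        # A real server would check the referential integrity of `delta`
        # here and apply it as one transaction; this sketch skips that.
        del self.locks[lock_id]
        return True

    def _expire(self, now):
        expired = [k for k, v in self.locks.items() if v[4] <= now]
        for k in expired:
            del self.locks[k]

server = ToyLockServer()
lock_id, granted = server.acquire("alice", "Chr1", 1000, 2000, now=0)
committed = server.commit("alice", lock_id, "<DELTA/>", now=10)
```

Sending the lock-id back with the commit is what lets the server
notice an expired or stolen lock before it applies anything.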

The problem is the need for a URL.  We've come up with two
possible solutions:

   1. ask the server for things which can be used as identifiers.
These identifiers live for the life of the lock.

   2. reserve a private URI scheme, like "das-private:" followed
by a client-defined identifier.  On upload the server maps those
into valid local identifiers.  To work correctly for the client
the response document would need to contain mapping from private
identifiers to server identifiers.

The current spec uses the latter mechanism but does not specify
how the placeholder identifier is generated.  The mapping is
essentially the "UPDATELIST" from the current spec, though with
no need to support the status field on a per item basis - it
should be an all or none transaction.
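
To make the private-identifier idea concrete, here is a minimal
sketch of what the server side might do on upload.  The record
layout (dicts with "id" and "refs"), the resolve_private_ids
function, and the new_id callback are all assumptions for the
example; only the "das-private:" prefix comes from the proposal
above.

```python
import itertools

PRIVATE_PREFIX = "das-private:"

def resolve_private_ids(records, known_ids, new_id):
    """Map client placeholders to server identifiers, all or none.

    records: list of dicts with an "id" and a "refs" list of ids.
    known_ids: set of identifiers already on the server.
    new_id: callable returning a fresh server identifier.
    Raises ValueError before allocating anything if a reference dangles.
    """
    uploaded = {rec["id"] for rec in records}
    # Referential integrity: every reference must point at a record in
    # this upload or at one the server already has.
    for rec in records:
        for ref in rec.get("refs", []):
            if ref not in uploaded and ref not in known_ids:
                raise ValueError(f"dangling reference: {ref}")
    # Only now allocate server identifiers for the placeholders.
    mapping = {}
    for rec in records:
        if rec["id"].startswith(PRIVATE_PREFIX) and rec["id"] not in mapping:
            mapping[rec["id"]] = new_id()
    resolved = [
        {"id": mapping.get(rec["id"], rec["id"]),
         "refs": [mapping.get(r, r) for r in rec.get("refs", [])]}
        for rec in records
    ]
    return resolved, mapping

counter = itertools.count(1)
new_id = lambda: f"http://www/das/feature/F{next(counter)}"

# A new parent feature pointing at a new child, both with placeholders.
records = [
    {"id": "das-private:parent", "refs": ["das-private:child"]},
    {"id": "das-private:child", "refs": []},
]
resolved, mapping = resolve_private_ids(records, set(), new_id)
```

The returned mapping plays the role of the response document: the
client uses it to swap its placeholders for the real identifiers.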

Sending a delta gets rid of the DELETE and PUT (and POST update)
methods on the server.  Not ReSTful.  It places the burden of
tracking the user edits on the client instead of on the server.
But we have a good sense that it will work and is understandable.

It maps much more closely to the current Otter use.  We don't
know how Apollo/Chado wants to support writeback.

If we decide to stay with the existing ReSTy spec then our
recommendations are:

   - there's no need to support partial updates; clients send
the complete record to the server for update

   - locking does not need to support the full DAS query
      language; only the "region" query is needed (based on
      Otter experience)

   - there's no current need to extend the range of a lock
       nor to extend the time of the lock.

And I don't like that "lock=" is a parameter to the feature
and types URLs, one which creates locks for those types rather
than performing queries.  I would rather these be new URLs.

					dalke at dalkescientific.com
