[DAS2] Fwd: DAS Write Followup

Steve_Chervitz Steve_Chervitz at affymetrix.com
Tue Dec 14 01:48:25 UTC 2004


Begin forwarded message:

From: Ed Griffiths <edgrif at sanger.ac.uk>
Date: September 28, 2004 5:43:56 AM PDT
Subject: Today's meeting.

All,

I have attached a summary and a fuller set of notes about where we seem 
to have
got to. [inline below]

There is a lot of work to do in formalising and filling out the details 
but we
seem to be making some progress.

Ed
-- 

[The following is Ed's "DAS-WRITE-FOLLOWUP.email" document.]

================================================================
DAS 2.0 Write Back


HTTP protocol issues
--------------------

There was an idea that we would try to map the basic HTTP GET, POST,
DELETE etc. to meaningful writeback/locking actions. We didn't discuss
this in detail, except to say that no one was keen to artificially
shoe-horn write-back operations into this framework.

** We need to sort out which operations are GET/POST/PUT/DELETE.

We have not yet discussed how we're going to do the actual editing. It
might be by PUT/DELETE calls on the individual elements, or through a
unified diff, containing a list of all the changes, POSTed to the
server. Both styles are sketched below.
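
For illustration, here is a minimal sketch of the two styles in
Python. Every URL, path and payload shape below is a placeholder,
since none of this has been agreed:

    # Sketch of the two candidate editing styles. All URLs and
    # payload shapes are hypothetical placeholders, not agreed spec.
    import urllib.request

    BASE = "http://das.example.org/genome/volvox"  # hypothetical source

    # Style 1: edit individual elements with PUT/DELETE.
    def put_feature(feature_id, xml_doc):
        req = urllib.request.Request(
            BASE + "/feature/" + feature_id,  # assumed per-feature URL
            data=xml_doc.encode("utf-8"),
            headers={"Content-Type": "text/xml"},
            method="PUT")
        return urllib.request.urlopen(req)

    def delete_feature(feature_id):
        req = urllib.request.Request(
            BASE + "/feature/" + feature_id, method="DELETE")
        return urllib.request.urlopen(req)

    # Style 2: POST one document listing all the changes.
    def post_edit_list(diff_doc):
        req = urllib.request.Request(
            BASE + "/edits",  # assumed writeback URL
            data=diff_doc.encode("utf-8"),
            headers={"Content-Type": "text/xml"},
            method="POST")
        return urllib.request.urlopen(req)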

(Which reminds me, the DAV spec also supports the HTTP operations MOVE
and COPY. These allow the versioning system to keep version information
even after splits and renames.)



There was some discussion of how you implement write-back, which
requires state, within the stateless HTTP protocol. Thomas raised
WebDAV as a possible way to go. Gregg said that Brian (surname ?) has
investigated this to some extent but is not currently working on it.
No one else has great experience of it, but Thomas said that it
provides a protocol on top of HTTP, and this made the idea of using it
unpalatable because it would mean that all clients would have to use
it if they want to talk DAS2. Andrew added that "The thing about DAV
is that it requires several new HTTP level commands, like LOCK and
PROPFIND. DAV seems to provide mechanism but not policy, and without
some experience we decided to pass on it for this version."

The consensus was that we should not use WebDAV but produce our own
protocol for doing write-back.



Locking
-------

What should be locked?

After quite a bit of discussion about the granularity of locking that
should be allowed, the consensus was that the system would become far
too complex (= unusable/unimplementable) unless we went for a system
like Otter, where locking is on the basis of regions. The client would
ask for a lock on a region, giving start/end coordinates, and every
feature located in that region would be locked.

(Note that this implies that we only lock _locatable_ objects. This is
something we need to explore further, e.g. how do "types" get edited,
how do genes that are not located get edited, etc.?)

The policy that James adopts, and which seemed sensible to everyone,
was that an object is locked only if it is fully contained within the
locked region. If the user wants to edit an object that is not fully
contained, they have to extend the region that they want to lock to
include that object.


Lock operations

There should be an operation that allows the client to find out from the
server which regions are already locked so that annotators can be given 
this
information before they request a lock.

We discussed when the client should get the lock, i.e. before the
editing starts or once the user has a set of edits they want to save.
There were strong advocates for both models, but in the end it was
thought that locking in advance is better: otherwise an annotator may
spend some time editing only to discover that they cannot save their
edits. Lock conflicts are hard to deal with without locking in
advance.


We also need a request which allows the client to find out which types
are editable on a per-user basis; this will allow the annotator to see
in advance what they are allowed/able to edit.

We discussed authorisation which, given that we are using HTTP, will
have to be done for each lock/edit operation. It was assumed that all
this will be straightforward, as database authorisation and
authorisation via HTTP are known and solved problems.


The conclusions in the locking discussion are (a client-side sketch of
the resulting lock life-cycle follows the list):

    - pessimistic locking on a region specified by the curator.
      We will not support locks on a specific feature, feature
      type, etc.

    - the lock request is POSTed to a new URL, listed in the
      data source document and probably named something like
      .../lock/new

    - the request will include things like name (or is that
      provided via some other authentication means?), region
      to lock, and time for lock

    - if there is no error, the response includes the lock URL
      (probably something like ".../lock/lock12345") and the
      length of time actually granted

    - a GET on the lock URL returns info about who locked it,
      the time remaining for the lock, and the region

    - a POST of some sort to the URL is used to renew the lock,
      commit the changes or revert the transaction. We think it
      should also be able to expand the locked region.

    - the data source document will also include a URL used to
      get the list of current locks

    - breaking locks is outside the scope of the spec. E.g., it
      might only be done by people who can log into the database
      system.
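
As a sketch, a client might drive that life-cycle as below. The
.../lock/new and .../lock/lock12345 URLs come from the notes above;
the request fields, response headers and "action" values are
placeholders:

    # Sketch of the lock life-cycle from the conclusions above. The
    # request fields, headers and "action" values are placeholders.
    import urllib.parse
    import urllib.request

    SOURCE = "http://das.example.org/genome/volvox"  # hypothetical

    def request_lock(user, region, seconds):
        body = urllib.parse.urlencode(
            {"user": user, "region": region, "duration": seconds})
        req = urllib.request.Request(
            SOURCE + "/lock/new", data=body.encode("ascii"),
            method="POST")
        with urllib.request.urlopen(req) as resp:
            # Assume the server replies with the lock URL and the
            # duration actually granted, e.g. via headers like these.
            return (resp.headers.get("Location"),
                    resp.headers.get("X-Lock-Duration"))

    def lock_info(lock_url):
        # GET on the lock URL: who locked it, time remaining, region.
        with urllib.request.urlopen(lock_url) as resp:
            return resp.read()

    def lock_action(lock_url, action):
        # POST to the lock URL to renew, commit or revert; expanding
        # the locked region would work the same way.
        req = urllib.request.Request(
            lock_url, data=("action=" + action).encode("ascii"),
            method="POST")
        return urllib.request.urlopen(req)

    lock_url, granted = request_lock("ed", "chr1:100000-200000", 3600)
    lock_action(lock_url, "renew")  # or "commit" / "revert"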


Versioning (but see later)

We also talked about tracking all changes (just like a full-blown
version control system would) but decided that, again, it was too much
to deal with now.

** What we do need to decide is how a client asks for a specific
** version of a feature, or even whether this is possible.


Timeouts

Clearly we need to be able to time out locks: clients may disappear
without the server being aware of this.

Andrew suggested that when the client requests a lock, the response
should say how long the lock will last and also return a URL which the
client can use to query/unlock the lock. Everyone agreed this seemed a
sound idea.

It was agreed that the "timeout" period applies to how long the client 
has
been idle, not how long in total it has been connected. The lock url 
could be
used by the client to signal that it was still active without the need 
for
some artificial database operation.
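
For instance, an active client could renew on a timer, reusing the
(still hypothetical) POST-to-renew convention from the sketch above:

    # Sketch: keep a lock alive while the annotator is active,
    # instead of issuing artificial database operations.
    import time
    import urllib.request

    def keep_lock_alive(lock_url, granted_seconds, still_editing):
        # Renew at half the granted period, so an active client never
        # sits idle long enough for the server to expire the lock.
        while still_editing():
            time.sleep(granted_seconds / 2)
            req = urllib.request.Request(
                lock_url, data=b"action=renew", method="POST")
            urllib.request.urlopen(req)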



Transactions

There was some discussion about whether these are necessary at all.
James uses transactions to write the annotators' data back into the
database and said that this was a vital element, as it enabled the
annotator to be certain that their edits had made it into the
database.

In James' system transactions are transparent to the client: they
issue a save and the save is done via a transaction. There is no
prolonged transaction spanning several HTTP requests. This was seen as
a good model; otherwise the client would have to understand too much
of the locking/transaction model of the underlying database.

I think everyone agreed that write-back should be done via a single
request, so that transactions do not get spread across multiple HTTP
requests, which looks intractable.
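
For example, the save might be a single POST along these lines, with
the server wrapping the whole list in one database transaction. The
edit-list document and the /edits URL are invented for illustration:

    # Sketch: all of an annotator's edits travel in one POST and are
    # applied inside a single server-side transaction. The edit-list
    # XML is invented for illustration; no format has been agreed.
    import urllib.request

    edits = """<edit-list lock="http://das.example.org/lock/lock12345">
      <create><!-- new feature XML --></create>
      <modify><!-- changed feature XML --></modify>
      <delete id="feature12345"/>
    </edit-list>"""

    req = urllib.request.Request(
        "http://das.example.org/genome/volvox/edits",  # assumed URL
        data=edits.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST")
    # Success means every edit committed; an error means none applied.
    resp = urllib.request.urlopen(req)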



Proxies

There is a problem with writing back data when there are proxy servers
between the client and the database. I'm not qualified to explain all
of this; perhaps Tony or someone could fill this section in. I assume
the problem relates to making sure the proxy actually sends the data
back to the database and doesn't cache it, causing timeout problems
etc.

Andrew said:

"It's the other way around.  A proxy may cache GET requests, because a 
GET is
not supposed to affect the server.  A POST to some other URL may affect 
the
first URL but a proxy won't know there was a change.

I mentioned this might be a problem.  I just did some research and 
found this
overview, http://www.mnot.net/cache_docs/

It describes when proxies are allowed to think something is cacheable.


1.  If the response's headers tell the cache not to keep it, it won't.
2.  If no validator (an ETag or Last-Modified header) is present
    on a response, it will be considered uncacheable.
3.  If the request is authenticated or secure, it won't be cached.
4.  A cached representation is considered fresh (that is, able to
    be sent to a client without checking with the origin server) if:
    o It has an expiry time or other age-controlling header set,
      and is still within the fresh period.
    o A browser cache has already seen the representation, and has
      been set to check once a session.
    o A proxy cache has seen the representation recently, and it
      was modified relatively long ago.
    Fresh representations are served directly from the cache,
    without checking with the origin server.
5.  If a representation is stale, the origin server will be asked
    to validate it, or to tell the cache whether the copy it has is
    still good.

So it looks like we won't stumble into cache problems by accident.  
Someone
has to enable a validator tag before the cache comes into play.

One thing to consider, though, is how we're going to do user
authentication: use an existing HTTP username/password scheme or
something outside of that? If ours is independent of HTTP then we have
to be more careful about how to deal with caches.

If ours is based on HTTP and we want to play nice with caches, then
clients shouldn't send password information with requests for publicly
accessible data, as authenticated requests won't be cached.

But I wouldn't worry about this. It's mostly theoretical and best to
wait until it becomes an actual problem.
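
[To make the caching rules above concrete: a DAS server might mark its
read-only responses cacheable and its writeback responses uncacheable
with headers along these lines; the values are purely illustrative.]

    # Sketch: headers implementing the caching rules quoted above.
    # All values are illustrative.
    cacheable_get_headers = {
        # A validator plus an explicit freshness lifetime lets a proxy
        # serve the response without re-contacting the origin server.
        "ETag": '"features-v42"',
        "Last-Modified": "Tue, 28 Sep 2004 12:00:00 GMT",
        "Cache-Control": "max-age=3600",
    }

    writeback_response_headers = {
        # Tell every cache in the chain never to store these replies.
        "Cache-Control": "no-store",
    }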


Data issues
-----------

We discussed version numbering of objects. James has unique IDs for
all editable objects, with version numbers as well. This enables him
to keep numbered versions of an object and retrieve the latest version
for editing, but also to go back to previous versions if required. It
may be that we need the client to be able to ask for previous
versions.
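
One way that might look on the wire (the versioned-URL scheme here is
a guess, not anything the group has decided):

    # Sketch: unique IDs plus version numbers mapped onto URLs. The
    # "?version=" scheme is a guess; the spec has not decided this.
    import urllib.request

    BASE = "http://das.example.org/genome/volvox"

    def get_feature(feature_id, version=None):
        # Omitting the version asks for the latest; an explicit
        # version retrieves a historical copy for rollback/diffing.
        url = BASE + "/feature/" + feature_id
        if version is not None:
            url += "?version=" + str(version)
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    latest = get_feature("gene01")
    previous = get_feature("gene01", version=3)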

How does the client know which things can be edited?

In James' system there are editable and non-editable data streams. It
was felt that this is not flexible enough for DAS. Therefore the
client needs to be able to discover what objects within a region can
be edited.

The current suggestion is that when the server returns what types it
can serve, each type should include a flag to say whether that type
from this server can be edited. If the type can be edited then this
says the annotator can edit _all_ objects of that type from that
server. This would obviate the need for every object to be flagged
with whether it is editable, something that none of us thought was a
good idea. This mechanism will enable the client to show the annotator
which objects are editable (note that the set of objects that can be
edited could change from user to user depending on their database
permissions).
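
Client-side, the discovery might look roughly like this; the types
document shape and the "editable" attribute are placeholders:

    # Sketch: discover which types the current user may edit, assuming
    # the types document grows a per-type "editable" flag. Element and
    # attribute names are placeholders, not agreed spec.
    import urllib.request
    import xml.etree.ElementTree as ET

    def editable_types(types_url):
        with urllib.request.urlopen(types_url) as resp:
            doc = ET.parse(resp)
        # Keep only the types this server says this user may edit; the
        # answer can differ per user with their database permissions.
        return [t.get("id") for t in doc.iter("TYPE")
                if t.get("editable") == "yes"]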



How would the client lock stuff in practice?

1) The annotator needs to know up front what they can edit, so the
client would be able to issue a request to the server(s) which would
tell it what objects were editable.

(Andrew made a suggestion here that said something like: two URLs are
supplied for database information for POST, there are URLs for
features/regions, and the server will say "yes I have a lock, no I
don't", which makes it extensible. Can anyone remember what he said in
more detail?)


2) The annotator then indicates what they want to lock, and the client
issues a request to the server that says "Am I allowed to lock these
actual objects?"; the server replies yes or no.

3) If stage 2) was successful then the client can go on to issue a
request to actually lock the data. (The whole three-step flow is
sketched below.)
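
Put together, the three steps might run as below; every URL and
payload is a placeholder, and only the order of operations matters:

    # Sketch of the three-step flow above, with placeholder URLs.
    import urllib.request

    SOURCE = "http://das.example.org/genome/volvox"

    # 1) Discover up front what this user can edit.
    editable = urllib.request.urlopen(SOURCE + "/types").read()

    # 2) Ask whether these actual objects may be locked; the server
    #    replies yes or no.
    probe = urllib.request.Request(
        SOURCE + "/lock/check", data=b"region=chr1:100000-200000",
        method="POST")
    answer = urllib.request.urlopen(probe).read()

    # 3) If step 2 succeeded, actually take the lock.
    take = urllib.request.Request(
        SOURCE + "/lock/new", data=b"region=chr1:100000-200000",
        method="POST")
    lock_url = urllib.request.urlopen(take).headers.get("Location")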


So the mechanism that seemed to make sense was for the server to serve
up everything with _no_ indication at the object level of whether
something is editable. The client can then discover which of the
objects it is displaying can be edited.

It was agreed that the annotator would be allowed to edit objects
located on an assembly but not to edit the assembly itself. There is a
clear distinction between moving features around an assembly and
actually changing the assembly itself!

Conclusion
----------

We covered a lot of ground in the meeting and many important points
were discussed. There are still plenty of details to firm up. There
will be at least one or two meetings today and over the weekend to do
this. Gregg and Ed can meet on Monday morning to try to write all this
up so that we have a proposal that everyone can then shoot
down/applaud.



