[DAS2] Fwd: DAS Write Followup
Steve_Chervitz
Steve_Chervitz at affymetrix.com
Tue Dec 14 01:48:25 UTC 2004
Begin forwarded message:
From: Ed Griffiths <edgrif at sanger.ac.uk>
Date: September 28, 2004 5:43:56 AM PDT
Subject: Today's meeting.
All,
I have attached a summary and a fuller set of notes about where we seem to
have got to. [inline below]
There is a lot of work to do in formalising and filling out the details, but
we seem to be making some progress.
Ed
--
[The following is Ed's "DAS-WRITE-FOLLOWUP.email" document.]
================================================================
DAS 2.0 Write Back
HTTP protocol issues
--------------------
There was an idea that we would try to map the basic HTTP GET, POST, DELETE,
etc. to meaningful write-back/locking actions. We didn't discuss this in
detail except to say that no one was keen to artificially shoe-horn
write-back operations into this framework.
** We need to sort out which operations are GET/POST/PUT/DELETE.
We have not yet discussed how we're going to do the actual editing. It might
be by PUT/DELETE calls on the elements, or through a unified diff POSTed to
the server which contains a list of all the changes.
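For illustration, a minimal Python sketch of the two candidate styles. All
URLs, element names and payloads below are invented; nothing here is agreed
spec.

    import requests

    new_feature_xml = "<feature id='f123'>...</feature>"  # hypothetical payload
    change_list_xml = "<changes>...</changes>"            # hypothetical payload

    # Style 1: edit elements individually. PUT replaces a feature,
    # DELETE removes one.
    requests.put("http://das.example.org/feature/f123", data=new_feature_xml)
    requests.delete("http://das.example.org/feature/f456")

    # Style 2: POST a single document listing every change to one
    # write-back URL, so the server sees the edit as a unit.
    requests.post("http://das.example.org/writeback", data=change_list_xml)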
(Which reminds me, the DAV spec also supports the HTTP operations MOVE and
COPY. These allow the versioning system to keep version information even
after splits and renames.)
There was some discussion of how you implement write-back, which requires
state, within the stateless HTTP protocol. Thomas raised WebDAV as a possible
way to go. Gregg said that Brian (surname?) has investigated this to some
extent but is not currently working on it. No one else has great experience
of it, but Thomas said that it provides a protocol on top of HTTP, and this
made the idea of using it unpalatable because it would mean that all clients
would have to use it if they want to talk DAS2. Andrew added that "The thing
about DAV is that it requires several new HTTP-level commands, like LOCK and
PROPFIND. DAV seems to provide mechanism but not policy, and without some
experience we decided to pass on it for this version."
The consensus was that we should not use WebDAV but produce our own protocol
for doing write-back.
Locking
-------
What should be locked?
After quite a bit of discussion about the granularity of locking that should
be allowed, the consensus was that the system would become far too complex
(= unusable/unimplementable) unless we went for a system like Otter, where
locking is on the basis of regions. The client would ask for a lock on a
region, giving start/end coords, and every feature located in that region
would be locked.
(Note that this implies that we only lock _locatable_ objects; this is
something we need to explore further, e.g. how do "types" get edited, how do
genes that are not located get edited, etc.?)
The policy that James adopts, which seemed sensible to everyone, was that
unless an object is fully contained within the locked region it is not
locked. If the user wants to edit that object then they have to extend the
region that they want to lock to include that object.
Lock operations
There should be an operation that allows the client to find out from the
server which regions are already locked, so that annotators can be given this
information before they request a lock.
We discussed when the client should get the lock, i.e. before the editing
starts or once the user has a set of edits they want to save. There were
strong advocates for both models, but in the end it was thought that locking
in advance is better, otherwise an annotator may spend some time editing only
to discover that they cannot save their edits; i.e. lock conflicts are hard
to deal with without locking in advance.
We also need a request which allows the client to find out which types are
editable on a per-user basis; this will allow the annotator to see in advance
what they are allowed/able to edit.
We discussed authorisation, which, given that we are using HTTP, will have to
be done for each lock/edit operation. It was assumed that all this will be
straightforward, as database authorisation and authorisation via HTTP are
known and solved problems.
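If we lean on HTTP's own authentication, the per-request credentials could be
as simple as Basic auth. A minimal sketch, assuming a hypothetical lock URL,
user and payload format:

    import requests

    lock_request_xml = "<lock region='chr1:100000-200000'/>"  # invented format

    # HTTP is stateless, so credentials travel with every lock/edit request.
    resp = requests.post("http://das.example.org/lock/new",
                         data=lock_request_xml,
                         auth=("annotator1", "secret"))  # HTTP Basic auth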
The conclusions in the locking discussion are (see the sketch below):
 - pessimistic locking on a region specified by the curator.
   We will not support locks on a specific feature, feature
   type, etc.
 - the lock request is POSTed to a new URL, listed in the
   data source document and probably named something like
   .../lock/new
 - the request will include things like name (or is that
   provided via some other authentication means?), region
   to lock, and time for lock
 - if no error, the return includes the lock URL (probably
   something like ".../lock/lock12345") and the length of
   time actually granted
 - a GET on the lock URL returns info about who locked it,
   the time remaining for the lock, and the region
 - a POST of some sort to the URL is used to renew the lock,
   commit the changes or revert the transaction. We think
   it should also be able to expand the locked region.
 - the data source document will also include a URL
   used to get the list of current locks
 - breaking locks is outside the scope of the spec. E.g.,
   it might only be done by people who can log into
   the database system.
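Putting those conclusions together, here is a sketch of the whole lock
lifecycle as a client might drive it. Every URL, field name and action
keyword below is an assumption for illustration only:

    import requests

    BASE = "http://das.example.org"      # hypothetical DAS2 server
    AUTH = ("annotator1", "secret")      # stand-in for whatever auth we choose

    # 1. POST to .../lock/new to request a pessimistic lock on a region.
    resp = requests.post(BASE + "/lock/new", auth=AUTH,
                         data={"region": "chr1:100000-200000",
                               "seconds": 3600})
    lock_url = resp.headers["Location"]  # e.g. ".../lock/lock12345"

    # 2. GET on the lock URL reports owner, region and time remaining.
    print(requests.get(lock_url, auth=AUTH).text)

    # 3. POSTs to the lock URL renew, commit or revert; the exact
    #    vocabulary is still to be decided.
    requests.post(lock_url, auth=AUTH, data={"action": "renew"})
    requests.post(lock_url, auth=AUTH, data={"action": "commit"})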
Versioning (but see later)
We also talked about tracking all changes (just like a full-blown version
control system would) but decided again it was too much to deal with now.
** What we do need to decide is how a client asks for a specific version of
** a feature, or even whether this is possible.
Timeouts
Clearly we need to be able to time out locks: clients may disappear without
the server being aware of this.
Andrew suggested that when the client requests a lock, the response will say
how long the lock will last and also return a URL which the client can use to
query/unlock the lock. Everyone agreed this seemed a sound idea.
It was agreed that the "timeout" period applies to how long the client has
been idle, not how long in total it has been connected. The lock URL could be
used by the client to signal that it was still active without the need for
some artificial database operation.
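Since the timeout is an idle timeout, the client could use the lock URL as a
keep-alive channel. A sketch, with all names and the renewal action invented:

    import time

    import requests

    AUTH = ("annotator1", "secret")
    lock_url = "http://das.example.org/lock/lock12345"  # from the lock response
    granted = 3600                                      # seconds the server granted

    def user_is_still_editing():
        return True  # placeholder for real client-side state

    # Renew well inside the idle timeout instead of performing some
    # artificial database operation.
    while user_is_still_editing():
        time.sleep(granted / 2)
        requests.post(lock_url, auth=AUTH, data={"action": "renew"})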
Transactions
There was some discussion about whether these are necessary at all. James
uses transactions to write the annotators' data back into the database and
said that this was a vital element, as it enabled the annotator to be certain
that their edits had made it into the database.
In James's system, transactions are transparent to the client: they issue a
save and the save is done via a transaction. There is no prolonged
transaction spanning several HTTP requests. This was seen as a good model;
otherwise the client would have to understand too much of the
locking/transaction model of the underlying database.
I think everyone agreed that write-back should be done via a single request,
so that transactions do not get spread across multiple HTTP requests, which
looks intractable.
Proxies
There is a problem with writing back data when there are proxy servers
between
the client and the database. I'm not qualified to explain all of this,
perhaps
Tony or someone could fill this section in. I assume the problem
relates to
making sure the proxy actually sends the data back to the database and
doesn't
cache it causing timeout problems etc.
Andrew said:
"It's the other way around. A proxy may cache GET requests, because a
GET is
not supposed to affect the server. A POST to some other URL may affect
the
first URL but a proxy won't know there was a change.
I mentioned this might be a problem. I just did some research and
found this
overview, http://www.mnot.net/cache_docs/
It describes when proxies are allowed to think something is cacheable.
1. If the response's headers tell the cache not to keep it, it won't.
2. If no validator (an ETag or Last-Modified header) is present
on a response, it will be considered uncacheable.
3. If the request is authenticated or secure, it won't be cached.
4. A cached representation is considered fresh (that is, able to
be sent to a client without checking with the origin server) if:
o It has an expiry time or other age-controlling header set,
and is still within the fresh period.
o If a browser cache has already seen the representation, and
has been set to check once a session.
o If a proxy cache has seen the representation recently, and it
was modified relatively long ago.
Fresh representations are served directly from the cache, without
checking with the origin server
5. If an representation is stale, the origin server will be asked
to validate it, or tell the cache whether the copy that it has
is still good.
So it looks like we won't stumble into cache problems by accident.
Someone
has to enable a validator tag before the cache comes into play.
One thing to consider though is how we're going to do user
authentication.
Use an existing HTTP username/password scheme or something outside of
that?
If ours is independent of HTTP then we have to be more careful about
how to
deal with caches.
If ours is based on HTTP and we want to play nice with caches then
clients
shouldn't include password information to publicly accessible data as
that
won't be cached.
But I wouldn't worry about this. It's most theoretical and best to
wait until
it becomes an actual problem.
Data issues
-----------
We discussed version numbering of objects. James has unique ids for all
edittable objects with version numbers as well. This enables him to keep
numbered versions of an object and retrieve the latest version for
editting
but also to go back to previous versions if required. It may be that we
need
the client to be able to ask for previous versions.
How does the client know which things can be editted ?
In James system James system there are edittable and non-edittable data
streams. It was felt that this is not flexible enough for DAS.
Therefore the
client needs to be able to discover what objects within a region can be
editted.
Current suggestion is that when the server returns what types it can
serve,
the type should include a flag to say whether this type from this
server can
be editted. If the type can be editted then this says the annotator can
edit
_all_ objects of that type from that server. This would obviate the
need for
every object to be flagged with whether it was edittable, something
that none
of us thought was a good idea. This mechanism will enable the client to
show
the annotator which objects are edittable (note that the set of objects
that
can be editted could change from user to user depending on their
database
permissions).
How would the client lock stuff in practice ?
1) The annotator needs to know up front what they can edit so the
client would
be able to issue a request to the server(s) which would tell it what
objects
were edittable.
(Andrew made a suggestion here that said something like: two urls are
supplied
for database information for POST, there are urls for features/regions,
server
will say "yes I have a lock, no I don't" makes it extensible can anyone
remember what he said in more detail ?)
2) The annotator then indicates what they want to lock and the client
issues a
request to the server that says "Am I allowed to lock these actual
objects ?",
server replies yes or no.
3) If stage 2) was successful then the client can go on to issue a
request to
actually lock the data.
So the mechanism that seemed to make sense was for the server to serve
up
everything with _no_ indication at the object level whether something is
edittable. The client can then discover which of the objects it is
displaying
can be editted.
It was agreed that the annotator would be allowed to edit objects
located on
an assembly but not actually edit the assembly itself. There is a clear
distinction between moving features around an assembly and actually
changing
the assembly itself !
Conclusion
----------
We covered a lot of ground in the meeting and a lot of important points
were
discussed. There are still a lot more things to firm up. There will be
at
least one or two meetings today and over the weekend to do this. Gregg
and Ed
can meet on Monday morning to try to write all this up so that we have a
proposal that everyone can then shoot down/applaud.
More information about the DAS2
mailing list