[DAS2] Sanger/EBI trip report

Andrew Dalke dalke at dalkescientific.com
Tue Oct 18 20:33:30 UTC 2005


I visited EBI and Sanger last week to talk with the
people there about their use of DAS, the ongoing work
with the DAS/2 spec, and the future directions, including
structure DAS.

One meeting was with Andreas, the other Andreas (there are
too many Andre* in the UK - I think I need to change my
name), Eugene and Stefan.

Andreas has a service registry system.  I don't know where
it is though.  The registration includes metadata about the
server.  I would like some way for the DAS/2 server to
provide the metadata so the registry could extract most
of what it needs by querying the base server.  As Andreas
pointed out, that data could be wrong or incomplete, so the
registry would need to be able to override it.  I mentioned
that the DAS/2 spec as it stands lets the registry server
provide the top-level das/genome XML itself, which leaves it
free to point clients to the real databases.  This is one of
the advantages of a ReST architecture.
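
A rough sketch of the "server reports, registry overrides" idea
follows.  The field names and values are made up for illustration;
nothing here comes from the DAS/2 spec or from Andreas's registry.

```python
def merge_metadata(server_reported, registry_overrides):
    """Start from what the server says about itself, then let the
    registry's curated values win wherever it has one."""
    merged = dict(server_reported)
    for key, value in registry_overrides.items():
        if value is not None:  # None means "registry has no opinion"
            merged[key] = value
    return merged


# Hypothetical metadata fetched from the DAS/2 server itself.
server_reported = {
    "title": "Example source",
    "maintainer": None,  # the server left this blank
    "coordinates": "hypothetical-build-1",
}

# Hypothetical corrections entered by hand at the registry.
registry_overrides = {"maintainer": "curator@example.org"}

record = merge_metadata(server_reported, registry_overrides)
```

The registry only has to store the fields it disagrees with;
everything else keeps tracking what the server reports.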


An interesting thing I learned was the wide use of stylesheets.
There are about 15 stylesheet types in use on the campus,
and Ensembl uses a version which is not quite compatible.
Andreas Prlic pointed out that the stylesheet needs extensions
for 3D because the annotation styles are different than for
a 2D plot.  Thomas Down apparently has a version which puts
a color scale on a field, e.g., so that better scores are shown
differently from worse scores.
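
To make the color-scale idea concrete, here is a minimal sketch
that maps a score onto a linear gradient between two colors.  It
is not how Thomas Down's version works (I don't know the details);
the function name, the endpoint colors and the score range are all
assumptions.

```python
def score_to_color(score, lo, hi, worst=(200, 200, 200), best=(0, 0, 255)):
    """Map a score in [lo, hi] to a hex color between `worst` and `best`.

    Scores outside the range are clamped to the nearest endpoint.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (score - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))  # clamp to [0, 1]
    rgb = tuple(round(w + t * (b - w)) for w, b in zip(worst, best))
    return "#%02x%02x%02x" % rgb


low_color = score_to_color(0, 0, 100)     # worst score: light grey
high_color = score_to_color(100, 0, 100)  # best score: blue
```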

Stylesheets also came up when talking with Ed (or is that the
other Ed :) and Roy.  They are developing zmap, a replacement
for fmap.  It's a C app (gtk-based using the FooCanvas to
display huge numbers of elements) designed to speak the same
xremote API as fmap.  They want each annotation to be
individually annotatable, that is, annotated on more than just its type.

The example they gave was using three tracks - annotation,
transcript and homology.  They want to copy from the latter two
into the first track and preserve the original color and style.
Sadly, that's what my notes say, but I don't understand it from
there.  What I took from it was the need to have different
ways to determine a style for an annotation, like a
per-track or perhaps per-annotation mechanism.

The obvious one which comes to mind, which we talked about
as a possibility, was to take ideas from CSS.
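
One way to picture a CSS-like cascade is below: a style is looked
up per type, then per track, then per annotation, with the more
specific level winning.  The precedence order and all the names
are assumptions for illustration, not anything from the spec or
from zmap.

```python
def resolve_style(annotation, type_styles, track_styles):
    """Build the effective style for one annotation.

    Later updates override earlier ones, so the order below is the
    (assumed) cascade: type < track < annotation.
    """
    style = {}
    style.update(type_styles.get(annotation["type"], {}))
    style.update(track_styles.get(annotation["track"], {}))
    style.update(annotation.get("style", {}))
    return style


type_styles = {"homology": {"color": "green", "glyph": "box"}}
track_styles = {"annotation": {"color": "black"}}

# A homology feature copied into the annotation track, carrying its
# original color along as a per-annotation style.
copied = {"type": "homology", "track": "annotation",
          "style": {"color": "green"}}

# The same feature without a per-annotation style picks up the
# track's color instead.
plain = {"type": "homology", "track": "annotation"}

copied_style = resolve_style(copied, type_styles, track_styles)
plain_style = resolve_style(plain, type_styles, track_styles)
```

With a per-annotation level, the copied feature keeps its green
color even though the destination track would otherwise draw it
in black.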

Ed (I think) asked about how to handle assembly data.
I pointed out the section in the spec which says it can be
fetched by asking for it in BED format.  He wanted to
know more about how to know if a given element was a clone
or a transcript.  At this point I said he needed to ask
a real scientist.  :)

James Gilbert also came by during the discussion.  He
asked about how we deal with hierarchical features, and
wanted to know more about how our data model fits with
the one in Otter.
   http://www.sanger.ac.uk/Users/jgrg/otter_xml.html
I don't know the answer to that question.

In both meetings people liked that we refrained from making
new XML for everything, using "format=" instead.

Andreas et al. asked about computational services which
might take a non-trivial time.  I mentioned the solution
we talked about during BOSC where the server returns a
"202 Accepted" and a bit of XML saying "you can check on
the status at this URL but it'll probably take about 5
minutes to figure out."  The client should be able to
ask the server to halt the computation.
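
The client side of that "202 Accepted" exchange might look
roughly like the sketch below.  The fetch function is injected so
the example runs without a network, and the response body is a
plain dict standing in for the "bit of XML"; the "status_url" key
and the URLs are hypothetical.

```python
import time


def wait_for_result(fetch, url, poll_interval=1.0, max_polls=10,
                    sleep=time.sleep):
    """Request `url`; while the server answers 202, poll the status
    URL it names.  `fetch(url)` must return (status_code, body)."""
    status, body = fetch(url)
    polls = 0
    while status == 202:
        if polls >= max_polls:
            raise TimeoutError("gave up waiting for %s" % url)
        sleep(poll_interval)
        status, body = fetch(body["status_url"])
        polls += 1
    if status != 200:
        raise RuntimeError("unexpected status %d" % status)
    return body


def make_fake_server(ready_after):
    """Stand-in server: the job finishes on the Nth status check."""
    state = {"checks": 0}

    def fetch(url):
        if url == "/compute":
            return 202, {"status_url": "/status/1"}
        state["checks"] += 1
        if state["checks"] < ready_after:
            return 202, {"status_url": "/status/1"}
        return 200, {"result": "done"}

    return fetch


result = wait_for_result(make_fake_server(3), "/compute",
                         sleep=lambda seconds: None)
```

Halting the computation is not shown; one plausible approach
(again an assumption) would be a DELETE request on the status URL.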

In general there was a good reception to the use of the
"format=" parameter, instead of making new XML formats.

It does look like we need to spend more time on the
format extensibility.  It seems much of what the UK folks
do is based on extending DAS/1 in various ways.  DAS/2
doesn't and cannot capture all of them.  I've been looking
at the Atom spec.
   http://www.intertwingly.net/wiki/pie/RestEchoApiDiscuss
   http://atompub.org/2005/07/11/draft-ietf-atompub-format-10.html
It has a very nice way to embed data in the atom:content
field, where the data can be inline text, html, xml, or
"other", or be a link to an external href.

Along those lines, I think the Atom publication protocol
has some nice ideas to help with the writeback spec.
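
As an illustration of the atom:content embedding mentioned above,
here is a small sketch that classifies a content element.  It
follows the rules as they were later finalized in RFC 4287 (the
draft linked above may differ in details), and the DAS payload
namespace and media type are invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def describe_content(entry_xml):
    """Return (kind, type, payload) for an entry's atom:content."""
    entry = ET.fromstring(entry_xml)
    content = entry.find("{%s}content" % ATOM_NS)
    if content is None:
        return None
    ctype = content.get("type", "text")
    lowered = ctype.lower()
    src = content.get("src")
    if src is not None:
        # Content lives elsewhere; the element only links to it.
        return ("external", ctype, src)
    if lowered == "xhtml" or lowered.endswith(("+xml", "/xml")):
        # Inline XML: report the tags of the embedded child elements.
        return ("inline-xml", ctype, [child.tag for child in content])
    if lowered in ("text", "html") or lowered.startswith("text/"):
        return ("inline-text", ctype, content.text or "")
    # Any other media type is carried as base64 text.
    return ("inline-base64", ctype, (content.text or "").strip())


# Hypothetical entry embedding a DAS-like feature inline.
INLINE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="application/x-hypothetical-das+xml">
    <f:feature xmlns:f="http://example.org/hypothetical-das" id="f1"/>
  </content>
</entry>"""

# Hypothetical entry pointing at data held on another server.
LINKED = """<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="text/plain" src="http://example.org/data.bed"/>
</entry>"""

inline_info = describe_content(INLINE)
linked_info = describe_content(LINKED)
```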

Ed described the locking model that they use.  It's
unchanged since last year's discussion.  The annotators
decide on who gets a region, which is locked for that
person.  In their case it's exported into a local AceDB
instance, edited via fmap.  When done that database (as
a whole) is sent back to the main database for integration.
Because the region is locked, there are no conflicting edits
to resolve when it comes back.
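
The region-locking part of that model can be sketched as a table
of locked intervals where an overlapping request is refused.  This
is only an illustration; the class, the half-open coordinates and
the names are assumptions, not a description of the Sanger system.

```python
class RegionLocks:
    """Track which annotator holds which region of which sequence.

    Regions are half-open intervals [start, end).
    """

    def __init__(self):
        self._locks = []  # list of (seq, start, end, owner)

    def acquire(self, seq, start, end, owner):
        """Lock the region for `owner`; return False if any existing
        lock on the same sequence overlaps it."""
        for locked_seq, a, b, _ in self._locks:
            if locked_seq == seq and start < b and a < end:
                return False
        self._locks.append((seq, start, end, owner))
        return True

    def release(self, seq, start, end, owner):
        """Drop the lock once the edited region has been sent back."""
        self._locks.remove((seq, start, end, owner))


locks = RegionLocks()
got_first = locks.acquire("chr1", 100, 200, "alice")
got_overlap = locks.acquire("chr1", 150, 250, "bob")   # overlaps alice
got_adjacent = locks.acquire("chr1", 200, 300, "bob")  # touches, no overlap
```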

Andreas et al. mentioned an interesting annotation - annotate
a region to say it's been looked at but there are no
annotations for the region.  "This region intentionally
left blank."

I talked as well with Tony, mostly on organization issues.
One of the things he said they might want to do in the
future is a 2D image DAS.

I brought up the idea of having a DAS sprint - once the
spec w/ writeback starts to congeal, get the implementers
together in a room for a few days and work on code, then
use the experience to improve the spec.  Keepin' it real.

I talked about some of the disconnect between the DAS/2
dev folks (all in the US) and the UK folks.  The phone
conference call is at 8pm UK time and rather little of
what we talk about gets written up.  When the UK people
ask questions (cf. James Gilbert's question "Nested features?"
from Sept. 28, 2005) there's no response.  Similarly,
the DAS/1 extensions in the UK aren't written down so
it's hard to know what's useful for DAS/2.  My being in
Europe for the next few months should help a bit with
that, and I've always had a wacky schedule anyway so I'll
be in on the conf. calls (now that I'm back to easily
available broadband).  But I'm not enough of a domain
expert to be able to answer or address the scientific
points.

I've missed a few things so if anyone else here wants
to, feel free to add comments.

					Andrew
					dalke at dalkescientific.com



