[DAS2] Sanger/EBI trip report

Tue Oct 18 22:39:30 UTC 2005

On Tue, Oct 18, 2005 at 10:33:30PM +0200, Andrew Dalke wrote:
> I visited EBI and Sanger last week to talk with the
> people there about their use of DAS, the ongoing work
> with the DAS/2 spec, and the future directions, including
> structure DAS.
> 
> One meeting was with Andreas, the other Andreas (there are
> too many Andre* in the UK - I think I need to change my
> name), Eugene and Stefan.

Hi, I'm one of them Andreases.  In this particular email I'm just
commenting on a very small number of things that Andrew is writing.

> Andreas has a service registry system.  I don't know where

That's Andreas Prlic, the other one (depending on your point of view),
not me.

> it is though.  The registration includes metadata about the

It is at

    http://das.sanger.ac.uk/registry/

(final slash is essential, it seems)

> server.  I would like some way for the DAS/2 server to
> provide the metadata so the registry could extract most
> of what it needs by querying the base server.  As Andreas
> pointed out, that data could be wrong or incomplete so the
> registry could override it.  I mentioned the idea that
> the DAS/2 spec as it is now lets the registry server
> provide the top-level das/genome XML and is free to point
> clients to the real databases.  This is one of the
> advantages of a ReST architecture.
> 
> 
> An interesting thing I learned was the wide use of stylesheets.
> There are about 15 stylesheet types in use on the campus,
> and Ensembl uses a version which is not quite compatible.
> Andreas Prlic pointed out that the stylesheet needs extensions
> for 3D because the annotation styles are different than for
> a 2D plot.  Thomas Down apparently has a version which puts
> a color scale on a field, eg, so that better scores are shown
> differently from worse scores.
> 
> Stylesheets also came up when talking with Ed (or is that the
> other Ed :) and Roy.  They are developing zmap, a replacement
> for fmap.  It's a C app (gtk-based using the FooCanvas to
> display huge numbers of elements) designed to speak the same
> xremote API as fmap.  They want annotations which can be
> individually annotatable, that is, annotated on more than the type.
> 
> The example they gave was using three tracks - annotation,
> transcript and homology.  They want to copy from the latter two
> into the first track and preserve the original color and style.
> Sadly, that's what my notes say, but I don't understand it from
> there.  What I took from it was the need to have different
> ways to determine a style for an annotation, like on a
> per-track or perhaps per-annotation mechanism.
> 
> The obvious one which comes to mind, which we talked about
> as a possibility, was to take ideas from CSS.
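
[A minimal sketch of what "ideas from CSS" could mean for the three-track
example above: styles are merged from least to most specific (type, then
track, then the individual annotation), so a feature copied into another
track can carry its original look as a per-annotation override.  The
table names, keys and the resolve_style function are assumptions made
for illustration; nothing here comes from the DAS/2 spec or from zmap.]

```python
# Hypothetical style tables, keyed by feature type and by track name.
TYPE_STYLES = {
    "transcript": {"color": "blue", "glyph": "box"},
    "homology": {"color": "green", "glyph": "line"},
}
TRACK_STYLES = {
    "annotation": {"glyph": "box"},
}


def resolve_style(feature_type, track, inline_style=None):
    """Merge styles from least to most specific, as a CSS cascade would."""
    style = {}
    style.update(TYPE_STYLES.get(feature_type, {}))   # per-type defaults
    style.update(TRACK_STYLES.get(track, {}))         # per-track overrides
    style.update(inline_style or {})                  # per-annotation overrides
    return style


# A homology feature as drawn in its own track.
original = resolve_style("homology", "homology")

# Copied into the annotation track without an override, the track style
# wins; with the original style attached, the look is preserved.
restyled = resolve_style("homology", "annotation")
preserved = resolve_style("homology", "annotation", inline_style=original)
```

[With these tables, restyled picks up the annotation track's glyph while
preserved keeps the colour and glyph the feature had in the homology
track.]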
> 
> Ed (I think) asked about how to handle assembly data.
> I pointed out the section in the spec which says it can be
> fetched by asking for it in BED format.  He wanted to
> know more about how to know if a given element was a clone
> or a transcript.  At this point I said he needed to ask
> a real scientist.  :)
> 
> James Gilbert also came by during the discussion.  He
> asked about how we deal with hierarchical features, and
> wanted to know more about how our data model fits with
> the one in Otter.
>   http://www.sanger.ac.uk/Users/jgrg/otter_xml.html
> I don't know the answer to that question.
> 
> In both meetings people liked that we refrained from making
> new XML for everything, using "format=" instead.
> 
> Andreas et al. asked about computational services which
> might take a non-trivial time.  I mentioned the solution
> we talked about during BOSC where the server returns a
> "202 Accepted" and a bit of XML saying "you can check on
> the status at this URL but it'll probably take about 5
> minutes to figure out."  The client should be able to
> ask the server to halt the computation.

This is related to something that mainly Tom Oinn here at the EBI
has been working on: Distributed Annotation with Lazily Evaluated
Computation (DALEC), a kind of DAS frontend to Taverna workflows.

    http://taverna.sourceforge.net/projects/dalec/
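
[A rough sketch of how a client might interpret the "202 Accepted"
reply described in the quoted paragraph above.  The XML element and its
attribute names (pending, status-url, estimate-seconds) are assumptions
made for illustration; the email does not define the format, and this is
not taken from the DAS/2 spec or from DALEC.]

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pending:
    status_url: str            # where to check on the computation
    retry_after_seconds: int   # the server's estimate of the wait


def interpret_response(status_code: int, body: str) -> Optional[Pending]:
    """Return polling details for a 202 reply.

    Returns None for any other status code (for example 200, where the
    body already holds the result).
    """
    if status_code != 202:
        return None
    root = ET.fromstring(body)
    return Pending(
        status_url=root.attrib["status-url"],
        # Fall back to one minute if the server gives no estimate.
        retry_after_seconds=int(root.attrib.get("estimate-seconds", "60")),
    )


# Hypothetical reply body for a long-running computation.
reply = '<pending status-url="http://example.org/das/jobs/42" estimate-seconds="300"/>'
job = interpret_response(202, reply)
```

[A client would wait roughly job.retry_after_seconds and then fetch
job.status_url again.  Asking the server to halt the computation could
plausibly be a DELETE on the same URL, but that too is an assumption.]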

> In general there was a good reception to the use of the
> "format=" parameter, instead of making new XML formats.
> 
> It does look like we need to spend more time on the
> format extensibility.  It seems much of what the UK folks
> do is based on extending DAS/1 in various ways.  DAS/2
> doesn't and cannot capture all of them.  I've been looking
> at the ATOM spec.
>   http://www.intertwingly.net/wiki/pie/RestEchoApiDiscuss
>   http://atompub.org/2005/07/11/draft-ietf-atompub-format-10.html

I need to read this.

> It has a very nice way to embed data in the atom:content
> field, where the data can be inline text, html, xml, or
> "other", or be a link to an external href.
> 
> Along those lines, I think the Atom publication protocol
> has some nice ideas to help with the writeback spec.
> 
> Ed described the locking model that they use.  It's
> unchanged since last year's discussion.  The annotators
> decide on who gets a region, which is locked for that
> person.  In their case it's exported into a local AceDB
> instance, edited via fmap.  When done that database (as
> a whole) is sent back to the main database for integration.
> The region is locked, preventing resolution conflicts.
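
[A small sketch of the region-locking idea described above: a region is
handed to one annotator, and any overlapping request is refused until
the lock is released.  The class and method names are hypothetical, and
inclusive coordinates are an assumption; this is not how the Sanger
pipeline or AceDB implements it.]

```python
class RegionLocks:
    """Track which annotator holds which region (inclusive coordinates)."""

    def __init__(self):
        self._locks = []  # list of (sequence, start, end, owner) tuples

    def acquire(self, seq, start, end, owner):
        """Lock seq:start-end for owner; return False if it overlaps a lock."""
        for locked_seq, locked_start, locked_end, _ in self._locks:
            if locked_seq == seq and start <= locked_end and locked_start <= end:
                return False
        self._locks.append((seq, start, end, owner))
        return True

    def release(self, seq, start, end, owner):
        """Release a lock previously taken with the same arguments."""
        self._locks.remove((seq, start, end, owner))


locks = RegionLocks()
first = locks.acquire("chr1", 100, 200, "alice")    # region is free
clash = locks.acquire("chr1", 150, 250, "bob")      # overlaps alice's region
beside = locks.acquire("chr1", 201, 300, "bob")     # adjacent, no overlap
```

[Because only one person can hold a region at a time, the edited data
can be merged back as a whole without two sets of edits colliding.]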
> 
> Andreas et al. mentioned an interesting annotation - annotate
> a region to say it's been looked at but there are no
> annotations for the region.  "This region intentionally
> left blank."

Yes, this was something that confused me at first but that makes perfect
sense to me now.  Groups sometimes need to say they've looked at a
region (protein/gene/whatever) because the fact that they are explicitly
not annotating something is as much an annotation as actually annotating
something with a box.  Covering the region with an annotation saying
"there's nothing here" does not seem quite right to me.

> I talked as well with Tony, mostly on organization issues.
> One of the things he said they might want to do in the
> future is a 2D image DAS.
> 
> I brought up the idea of having a DAS sprint - once the
> spec w/ writeback starts to congeal, get the implementers
> together in a room for a few days and work on code, then
> use the experience to improve the spec.  Keepin' it real.
> 
> I talked about some of the disconnect between the DAS/2
> dev folks (all in the US) and the UK folks.  The phone
> conference call is at 8pm UK time and rather little of
> what we talk about gets written up.  When the UK people
> ask questions (cf. James Gilbert's question "Nested features?"
> from Sept. 28, 2005) there's no response.  Similarly,
> the DAS/1 extensions in the UK aren't written down so

This is not *quite* true.  The alignment and structure extensions to
DAS/1 by Andreas Prlic are well documented here:

    http://www.efamily.org.uk/xml/das/documentation/

> it's hard to know what's useful for DAS/2.  My being in
> Europe for the next few months should help a bit with
> that, and I've always had a wacky schedule anyway so I'll
> be in on the conf. calls (now that I'm back to easily
> available broadband).  But I'm not enough of a domain
> expert to be able to answer or address the scientific
> points.
> 

Stefan and I enjoyed Andrew's visit and would certainly like to see
some sort of dialogue or collaboration, or whatever else may help us
get further with specifying and implementing DAS/2.

> 
> I've missed a few things so if anyone else here wants
> to, feel free to add comments.
> 
> 					Andrew
> 					dalke at dalkescientific.com

Regards,
Andreas

-- 
Andreas Kähäri
EMBL-EBI/ensembl

------{ www.embl.org }----{ www.ebi.ac.uk }----{ www.ensembl.org }------