[DAS2] mtg topics for Nov 28

Mon Nov 28 17:10:30 UTC 2005

Here are the spec issues I would like to talk about at today's meeting,
culled from the last few weeks of emails and phone calls.

1) DAS Status Code in headers

The current spec says:
>  X-DAS-Status: XXX status code
>
> The list of status codes is similar, but not identical, to those used
> by DAS/1:
>
> 200 OK, data follows
> 400 Bad namespace
> 401 Bad data source
> 402 Bad data format
> 403 Unknown object ID
> 404 Invalid object ID
> 405 Region coordinate error
> 406 No lock
> 407 Access denied
> 500 Server error
> 501 Unimplemented feature

I argued that these are not needed.  Some of them duplicate HTTP error
codes, and those which do not can be covered by an error code "300"
along with an (optional) XML payload.

The major problem with doing this seems to be in how MS IE handles
certain error codes.  While IE is not a target browser, MS software
may use IE as a component for fetching data.  From the link Ed dug
up, it looks like this won't be a problem.

Lincoln's last email on this was a tepid agreement:

> I give up arguing this one and will go with the way Andrew wants to do
> it. Therefore I propose the following rules:
>
> 	1) Return the HTTP 404 error for the case that any component of the
> 	DAS2 path is invalid. This would apply to the following situations:
>
> 		Bad namespace
> 		Bad data source
> 		Unknown object ID
>
> 	2) Return HTTP 301 and 302 redirects when the requested object has
> 	moved.
>
> 	3) Return HTTP 403 (forbidden) for no-lock errors.
>
> 	4) Return HTTP 500 when the server crashes.
>
> For all errors there should be a text/x-das-error entity returned that
> describes the error in more detail.
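
Rules 1, 3 and 4 amount to a lookup table from the old DAS/1-style
conditions to HTTP status codes. A minimal Python sketch, assuming the
condition names from the X-DAS-Status list are reused as the invariant
text of the text/x-das-error entity. The function name `error_response`
and the 500 fallback for conditions the rules don't mention are
illustrative assumptions, not part of the proposal. Redirects (rule 2)
are left out because they also need a Location header.

```python
from http import HTTPStatus

# Hypothetical mapping built from Lincoln's rules 1, 3 and 4.
RULES = {
    "Bad namespace": HTTPStatus.NOT_FOUND,             # rule 1
    "Bad data source": HTTPStatus.NOT_FOUND,           # rule 1
    "Unknown object ID": HTTPStatus.NOT_FOUND,         # rule 1
    "No lock": HTTPStatus.FORBIDDEN,                   # rule 3
    "Server error": HTTPStatus.INTERNAL_SERVER_ERROR,  # rule 4
}


def error_response(condition, detail=None):
    """Return (status, headers, body) for a DAS/2 error condition.

    Conditions not covered by the rules fall back to 500; that fallback
    is an assumption made for this sketch.
    """
    status = RULES.get(condition, HTTPStatus.INTERNAL_SERVER_ERROR)
    # Invariant text first, optional explanation on the following line.
    body = condition if detail is None else f"{condition}\n{detail}"
    headers = {"Content-Type": "text/x-das-error"}
    return int(status), headers, body
```

For example, `error_response("No lock", "feature/cTel54X is locked")`
gives a 403 with the invariant "No lock" followed by the explanation.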

The "x-das-error" format must have an invariant string, either an
error code or fixed text, and may have an optional section of
explanatory text. Note the "should" in that last paragraph: returning
the error entity is optional.
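
A client could then base its behaviour on the invariant part alone and
treat the rest as human-readable detail. A sketch, assuming
(hypothetically) that the invariant string is the first line of the body
and everything after it is the optional explanation; `parse_das_error`
is an illustrative name.

```python
def parse_das_error(body):
    """Split an x-das-error body into (invariant, explanation or None).

    Assumes the invariant string is the first line; any remaining text
    is the optional explanatory section.
    """
    invariant, _, rest = body.partition("\n")
    explanation = rest.strip() or None
    return invariant.strip(), explanation
```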

2) Content-type

There was some discussion about changing the content type to
"text/xml" to support viewing DAS results in a browser.  We decided
that that wasn't a valid use case.

In doing the research for this I found that the general recommendation
for these sorts of XML documents is to put the document under
"application/*" instead of "text/*".

One reason comes from RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt):

    If an XML document -- that is, the unprocessed, source XML document
    -- is readable by casual users, text/xml is preferable to
    application/xml.  MIME user agents (and web user agents) that do not
    have explicit support for text/xml will treat it as text/plain, for
    example, by displaying the XML MIME entity as plain text.
    Application/xml is preferable when the XML MIME entity is unreadable
    by casual users.  Similarly, text/xml-external-parsed-entity is
    preferable when an external parsed entity is readable by casual
    users, but application/xml-external-parsed-entity is preferable when
    a plain text display is inappropriate.

       NOTE: Users are in general not used to text containing tags such
       as <price>, and often find such tags quite disorienting or
       annoying.  If one is not sure, the conservative principle would
       suggest using application/* instead of text/* so as not to put
       information in front of users that they will quite likely not
       understand.

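Another is the difference in how application/* and text/* handle
character set encodings.  Under RFC 3023, a text/* XML entity sent
without a charset parameter must be treated as us-ascii, whatever its
XML encoding declaration says, while for application/* the encoding
declaration (or the XML default of UTF-8) applies.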
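
The charset difference can be sketched as follows. This is a simplified
reading of RFC 3023 sections 3.1 and 3.2 that ignores byte-order marks
and UTF-16 detection; `effective_charset` and the media type
`application/x-das-example+xml` used below are hypothetical names for
illustration only.

```python
import re

# Encoding declaration at the very start of an XML document.
XML_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def effective_charset(content_type, body):
    """Charset an RFC 3023-style consumer would use (simplified sketch)."""
    media_type, *params = [p.strip() for p in content_type.split(";")]
    # An explicit charset parameter always wins.
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    if media_type.lower().startswith("text/"):
        # text/* without charset: us-ascii, encoding declaration ignored.
        return "us-ascii"
    # application/* without charset: use the XML encoding declaration,
    # falling back to the XML default of UTF-8.
    match = XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"
```

The same ISO-8859-1 document is read as us-ascii when served as
text/x-das-example+xml without a charset, but as iso-8859-1 when served
as application/x-das-example+xml.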

We use "text/x-...+xml"; I propose changing this to
"application/x-...+xml".

I don't think there are any objections to this.  The main obstacle is
the difficulty of ploughing through all the specs related to charsets
and Unicode.

3) Key/value data

As Steve pointed out, the spec is incomplete on how to handle key/value
data associated with a record.  The main problem is in how it handles
namespaces.  It mixes a private namespace for attribute values with
XML namespaces, but XML namespace prefixes are not expanded inside
attribute values, so "das:note" below is just a string to an XML parser.

For example,

<FEATURES
      xmlns="http://www.biodas.org/ns/das/genome/2.00"
      xmlns:das="http://www.biodas.org/ns/das/properties/2.00"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://www.wormbase.org/das/genome/volvox/1/">

  <FEATURE   id = "feature/cTel54X"
           type = "type/gene"
           name = "tg-3">

     <PROP ptype  = "das:note">This is a telomeric repeat</PROP>
     <PROP ptype  = "das:alias">birx28</PROP>
     <PROP  ptype = "property/entrez_dbxref"
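
The namespace mixing is easy to see with any namespace-aware parser. In
this sketch (Python's standard xml.etree.ElementTree, applied to a
trimmed, well-formed version of the example above) the element name is
resolved against its namespace, while the "das:" in the ptype value
stays an opaque string:

```python
import xml.etree.ElementTree as ET

GENOME_NS = "http://www.biodas.org/ns/das/genome/2.00"

doc = """<FEATURES
      xmlns="http://www.biodas.org/ns/das/genome/2.00"
      xmlns:das="http://www.biodas.org/ns/das/properties/2.00">
  <FEATURE id="feature/cTel54X" type="type/gene" name="tg-3">
    <PROP ptype="das:note">This is a telomeric repeat</PROP>
  </FEATURE>
</FEATURES>"""

root = ET.fromstring(doc)
prop = root.find(f".//{{{GENOME_NS}}}PROP")
# The element name was resolved against the default namespace...
print(prop.tag)           # {http://www.biodas.org/ns/das/genome/2.00}PROP
# ...but the "das:" prefix inside the attribute value is left untouched.
print(prop.get("ptype"))  # das:note
```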

Steve proposed using namespace-qualified attributes, like:

<FEATURES
      xmlns="http://www.biodas.org/ns/das/genome/2.00"
      xmlns:das="http://www.biodas.org/ns/das/properties/2.00"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://www.wormbase.org/das/genome/volvox/1/">

  <FEATURE   id = "feature/cTel54X"
           type = "type/gene"
           name = "tg-3">

     <PROP das:ptype="das:prop#note">This is a telomeric repeat</PROP>
     <PROP das:ptype="property/genefinder-score">29</PROP>
     <PROP das:ptype="das:prop#protein_translation"
       xlink:type="simple"
       xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"
     />

I proposed using the "eXtensible" part of XML, like this:

<FEATURES
      xmlns="http://www.biodas.org/ns/das/genome/2.00"
      xmlns:das="http://www.biodas.org/ns/das/properties/2.00"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://www.wormbase.org/das/genome/volvox/1/">

  <FEATURE   id = "feature/cTel54X"
           type = "type/gene"
           name = "tg-3">

     <das:note>This is a telomeric repeat</das:note>
     <some_other_ns:gf-score>29</some_other_ns:gf-score>
     <das:protein_translation
       xlink:type="simple"
       xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"
     />
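
With namespaced child elements, a consumer can read the properties as
key/value pairs keyed by namespace URI rather than by prefix. A sketch
using ElementTree; `GF_NS` is a hypothetical URI standing in for the
undeclared some_other_ns prefix, and `split_tag` and `properties` are
illustrative helpers:

```python
import xml.etree.ElementTree as ET

DAS_PROPS = "http://www.biodas.org/ns/das/properties/2.00"
GF_NS = "http://example.org/ns/genefinder"  # hypothetical namespace URI

doc = f"""<FEATURE xmlns="http://www.biodas.org/ns/das/genome/2.00"
         xmlns:das="{DAS_PROPS}" xmlns:gf="{GF_NS}"
         id="feature/cTel54X" type="type/gene" name="tg-3">
  <das:note>This is a telomeric repeat</das:note>
  <gf:gf-score>29</gf:gf-score>
</FEATURE>"""


def split_tag(tag):
    """Turn ElementTree's '{uri}local' form into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


def properties(feature):
    """Map (namespace URI, local name) -> text for each child element."""
    return {split_tag(child.tag): (child.text or "").strip()
            for child in feature}


props = properties(ET.fromstring(doc))
print(props[(DAS_PROPS, "note")])  # This is a telomeric repeat
print(props[(GF_NS, "gf-score")])  # 29
```

Because the keys use the namespace URI, a document that binds the same
URI to a different prefix produces the same dictionary.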

Steve's concern with this was validation.  I looked into the
RELAX NG spec and it supports this just fine.

4) Standard form for key/value pairs

Furthermore, I looked into how Atom handles this.  It also allows
extensible key/value data in parts of the spec.  Quoting from an
earlier email, which I now see I only sent to Steve:

> In an extensible section there are two/three categories of elements:
>   - those in the "atom:" namespace
>   - "simple extension elements" not in the "atom:" namespace
>   - "structured extension elements" not in the "atom:" namespace.
>
> Most of the "atom:" elements share a common structure.  For example:
>   - the type= attribute indicates if the contents are text, escaped
>       HTML or XHTML; or an explicit content-type like "chemical/x-pdb".
>
>   - the src= attribute indicates that the content of the element is
>       empty and the data is at the given URL instead (apparently the
>       hip term for URL these days is IRI, for Internationalized
>       Resource Identifier; I think we only need to use URLs)
>
>
> These are not always used for all elements; if it's appropriate for a
> given field then it's used.
>
>
>  Simple extension elements are always of the form
>     <element>Content goes here</element>
> where 'element' is not part of the 'atom:' namespace.  Consumers of
> this data may treat it as simple key/value data.
>
>  Structured extension elements always have at least an attribute
> or a sub-element, so must look like
>   <element attr="xyz"> .. </element>
> -or-
>   <element> .. <subelement /> .. </element>
>
> If the element isn't known this field may be ignored.
>
> These three things provide for:
>   - a set of well-defined elements, understandable by everyone
>   - a simple extension for things which can be key/value data
>   - a way to store or refer to more complex data types
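
The three Atom categories translate into a small classification rule. A
sketch, assuming (hypothetically) that the DAS properties namespace
plays the role that "atom:" plays in Atom; the `x:` namespace and the
`classify` helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical: the DAS properties namespace stands in for "atom:".
CORE_NS = "http://www.biodas.org/ns/das/properties/2.00"


def classify(elem, core_ns=CORE_NS):
    """Classify a child element using Atom-style extension rules."""
    if elem.tag.startswith("{" + core_ns + "}"):
        return "core"
    if elem.attrib or len(elem):
        # At least one attribute or sub-element.
        return "structured"
    # Text content only: usable as plain key/value data.
    return "simple"


doc = f"""<FEATURE xmlns:das="{CORE_NS}" xmlns:x="http://example.org/ns/x">
  <das:note>This is a telomeric repeat</das:note>
  <x:gf-score>29</x:gf-score>
  <x:translation href="http://example.org/protein/1"/>
  <x:alignment><x:hit/></x:alignment>
</FEATURE>"""

for child in ET.fromstring(doc):
    print(child.tag, classify(child))
```

A consumer that doesn't recognise a structured element can simply skip
it, as the Atom rules allow.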

5) xlink and <link>

Several places in the spec include or may include links to documents
elsewhere.  The XLink specification describes a general extensibility
mechanism for such links.

An XLink has about four properties; the most important are:
   - where does the link go to
   - what kind of link is it
   - what should the browser do with such a link

I personally don't understand the xlink spec well enough to want
to use it, and I haven't come across examples of it in use.  I am
wary of specs like that.

Another option is to use something like the <link> element from
HTML 4.0 and Atom.  This looks something like:

  <link rel="density.experimental_xray" type="chemical/x-ccp4-edm"
     href="http://blah.blah/"></link>

that is, it has:
   - a category for how the link is related to the given object ('rel')
   - an optional MIME type (used, e.g., if the server has multiple ways
         to provide data for the same 'rel' category)
   - an href to the data
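
Selecting a link by its 'rel' category, and optionally by MIME type,
might look like this. The second <link> and its href are hypothetical,
added only to show the type filter; `find_links` is an illustrative
helper, not part of any spec:

```python
import xml.etree.ElementTree as ET

doc = """<FEATURE>
  <link rel="density.experimental_xray" type="chemical/x-ccp4-edm"
        href="http://blah.blah/"/>
  <link rel="density.experimental_xray" type="application/octet-stream"
        href="http://blah.blah/raw"/>
</FEATURE>"""


def find_links(parent, rel, mime_type=None):
    """Return hrefs of <link> children with a given rel (and type, if given)."""
    return [
        link.get("href")
        for link in parent.findall("link")
        if link.get("rel") == rel
        and (mime_type is None or link.get("type") == mime_type)
    ]


root = ET.fromstring(doc)
edm_urls = find_links(root, "density.experimental_xray", "chemical/x-ccp4-edm")
```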

As implemented in Atom, the contents of a <link> are extensible,
which allows people to experiment with things like mirroring.

<link rel="something" title="This is a title"
       xmlns:x="blah/blah" href="http://default">
   <x:mirror href="http://here/"/>
   <x:mirror href="http://there/"/>
   <x:mirror href="http://everywhere/"/>
</link>
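
A client that doesn't understand the x:mirror extension can still use
the plain href, while one that does can try the mirrors as fallbacks. A
sketch, with `MIRROR_NS` as a hypothetical namespace URI in place of
"blah/blah" and `candidate_urls` as an illustrative helper:

```python
import xml.etree.ElementTree as ET

MIRROR_NS = "http://example.org/ns/mirror"  # hypothetical namespace URI

doc = f"""<link rel="something" title="This is a title"
      xmlns:x="{MIRROR_NS}" href="http://default">
  <x:mirror href="http://here/"/>
  <x:mirror href="http://there/"/>
  <x:mirror href="http://everywhere/"/>
</link>"""


def candidate_urls(link):
    """The link's own href first, then any mirrors this client understands."""
    urls = [link.get("href")]
    urls += [m.get("href") for m in link.findall(f"{{{MIRROR_NS}}}mirror")]
    return urls


urls = candidate_urls(ET.fromstring(doc))
```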

In any case we need a way to provide typed links to other documents.
Such links may include:
   - link from a given feature to the versioned source
   - link from a versioned source to the lock document

6) Source filters

This comes from Andreas Prlic.

We can support metadata servers via the same <SOURCES> document
returned from the entry point to a DAS server.

However, a metadata server may also support searches, e.g., to show
only H. sapiens annotations using the build 1234 assembly.
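
One way to picture the optional search is as a key/value match over the
properties each source advertises. A sketch, assuming hypothetical
`taxon` and `assembly` property names and an in-memory `Source` record;
the real <SOURCES> document format isn't specified here:

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical in-memory view of one entry in a <SOURCES> document."""
    uri: str
    properties: dict = field(default_factory=dict)


def filter_sources(sources, **criteria):
    """Keep sources whose properties match every key=value criterion."""
    return [
        s for s in sources
        if all(s.properties.get(k) == v for k, v in criteria.items())
    ]


sources = [
    Source("das/genome/human/1234", {"taxon": "H. sapiens", "assembly": "1234"}),
    Source("das/genome/human/1233", {"taxon": "H. sapiens", "assembly": "1233"}),
    Source("das/genome/volvox/1", {"taxon": "volvox", "assembly": "1"}),
]
human_1234 = filter_sources(sources, taxon="H. sapiens", assembly="1234")
```

Exact matching on a fixed set of keys is the simplest option; whether
the spec needs anything richer is part of the question above.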

Should we make this property searching part of the DAS/2 spec, which
means everyone must support it, or should we say it's optional
but if implemented it must be done in a standard way?

Or leave it for version 2.1, once we have more experience with
DAS in real life?  (Though we already have that experience.)

7) /regions

Could someone please explain to me the point of the /region subtree?

As far as I can tell, a region is just a type of feature.  A generic
feature is located somewhere on the genome (with respect to a given
assembly), and may also say it's on various 'region' features.

I don't see the need for a separate namespace for this.

8) Tiled queries

Do they need spec changes, or spec recommendations?

I think I've mentioned everything to be covered.

					Andrew
					dalke at dalkescientific.com