[DAS2] Re: Apollo and DAS/2 priorities

Mon Jan 30 23:52:35 UTC 2006

Ed Erwin wrote:
> In common.rnc in the definition of prop_list, I'd prefer to use
> the string 'mimetype' rather than the ambiguous 'type'
>
> prop_list = element PROP {
>   common_attrs,
>
>   attribute key { text },
>   ( attribute value { text } |
>     attribute href { text } |
>    (attribute type { text },
>     text )
>   )
> }*

I originally based this on the Atom specification, which uses
"type" because it special-cases "html", "text" and "xhtml".

There's no reason to support that diversity here.  Just use "text/plain",
"text/html" and, for xhtml, "application/xhtml+xml".

I've updated CVS to use "mimetype".

Bear in mind too that this is a /proposal/ for extensibility.
We've only talked about the need for string keys and string values.

There are a couple of other ways to add new data to a DAS record.
I would rather get some real-world feedback first.

> I have some questions about the <FORMAT> elements in the sources.xml 
> document.

Oops!  That file shouldn't be there.  It was an early version and I
didn't realize it was in CVS.  I've removed it.  I had no validation
for it, and the inevitable skew occurred.

>       <SET type="residues" id="volvox/1/residues">
>           <FORMAT name="fasta" mimetype="text/x-fasta" />
>           <FORMAT name="raw" mimetype="text/x-raw-sequence" />
>       </SET>
>
> If multiple formats are listed, does that mean that
>   1) every object of that type is available in each of those formats, 
> or
>   2) every object is available in at least one of those formats?

The element is now named "CATEGORY" instead of "SET".  I don't like
that name though, and I don't like SET either.  It really describes
the search query interface, so "INTERFACE" might work.

I'm iffy about the format list here.

Here's my current thought.  The list of formats is the format names
you can pass to a query URL (that's the "features?..", "types?.." and
"segments?.." URLs).

If some of the records in the result cannot be returned in that
format then they are skipped, silently.  This only happens with features.

In which case it's more like your 2).
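
A minimal sketch of that skipping behaviour, in Python.  The record
layout (a dict with "id" and "formats") and the feature ids are
hypothetical, made up for illustration; nothing here is from the spec
or from any server code.

```python
# Hypothetical feature records: each lists the format names it can be
# returned in.  The layout is an assumption for illustration only.
features = [
    {"id": "feature/1", "formats": {"das2xml", "gff3"}},
    {"id": "feature/2", "formats": {"das2xml"}},
    {"id": "feature/3", "formats": {"das2xml", "gff3", "psl"}},
]


def select_for_format(records, format_name):
    """Return the records available in format_name.

    Records that cannot be returned in that format are skipped
    silently; no error is raised for them.
    """
    return [rec for rec in records if format_name in rec["formats"]]


gff3_ids = [rec["id"] for rec in select_for_format(features, "gff3")]
print(gff3_ids)  # ['feature/1', 'feature/3']
```

Under this reading a format can be listed even though only some
features support it, which is why the list behaves like option 2.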

Is there a case where the search result will support a format
which cannot be used when retrieving a record?  If not then
the best answer is 2.

> For the DAS-specific formats like "text/x-das-type+xml",
> do we require that each server must support all of those?
>
> If so, is that made explicit somewhere?

"""All DAS features must be fetchable in the das2xml feature format.
Some DAS features may be available in alternative formats, depending
on the feature type.  These formats might be widely-used ones like
gff3 and psl or specialized binary formats for more compact 
downloads."""

"""A 'types' request returns a list of all the feature type data on the
server.  The URL for the request comes from the sources document.  It
is the 'id' attribute of the CATEGORY with type=="types".  The
returned document is of format name "das2xml" and content-type
"application/x-das-types+xml". """

"""Each segment is directly accessible through a URL.  It takes an
optional "format" query parameter.  If not specified the default format
name is "das2xml" which returns a document with content-type
application/x-das-segments+xml and with one and only one <SEGMENT>
element."""
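
As a rough sketch of the defaulting rule in that last paragraph, here
is how a server might pick the format name from a segment URL.  The
example.org URLs and the function name are hypothetical; only the
"format" parameter, the "das2xml" default and the content-type come
from the text above.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_FORMAT = "das2xml"
# Content-type the spec text gives for a das2xml segment document.
DAS2XML_SEGMENT_TYPE = "application/x-das-segments+xml"


def requested_format(url):
    """Return the value of the "format" query parameter, or the default."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; take the first one.
    return query.get("format", [DEFAULT_FORMAT])[0]


# Hypothetical segment URLs, for illustration only.
plain = requested_format("http://example.org/das2/segment/chr1")
fasta = requested_format("http://example.org/das2/segment/chr1?format=fasta")
print(plain)  # das2xml
print(fasta)  # fasta
```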

> In this example in sources4.xml, would you be better-off using
> URL-encoding ("/blah%3c2%3e.txt") rather than XML-encoding?
> Or is there an assumption that the necessary URL-encoding will
> be done at a later time?
>
>   <SOURCE id="human/" title="Duke Human" doc_href="/blah&lt;2&gt;.txt">

That data is in an XML attribute which means it is processed
by the XML parser.  The parser applies XML-decoding to convert
the "&lt;" back into a "<".

One of the bugs I found in DAS1 servers was in records which
contained an href like this

   <tag href="http://server?a=5&b=4" />

The bare "&" starts an entity reference, so "&b=4" is not well-formed
XML; it should be written "&amp;b=4".  Perl's XML parser ignores the
error, but Python's follows the XML spec and dies saying that "&b" is
invalid.
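
The difference can be reproduced with Python's standard
xml.etree.ElementTree parser.  This is only a sketch; try_parse is a
helper name made up for this example, and the exact error message
depends on the underlying expat version.

```python
import xml.etree.ElementTree as ET

bad = '<tag href="http://server?a=5&b=4" />'       # bare "&": not well-formed
good = '<tag href="http://server?a=5&amp;b=4" />'  # "&" escaped as "&amp;"


def try_parse(text):
    """Return the decoded href, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text).get("href")
    except ET.ParseError:
        return None


bad_href = try_parse(bad)
good_href = try_parse(good)
print(bad_href)   # None
print(good_href)  # http://server?a=5&b=4
```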

I believe the encoding here is correct.  I added that test case
to find out what XML processors do.  Written this way there is
no need for an extra conversion step, e.g. to URL-decode the
string.
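
If a client does need the URL-encoded form, it can produce it from the
decoded attribute value at request time.  A minimal sketch, assuming
the client uses Python's urllib.parse.quote; whether a DAS/2 client
should do this is an assumption on my part, not something the spec
says.

```python
from urllib.parse import quote

# The attribute value as the XML parser hands it over.
doc_href = "/blah<2>.txt"

# quote() leaves "/" alone by default and percent-encodes "<" and ">".
encoded = quote(doc_href)
print(encoded)  # /blah%3C2%3E.txt
```

The hex digits come out in upper case; "%3C" and "%3c" are equivalent
in a URL.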

					Andrew
					dalke at dalkescientific.com