[DAS2] Re: New problem with content-type header in DAS/2 server responses!

Thu Nov 10 01:34:28 UTC 2005

Allen
> To be even more concise, there are two use cases being presented here:
>
> 1) DAS/2 content should be viewable in a web browser, and doing so
> requires a HTTP Content-Type header to have value 'text/xml'.
>
> 2) DAS/2 content should be viewable in a specialized DAS/2 browser,
> and be able to rely on HTTP headers to determine visualization mode,
> as XML/DTD/Schema sniffing is undesirable.

A use case describes what the user wants to do, from the user's
perspective and not the implementation perspective.  Sometimes
they are the same, as when the user mandates certain technical
decisions, but that's not the case here.  Wikipedia has a good
definition, at http://en.wikipedia.org/wiki/Use_case .

To make use cases read nicely I've found it useful to have
a better name than "the user".  There will be many users of
different aspects of a DAS system.  Some are:
   - a person making the database/DAS adapter
   - an annotator
   - a molecular biologist

The use case we're talking about here is to let person X (either
an annotator or a molecular biologist) communicate with person Y.
Rather than saying "X" and "Y" I'll say "Bill" and "Jim".  Bill
sends Jim an email saying "I think there's a problem with this
annotation; it looks like it's off-by-one.  Could you take a look
at it for me?"  (Make up your own explanation :)

Jim gets the email, sees the URL, and pastes it into his browser.
If Jim is an annotator this will probably be a specialized DAS/2
client.  If he's not, then more likely it will be a web browser.

Both should "do the right thing", that is, provide meaningful
information about the given entity and options for more exploration
and analysis.

This use case suggests several functional details:

   - There needs to be a way to exchange DAS details via normal
text, for inclusion in email.  DAS uses URLs so we should build
on those.  This means they'll also likely be used in generic
web pages.  Because the specific consumer of a URL isn't known,
it's not possible to put a "?format=" field on the end of the
URL.  Thus these URLs must not specify the format.

   - DAS/2 clients (web browsers and specialized apps) should have
some easy way to get the URL for a given annotation, region,
feature type, etc.

   - specialized DAS clients (IGB) need a way for users to enter
an arbitrary DAS URL.

If one or more of these won't happen then there's no problem.
For example, if IGB etc. all don't support entering an arbitrary
DAS URL then there's no need to handle both classes of clients.

If there's no demand for direct visualization in a web browser
then there's also no problem.

I'm going to ask about the last.  The whole point of this change
is to support the ability for a generic web browser to go to a
given URL and show something of interest.

  1) who needs that?  Can any of us point to a group of people who
would use a direct web interface to a given DAS/2 URL?  If so,
why didn't it come up in earlier discussions?

  2) why can't they go to a DAS/2 web app elsewhere and from
there tell it "now link in the data from this URL"?  That is,
view the URL through an intermediary.

  3) why can't we tell people "if you want to make a web link to it,
and if the server supports HTML displays, stick a 'format=html' at
the end to see it in HTML"?

  4) Who wants to make a DAS/2 web app based directly on the
DAS/2 data structure?  Yes, that makes it trivial to have a first
pass web app, but that app will suck. It'll only support browsing
the server's data structure via a tree.  It won't support, say,
the ability to incorporate more or alternate records in a view,
fancy AJAX GUIs, etc.  There will be no way to merge records from
different servers because the annotation server only understands
annotations on that server.
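Point 3 is mechanical for anyone building such a link, with one wrinkle: a URL that already has a query string needs "&format=html" rather than "?format=html". A minimal Python sketch, assuming the server understands a "format" query parameter; the function name, host and paths are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_format(url: str, fmt: str = "html") -> str:
    """Return url with a format=<fmt> query parameter (hypothetical helper).

    Existing query parameters are kept; any earlier 'format' value is replaced.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "format"]
    query.append(("format", fmt))
    # safe="/:" keeps path-like parameter values readable instead of %-escaped
    return urlunsplit(parts._replace(query=urlencode(query, safe="/:")))


print(with_format("http://das.example.org/das2/feature/123"))
# http://das.example.org/das2/feature/123?format=html
print(with_format("http://das.example.org/das2/feature?type=exon"))
# http://das.example.org/das2/feature?type=exon&format=html
```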

My view now is that having the default MIME type for a DAS/2 entity
be "text/xml", for the purpose of supporting direct web browser
visualization of that entity, is not driven by a realistic use case
and is interesting mostly for technical reasons.

As such, we shouldn't do that.  We should leave the return documents
as distinct MIME types.

That leads me to the result of more research.  The relevant
spec for the MIME type for XML documents is RFC 3023, at
   http://www.ietf.org/rfc/rfc3023.txt

For commentary also see:
   http://www.xml.com/lpt/a/2004/07/21/dive.html
   http://diveintomark.org/archives/2004/02/13/xml-media-types

These say we have lots of things to worry about.  For example,
"text/xml" requires that the content-type include the charset
declaration, else the spec says to assume the document is in
US-ASCII.  There is no way for the XML itself to override that.

If we go the "text/xml" route we mandate that either:
   - all servers include a charset in the content-type
   - those that don't must only serve ASCII data.
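That rule is easy to check mechanically. A minimal Python sketch, assuming the response's Content-Type value and body bytes are at hand; the function name is hypothetical. Note that an encoding="utf-8" in the XML declaration does not rescue the document, since under text/xml the header alone decides:

```python
from email.message import Message


def text_xml_response_ok(content_type: str, body: bytes) -> bool:
    """Return False for a text/xml response that RFC 3023 would misread.

    Without a charset parameter, text/xml defaults to US-ASCII no matter
    what the XML declaration inside the document says.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/xml":
        return True  # this rule only concerns text/xml
    if msg.get_content_charset() is not None:
        return True  # the server declared the charset explicitly
    return body.isascii()  # otherwise only pure ASCII is safe


doc = '<?xml version="1.0" encoding="utf-8"?><name>Müller</name>'.encode("utf-8")
print(text_xml_response_ok("text/xml", doc))                 # False
print(text_xml_response_ok("text/xml; charset=utf-8", doc))  # True
```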

The proper MIME type is under "application", as
    "application/x-das-*+xml"

> then the character encoding is determined in this order:
>
> * the encoding given in the charset parameter of the Content-Type
>      HTTP header, or
> * the encoding given in the encoding attribute of the XML declaration
>      within the document, or
> * utf-8.
(quoting from http://www.xml.com/lpt/a/2004/07/21/dive.html )
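The quoted order fits in a short function. A simplified Python sketch, assuming the document is served under an application type such as "application/x-das-feature+xml" (that subtype is only an example of the "application/x-das-*+xml" pattern). It ignores byte-order marks and the UTF-16/32 autodetection a real XML parser also performs, so treat it as illustrative:

```python
import re
from email.message import Message

# Matches the encoding attribute of a leading <?xml ...?> declaration.
# Simplification: assumes an ASCII-compatible encoding and no byte-order mark.
_XML_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def resolve_encoding(content_type: str, body: bytes) -> str:
    """Pick the character encoding in the order quoted above."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        return charset  # 1. charset parameter of the HTTP header
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()  # 2. XML declaration
    return "utf-8"  # 3. default


decl = b'<?xml version="1.0" encoding="ISO-8859-1"?><features/>'
print(resolve_encoding("application/x-das-feature+xml; charset=UTF-8", decl))  # utf-8
print(resolve_encoding("application/x-das-feature+xml", decl))  # iso-8859-1
print(resolve_encoding("application/x-das-feature+xml", b"<features/>"))  # utf-8
```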

Apparently some ISPs, e.g. in Russia and Japan, will transcode text/xml
documents at the HTTP level, ignoring the encoding information in the
XML itself.  This can lead to problems.

As the author of those commentaries says, “XML is tough.”
   http://diveintomark.org/archives/2004/07/06/tough

> The solution proposed in the referenced thread, or perhaps only on a
> conference call, is to use the Content-Type header to address (1),
> providing information to web browsers, as they are less flexible than a
> specialized DAS/2 client.  (2) is addressed using a DAS/2 specific
> X-Das-Content-Type header, e.g.

It must have been a conference call.  I don't see mention of that in
my back emails.  I'm thankful to Steve for doing the writeups.

To emphasize what I said earlier, what will happen in the case of
(1)?  Who will implement it?  What will users expect from it?  Why
can't those users go through some intermediate DAS web app to better
view that data?  Why can't we say "add a 'format=html' for interactive
viewing"?

As for (2), I don't want a new header.  I know I talk about conneg
and other neat features in HTTP, but on re-reading appendix A of RFC 3023
   http://www.ietf.org/rfc/rfc3023.txt
I see it discusses over a dozen other solutions to the problem and why
they were excluded.  These include:

> A.10 How about using a conneg tag instead (e.g., accept-features:
>      (syntax=xml))?
>
>    When the conneg protocol is fully defined, this may potentially be a
>    reasonable thing to do.  But given the limited current state of
>    conneg[RFC2703] development, it is not a credible replacement for a
>    MIME-based solution.

In this case I'm willing to let people experiment with the idea
before baking it into the spec.

> A.9 How about a new Alternative-Content-Type header?
>
>    This is better than Appendix A.8, in that no extra functionality
>    needs to be added to a MIME registry to support dispatching of
>    information other than standard content types.  However, it still
>    requires both sender and receiver to be upgraded, and it will also
>    fail in many cases (e.g., web hosting to an outsourced server), where
>    the user can set MIME types (often through implicit mapping to file
>    extensions), but has no way of adding arbitrary HTTP headers.

How much control will DAS/2 data providers have over their server?

I know I want to support people who provide data as a set of files
through Apache, though that's not driven by any use case.  (This
use case would involve a user who has different requirements than
either Jim or Bill.)  mod_mime is designed for that.  I don't know
how to add other headers in this case.
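With mod_mime, a file-based provider sets the media type through a mapping from file extension (Apache's AddType directive). The same idea, sketched with Python's standard mimetypes module; the ".das2feature" extension and the file name are hypothetical, since DAS/2 defines no file extensions:

```python
import mimetypes

# Hypothetical extension -> media type mapping for static DAS/2 files, comparable
# to an Apache "AddType application/x-das-feature+xml .das2feature" line.
mimetypes.add_type("application/x-das-feature+xml", ".das2feature")

media_type, encoding = mimetypes.guess_type("chr1_genes.das2feature")
print(media_type)  # application/x-das-feature+xml
print(encoding)    # None
```

The mapping only determines Content-Type; an extra header such as X-DAS-Content-Type would need server configuration beyond extension mapping.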

The data providers we have now have control over all the headers.
If that will essentially always be the case then adding a new
header isn't a problem.

Then again, if this is always the case then we can go ahead with
conneg since an argument against conneg is it puts more work on
the server implementations.

In this too I'll be conservative: DAS/2 breaks no new ground as
a web app development project, so there should be no reason to
invent a new header.

> A.6 How about labeling with parameters in the other direction (e.g.,
>     application/xml; Content-Feature=iotp)?
>
>    This proposal fails under the simplest case, of a user with neither
>    knowledge of XML nor an XML-capable MIME dispatcher.  In that case,
>    the user's MIME dispatcher is likely to dispatch the content to an
>    XML processing application when the correct default behavior should
>    be to dispatch the content to the application responsible for the
>    content type (e.g., an ecommerce engine for
>    application/iotp+xml[RFC2801], once this media type is registered).
>
>    Note that even if the user had already installed the appropriate
>    application (e.g., the ecommerce engine), and that installation had
>    updated the MIME registry, many operating system level MIME
>    registries such as .mailcap in Unix and HKEY_CLASSES_ROOT in Windows
>    do not currently support dispatching off a parameter, and cannot
>    easily be upgraded to do so.  And, even if the operating system were
>    upgraded to support this, each MIME dispatcher would also separately
>    need to be upgraded.

> X-DAS-Content-Type: text/x-das-feature+xml
> X-DAS-Server: GMOD/0.0
> X-DAS-Status: 200
> X-DAS-Version: DAS/2.0
> ==================
>
> This also has the added benefit of already being implemented for a few
> months.  Are there objections to this solution?

Yes.  Several.

When did "X-DAS-Status" come back into the picture?  I thought
we talked about this in spring and nixed it because it doesn't provide
anything more useful than the existing HTTP-level error code.  Or perhaps
it was fall of last year?  I think I remember raking leaves at the time.

More useful, for example, would be a document (html, xml, or otherwise)
which accompanies the error response and gives more information about
what occurred.
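A sketch of that alternative in Python: the HTTP status code carries the result, and an XML body carries the explanation. The function name and the <error> element layout are hypothetical, not part of any DAS/2 draft:

```python
import xml.etree.ElementTree as ET
from http import HTTPStatus


def error_response(status: HTTPStatus, detail: str) -> tuple[int, dict[str, str], bytes]:
    """Build (status code, headers, body) for a failed request.

    The HTTP status line already says what happened, so no X-DAS-Status
    header is added; the body explains the failure to a person or client.
    """
    root = ET.Element("error", code=str(status.value))
    ET.SubElement(root, "reason").text = status.phrase
    ET.SubElement(root, "detail").text = detail
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    headers = {
        "Content-Type": "application/xml; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return status.value, headers, body


code, headers, body = error_response(HTTPStatus.NOT_FOUND, "No feature with id 123")
print(code)  # 404
```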

What does the "X-DAS-Server" get you that the normal "Server:" doesn't
get you?  What's the use case?

Why is the "X-DAS-Version" at all important?  What's important is
the data content.  It's the document return type/version that's
important, and not the server version.

But I mentioned most of these over a year ago
   http://portal.open-bio.org/pipermail/das/2004-September/000814.html

In summary:
   - no support for direct web browser access to a URL, except with a
       likely use case;
   - keep the default response in an XML format;
   - change that XML content-type to "application/x-das-*+xml" instead
       of "text/*";
   - have no requirement for new, DAS-specific headers.

					Andrew
					dalke at dalkescientific.com