[DAS] RFC: REST example for DAS 2.0

Andrew Dalke dalke@dalkescientific.com
Wed, 15 Jan 2003 00:46:19 -0700


                    REST example for DAS 2.0

In my previous RFC I suggested ignoring SOAP+UDDI+WSDL and building
DAS 2.0 on top of straight HTTP+XML using a REST architecture.

To show you how that might work, here's one way to implement the
functionality from the DAS 1.5 spec.  I ignore for now a discussion of
how to handle versioning when the sequence changes.  (I think it's
best done by having an extra level with the version identifier in it.)

If you want me to say "URI" instead of "URL" you can make the
replacement in your head.

  ============================
<dsn>/
  Returns a list of data sources

This replaces the 'dsns' method call.  It returns an XML document of
doctype "http://www.biodas.org/dtd/dasdsn.dtd".  Doing this also gets
rid of the annoying "cannot have a dsn named 'dsn'" problem.
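As a sketch of what a client would do with that document -- the
element names below are invented for illustration; only the doctype
URL above is real -- a plain GET plus a bulk XML parser is enough:

```python
import xml.etree.ElementTree as ET

# Hypothetical response from GET <dsn>/ ; DASDSN/DSN/SOURCE element
# names are illustrative stand-ins, not the actual dasdsn.dtd layout.
sample = """<?xml version="1.0"?>
<DASDSN>
  <DSN><SOURCE id="elegans">C. elegans</SOURCE></DSN>
  <DSN><SOURCE id="human">H. sapiens</SOURCE></DSN>
</DASDSN>"""

def list_data_sources(xml_text):
    # Pull the identifier off every SOURCE element in the listing.
    root = ET.fromstring(xml_text)
    return [src.get("id") for src in root.iter("SOURCE")]

print(list_data_sources(sample))  # ['elegans', 'human']
```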


<dsn>/stylesheet
  Returns the stylesheet for the DSN


<dsn>/entry_point/
  Returns a list of entry points

This returns an XML document (the doctype doesn't yet exist).  It is
basically a list of URLs.

<dsn>/entry_point/<id>
  This returns XML describing a segment, i.e., id, start, stop, and
orientation.  The doctype doesn't yet exist.


<dsn>/feature/
  Returns a list of all features.  (You might not want to do this,
and the server could simply say "not implemented.")

<dsn>/feature/<id>
  Returns the GFF for the feature named 'id'

Each feature in 1.5 already has a unique identifier.  This makes the
feature a full-fledged citizen of the web by making it directly
accessible.  (Under DAS 1.5 it is accessible as a side effect of a
'features' command, but I don't want to confuse a feature's name with
a search command, especially since many searches can return the same
feature, and because the results of a search should be a list, not a
single result.)


<dsn>/features?segment=RANGE;type=TYPE;category=....
  Returns a list of features matching the given search criteria.

The input is identical to the existing 'features' command.  The result
is a list of feature URLs.  This is a POST interface.
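For illustration, here is how a client might compose that search and
read back the URL list.  The host name and the RESULTS/FEATURE element
names are made up, since no doctype for the result list exists yet;
note that Python's urlencode joins parameters with '&' rather than the
';' shown above, which a server should accept either way:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Compose the search URL (hypothetical host, DAS 1.5-style arguments).
query = "http://das.example.org/elegans/features?" + urlencode(
    [("segment", "CHROMOSOME_I:1000,2000"), ("type", "exon")])

# Hypothetical search result: a list of feature URLs, not feature data.
sample_result = """<?xml version="1.0"?>
<RESULTS>
  <FEATURE href="http://das.example.org/elegans/feature/exon-101"/>
  <FEATURE href="http://das.example.org/elegans/feature/exon-102"/>
</RESULTS>"""

feature_urls = [f.get("href")
                for f in ET.fromstring(sample_result).iter("FEATURE")]
print(feature_urls)
```

Each URL in the list can then be fetched with an ordinary GET to
retrieve that feature's record.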


<dsn>/sequence?segment=RANGE[;segment=RANGE]*
  Returns the sequence in the given segment(s), as XML of
doctype "http://www.biodas.org/dtd/dassequence.dtd".

This is identical to the existing 'sequence' command and is a POST
interface.


<dsn>/type/
  Returns a list of all types.  (You might not want to do this,
and the server could simply say "not implemented.")

<dsn>/type/<id>
  Returns an XML document of doctype "DASTYPE", which is like
the existing "http://www.biodas.org/dtd/dastypes.dtd" except
there's only one type.
  
<dsn>/types?segment=RANGE;type=TYPE
  Return a list of URIs for types matching the search criteria.

The input is identical to the existing 'types' command.  The result is
a list of URLs.  This is a POST interface.

  ============================

Unlike the existing spec, and unlike the proposed RFC 13, the feature
and types are objects in their own right.  This has several effects.

  Linkability

Because a feature has a URL, features are directly addressable.  This
helps address RFC 3 "InterService links in DAS/2"
(see http://www.biodas.org/RFCs/rfc003.txt ) because each object is
accessible through a URL, and can be addressed by anything else which
understands URLs.

One such relevant technology is the Resource Description Framework
(RDF) (see http://www.w3.org/TR/REC-rdf-syntax/ ).  This lets 3rd
parties add their own associations between URLs.  For example, I could
publish my own RDF database which comments on the quality of features
in someone else's database.

I do not know enough about RDF to be certain, but I conjecture that I
can suggest an alternative stylesheet (RFC 8, "DAS Visualization
Server", http://www.biodas.org/RFCs/rfc008.txt) by an appropriate link
to the <dsn>/stylesheet .

I further conjecture that RDF appropriately handles group
normalization from RFC 10 (http://www.biodas.org/RFCs/rfc010.txt).

  Ontologies

Web ontologies, like DAML+OIL, are built on top of RDF.  Because types
are also directly accessible, this lets us (or others!) build
ontologies on top of the feature types.  This addresses RFC 4
"Annotation ontologies for DAS/2" at
http://www.biodas.org/RFCs/rfc004.txt .


  Independent requests

Perhaps the biggest disadvantage to this scheme is that any search
(like 'features') requires an additional 'GET' to get information
about every feature that matched.  If there are 1,000 matches, then
there are 1,000 additional requests.  Compare that to the current
scheme where all the data about the matches is returned in one shot.

I do not believe this should be a problem.  HTTP/1.1 supports
persistent connections ("keep-alive"), so the connection to the server
does not need to be re-established for each request.  A client can
also feed new requests to the server while still receiving responses
from earlier queries, so there shouldn't be a
pause in bandwidth usage while making each request.  In addition, the
overhead for making a request and the extra headers for each
independent response shouldn't require much extra data to be sent.
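A sketch of that connection reuse, using Python's standard http.client
against a throwaway local server (a real client would point the
connection at the annotation server instead; the /feature/ paths are
hypothetical):

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # enables persistent connections

    def do_GET(self):
        # Echo the requested path back in a tiny XML-ish record.
        body = ("<FEATURE id=%r/>" % self.path).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# All three GETs travel over the one TCP connection; only the first
# request pays the connection-setup cost.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for fid in ("f1", "f2", "f3"):
    conn.request("GET", "/feature/" + fid)
    bodies.append(conn.getresponse().read().decode())
conn.close()
server.shutdown()
```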

The performance slowdown should pay for itself quickly once someone
does multiple queries.  Suppose the second query also has 1,000
matches, with 500 matches overlapping with the first query.  Under the
existing DAS 1.5 spec, this means that all the data must be sent
again.  Under this proposal, only the 500 new requests need be sent.

One other issue mentioned in the SOAP proposals and in my REST
advocacy was the ability to stream through a feature table.  Suppose
the feature table is large.  People would like to see partial results
and not wait until all the data is received.  E.g., this would allow
them to cancel a download if they can see it contains the wrong
information.

If the results are sent in one block, this requires that the parsing
toolkit support a streaming interface.  It is unlikely that most SOAP
toolkits will support this mode.  It's also trickier to develop
software using a streaming API (like SAX) compared to a bulk API (like
DOM).  This new spec gets around that problem by sending a list of
URLs instead of the full data.  The individual records are small and
can be fetched one at a time and parsed with whatever means are
appropriate.  This makes it easier to develop software which can
multitask between reading/parsing input and handling the user
interface.
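The consumption pattern this enables can be sketched as a generator:
each record is small enough to hand to a bulk parser, so partial
results appear as soon as the first URL is fetched.  Here fetch is a
stand-in for an HTTP GET of a single feature URL, and the records are
invented:

```python
import xml.etree.ElementTree as ET

def features(urls, fetch):
    # Fetch and parse one record at a time; callers see each feature
    # as soon as it arrives and can stop iterating to cancel early.
    for url in urls:
        yield ET.fromstring(fetch(url))

# Stand-in "server": a dict mapping feature URLs to small records.
docs = {"u1": "<FEATURE id='f1'/>", "u2": "<FEATURE id='f2'/>"}
ids = [f.get("id") for f in features(["u1", "u2"], docs.get)]
print(ids)  # ['f1', 'f2']
```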
 
  Caching

RFC 5 "DAS Caching" (http://www.biodas.org/RFCs/rfc005.txt) wants a
way to cache data.  I believe most of the data requests will be for
feature data.  Because these are independently named and accessed
through that name using an HTTP GET, this means that normal HTTP
caching systems like the Squid proxy can be used along with standard
and well-defined mechanisms to control cache behaviour.
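A minimal sketch of the validation model those caches rely on, with a
stand-in fetch function instead of a real HTTP GET (the URL, the ETag
values, and the record body are all invented):

```python
# Client-side validation cache: store by URL, revalidate with
# If-None-Match / ETag.  A 304 means the cached copy is still good
# and the body is not resent.
cache = {}  # url -> (etag, body)

def fetch(url, headers, server_state):
    # Stand-in for an HTTP GET against a server that supports ETags.
    etag, body = server_state[url]
    if headers.get("If-None-Match") == etag:
        return 304, etag, None          # not modified: no body resent
    return 200, etag, body

def cached_get(url, server_state):
    headers = {}
    if url in cache:
        headers["If-None-Match"] = cache[url][0]
    status, etag, body = fetch(url, headers, server_state)
    if status == 304:
        return cache[url][1]            # serve the cached body
    cache[url] = (etag, body)
    return body

url = "http://das.example.org/elegans/feature/exon-101"
server = {url: ('"v1"', "<DASGFF/>")}
first = cached_get(url, server)     # full 200 response, fills cache
second = cached_get(url, server)    # 304 revalidation, cache hit
print(first == second)  # True
```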

The caching proposal also considers P2P systems like Gnutella as a way
to distribute data.  One possible scheme for this is to define a
mapping from URLs to a Gnutella resource.  In that case, replace 'URL'
above with 'URI'.



				Andrew Dalke
				dalke@dalkescientific.com
-- 
Need usable, robust software for bioinformatics or chemical
informatics?  Want to integrate your different tools so you can
do more science in less time?  Contact us!
               http://www.dalkescientific.com/