[DAS] RFC: REST example for DAS 2.0

Brian Gilman gilmanb@genome.wi.mit.edu
Wed, 15 Jan 2003 08:41:33 -0500


On 1/15/03 2:46 AM, "Andrew Dalke" <adalke@mindspring.com> wrote:

Hey Andrew,

    Long time no talk. SOAP, WSDL, and UDDI are NEVER going to help you send
50 MB of data across the wire! I've also thought about REST as a means of
building a distributed system. But the industry is just not going that way.
There are MANY toolkits to program up a web service. Programming a REST
service means doing things that are non-standard, and my engineering brain
says not to touch those things. SOAP has been able to solve a lot of
interoperability problems and will only get better over time. We use the
DIME protocol and compression to shove data over the wire. No need to parse
the payload as XML this way.

    SOAP has two methods of asking for data:
        
    1) RPC
    2) Document centric

    My question to you is: Why reinvent the wheel?? Why program up yet
another wire protocol when you have something to work with already?? And
DAS is a REST protocol!! Right now DAS just works. Why change it to use
anything else?? Is there a problem with the semantics of the protocol that
impedes any of the research that we are doing?? Murphy's law should be
called the engineer's prayer.

                                    Best,

                                            -B

>                   REST example for DAS 2.0
> 
> In my previous RFC I suggested ignoring SOAP+UDDI+WSDL and building
> DAS 2.0 on top of straight HTTP+XML using a REST architecture.
> 
> To show you how that might work, here's one way the functionality
> from the DAS 1.5 spec might be implemented.  I ignore for now a
> discussion of how to handle versioning when the sequence changes.  (I
> think it's best done by having an extra level with the version
> identifier in it.)
> 
> If you want me to say "URI" instead of "URL", you can make the
> replacement in your head.
> 
> ============================
> <dsn>/
> Returns a list of data sources
> 
> This replaces the 'dsns' method call.  It returns an XML document of
> doctype "http://www.biodas.org/dtd/dasdsn.dtd".  Doing this also gets
> rid of the annoying "cannot have a dsn named 'dsn'" problem.
> 
> 
> <dsn>/stylesheet
> Returns the stylesheet for the DSN
> 
> 
> <dsn>/entry_point/
> Returns a list of entry points
> 
> This returns an XML document (the doctype doesn't yet exist).  It is
> basically a list of URLs.
> 
> <dsn>/entry_point/<id>
> This returns XML describing a segment, i.e., id, start, stop, and
> orientation.  The doctype doesn't yet exist.
> 
> 
> <dsn>/feature/
> Returns a list of all features.  (You might not want to do this,
> and the server could simply say "not implemented.")
> 
> <dsn>/feature/<id>
> Returns the GFF for the feature named 'id'
> 
> Each feature in 1.5 already has a unique identifier.  This makes the
> feature a full-fledged citizen of the web by making it directly
> accessible.  (Under DAS 1.5 it is accessible as a side effect of a
> 'features' command, but I don't want to confuse a feature's name with
> a search command, especially since many searches can return the same
> feature, and because the results of a search should be a list, not a
> single result.)
> 
> 
> <dsn>/features?segment=RANGE;type=TYPE;category=....
> Returns a list of features matching the given search criteria.
> 
> The input is identical to the existing 'features' command.  The result
> is a list of feature URLs.  This is a POST interface.
> 
> 
> <dsn>/sequence?segment=RANGE[;segment=RANGE]*
> Returns the sequence in the given segment(s), as XML of
> doctype "http://www.biodas.org/dtd/dassequence.dtd".
> 
> This is identical to the existing 'sequence' command and is a POST
> interface.
> 
> 
> <dsn>/type/
> Returns a list of all types.  (You might not want to do this,
> and the server could simply say "not implemented.")
> 
> <dsn>/type/<id>
> Returns an XML document of doctype "DASTYPE", which is like
> the existing "http://www.biodas.org/dtd/dastypes.dtd" except
> there's only one type.
> 
> <dsn>/types?segment=RANGE;type=TYPE
> Return a list of URIs for types matching the search criteria.
> 
> The input is identical to the existing 'types' command.  The result is
> a list of URLs.  This is a POST interface.
> 
> ============================
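> 
> To make the listing above concrete, here is a minimal client sketch
> in Python (standard library only).  The host name, DSN name, and
> feature id are hypothetical placeholders:
> 
>     import urllib.request
>     from xml.dom import minidom
> 
>     BASE = "http://das.example.org/das"  # hypothetical server root
> 
>     def get_xml(path):
>         # Plain HTTP GET; every object is just a URL returning XML.
>         with urllib.request.urlopen(BASE + path) as resp:
>             return minidom.parseString(resp.read())
> 
>     sources = get_xml("/")                           # <dsn>/
>     entries = get_xml("/elegans/entry_point/")       # <dsn>/entry_point/
>     feature = get_xml("/elegans/feature/exon-1427")  # <dsn>/feature/<id>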
> 
> Unlike the existing spec, and unlike the proposed RFC 13, the features
> and types are objects in their own right.  This has several effects.
> 
> Linkability
> 
> Since each feature has a URL, features are directly
> addressable.  This helps address RFC 3 "InterService links in DAS/2"
> (see http://www.biodas.org/RFCs/rfc003.txt ) because each object is
> accessible through a URL, and can be addressed by anything else which
> understands URLs.
> 
> One such relevant technology is the Resource Description Framework
> (RDF) (see http://www.w3.org/TR/REC-rdf-syntax/ ).  This lets 3rd
> parties add their own associations between URLs.  For example, I could
> publish my own RDF database which comments on the quality of features
> in someone else's database.
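> 
> As a sketch of that idea, using the rdflib package (the "quality"
> vocabulary and the feature URL are made up for illustration):
> 
>     from rdflib import Graph, Literal, Namespace, URIRef
> 
>     Q = Namespace("http://example.org/quality#")  # hypothetical vocabulary
>     feature = URIRef("http://other.example.org/das/elegans/feature/exon-1427")
> 
>     g = Graph()
>     g.bind("q", Q)
>     # A third-party statement about someone else's feature, by URL.
>     g.add((feature, Q.comment, Literal("dubious splice site call")))
>     print(g.serialize(format="pretty-xml"))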
> 
> I do not know enough about RDF.  I conjecture that I can suggest an
> alternative stylesheet (RFC 8, "DAS Visualization Server",
> http://www.biodas.org/RFCs/rfc008.txt) by an appropriate link to the
> <dsn>/stylesheet resource.
> 
> I further conjecture that RDF appropriately handles group
> normalization from RFC 10 (http://www.biodas.org/RFCs/rfc010.txt).
> 
> Ontologies
> 
> Web ontologies, like DAML+OIL, are built on top of RDF.  Because types
> are also directly accessible, this lets us (or others!) build
> ontologies on top of the feature types.  This addresses RFC 4
> "Annotation ontologies for DAS/2" at
> http://www.biodas.org/RFCs/rfc004.txt .
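> 
> A sketch of what that could look like, again with rdflib.  RDF
> Schema stands in here for DAML+OIL, and the type URL and ontology
> term are invented:
> 
>     from rdflib import Graph, Namespace, URIRef
>     from rdflib.namespace import RDFS
> 
>     das_type = URIRef("http://das.example.org/das/elegans/type/exon")
>     ONT = Namespace("http://example.org/seq-ontology#")  # hypothetical
> 
>     g = Graph()
>     # A third party can subsume the DAS type into its own hierarchy,
>     # because the type is directly addressable by URL.
>     g.add((das_type, RDFS.subClassOf, ONT.TranscribedRegion))
>     print(g.serialize(format="turtle"))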
> 
> 
> Independent requests
> 
> Perhaps the biggest disadvantage of this scheme is that any search
> (like 'features') requires an additional 'GET' to get information
> about every feature that matched.  If there are 1,000 matches, then
> there are 1,000 additional requests.  Compare that to the current
> scheme where all the data about the matches is returned in one shot.
> 
> I do not believe this should be a problem.  The HTTP/1.1 spec supports
> "keep-alive" so that the connection to the server does not need to be
> re-established.  A client can feed requests to the server while also
> receiving responses from earlier queries, so there shouldn't be a
> pause in bandwidth usage while making each request.  In addition, the
> overhead for making a request and the extra headers for each
> independent response shouldn't require much extra data to be sent.
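> 
> A sketch of the client side of this: Python's http.client reuses one
> HTTP/1.1 connection across requests.  (This is plain keep-alive, not
> full pipelining, and the host and paths are hypothetical.)
> 
>     import http.client
> 
>     conn = http.client.HTTPConnection("das.example.org")
>     paths = ["/das/elegans/feature/f%d" % i for i in range(1000)]
> 
>     records = []
>     for path in paths:
>         # Each GET travels over the same TCP connection, so the
>         # per-request cost is a few headers, not connection setup.
>         conn.request("GET", path)
>         records.append(conn.getresponse().read())
>     conn.close()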
> 
> The extra per-request overhead should pay for itself quickly once
> someone does multiple queries.  Suppose the second query also has 1,000
> matches, with 500 matches overlapping with the first query.  Under the
> existing DAS 1.5 spec, this means that all the data must be sent
> again.  Under this proposal, only the 500 new requests need be sent.
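> 
> That saving falls out naturally if the client memoizes by URL, as in
> this sketch:
> 
>     import urllib.request
> 
>     cache = {}  # feature URL -> raw record
> 
>     def get_feature(url):
>         # URLs already fetched by an earlier query are served from
>         # the cache; only the 500 new ones hit the network.
>         if url not in cache:
>             with urllib.request.urlopen(url) as resp:
>                 cache[url] = resp.read()
>         return cache[url]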
> 
> One other issue mentioned in the SOAP proposals and in my REST
> advocacy was the ability to stream through a feature table.  Suppose
> the feature table is large.  People would like to see partial results
> and not wait until all the data is received.  E.g., this would allow
> them to cancel a download if they can see it contains the wrong
> information.
> 
> If the results are sent in one block, this requires that the parsing
> toolkit support a streaming interface.  It is unlikely that most SOAP
> toolkits will support this mode.  It's also trickier to develop
> software using a streaming API (like SAX) compared to a bulk API (like
> DOM).  This new spec gets around that problem by sending a list of
> URLs instead of the full data.  The individual records are small and
> can be fetched one at a time and parsed with whatever means are
> appropriate.  This makes it easier to develop software which can
> multitask between reading/parsing input and handling the user
> interface.
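> 
> A sketch of that style of client ('matching_urls' stands for the
> result of a 'features' search, and 'update_display' for some UI
> hook; each record is small enough for a bulk parser):
> 
>     import urllib.request
>     from xml.dom import minidom
> 
>     def features(urls):
>         # Generator: fetch and parse one small record at a time.
>         for url in urls:
>             with urllib.request.urlopen(url) as resp:
>                 yield minidom.parseString(resp.read())
> 
>     for doc in features(matching_urls):
>         update_display(doc)  # partial results; 'break' cancels the rest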
> 
> Caching
> 
> RFC 5 "DAS Caching" (http://www.biodas.org/RFCs/rfc005.txt) wants a
> way to cache data.  I believe most of the data requests will be for
> feature data.  Because these are independently named and accessed
> through that name using an HTTP GET, normal HTTP
> caching systems like the Squid proxy can be used along with standard
> and well-defined mechanisms to control cache behaviour.
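> 
> A sketch of those standard mechanisms at work: a conditional GET
> with If-Modified-Since, which a proxy like Squid performs
> transparently.  (The URL and date are hypothetical.)
> 
>     import urllib.request, urllib.error
> 
>     url = "http://das.example.org/das/elegans/feature/exon-1427"
>     req = urllib.request.Request(url)
>     req.add_header("If-Modified-Since", "Tue, 14 Jan 2003 00:00:00 GMT")
>     try:
>         body = urllib.request.urlopen(req).read()  # 200: changed, resent
>     except urllib.error.HTTPError as e:
>         if e.code != 304:
>             raise
>         # 304 Not Modified: the locally cached copy is still good.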
> 
> The caching proposal also considers P2P systems like Gnutella as a way
> to distribute data.  One possible scheme for this is to define a
> mapping from URLs to a Gnutella resource.  In this case, replace 'URL'
> above with 'URI'.
> 
> 
> 
> Andrew Dalke
> dalke@dalkescientific.com

-- 
Brian Gilman <gilmanb@genome.wi.mit.edu>
Group Leader Medical & Population Genetics Dept.
MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902