[DAS] RFC: REST example for DAS 2.0

David Block dblock@gnf.org
Wed, 15 Jan 2003 08:41:42 -0800


Brian,

What libraries are you using for DIME?  Is there good Java, Perl 
support?  I know you're a J2EE shop - what toolkit do you use?

Thanks,
Dave

On Wednesday, January 15, 2003, at 05:41 AM, Brian Gilman wrote:

> On 1/15/03 2:46 AM, "Andrew Dalke" <adalke@mindspring.com> wrote:
>
> Hey Andrew,
>
>     Long time no talk. SOAP, WSDL, and UDDI are NEVER going to help
> you send 50 MB of data across the wire! I've also thought about REST
> as a means to build a distributed system. But the industry is just
> not going that way. There are MANY toolkits for programming up a web
> service. Programming a REST service means doing things that are
> non-standard, and my engineering brain says not to touch those
> things. SOAP has solved a lot of interoperability problems and will
> only get better over time. We use the DIME protocol and compression
> to shove data over the wire. No need to parse the document this way.
>
>     SOAP has two methods of asking for data:
>
>     1) RPC
>     2) Document centric
>
>     My question to you is: why reinvent the wheel? Why program up yet
> another wire protocol when you have something to work with already?
> And DAS is a REST protocol! Right now DAS just works. Why change it
> to use anything else? Is there a problem with the semantics of the
> protocol that impedes any of the research we are doing? Murphy's law
> should be called the engineer's prayer.
>
>                                     Best,
>
>                                             -B
>
>>                   REST example for DAS 2.0
>>
>> In my previous RFC I suggested ignoring SOAP+UDDI+WSDL and building
>> DAS 2.0 on top of straight HTTP+XML using a REST architecture.
>>
>> To show you how that might work, here's one way to implement the
>> functionality from the DAS 1.5 spec; a short client sketch follows
>> the listing.  I ignore for now a discussion of how to handle
>> versioning when the sequence changes.  (I think it's best done by
>> having an extra path level containing the version identifier.)
>>
>> If you want me to say "URI" instead of "URL", you can make the
>> replacement in your head.
>>
>> ============================
>> <dsn>/
>> Returns a list of data sources
>>
>> This replaces the 'dsns' method call.  It returns an XML document of
>> doctype "http://www.biodas.org/dtd/dasdsn.dtd".  Doing this also gets
>> rid of the annoying "cannot have a dsn named 'dsn'" problem.
>>
>>
>> <dsn>/stylesheet
>> Returns the stylesheet for the DSN
>>
>>
>> <dsn>/entry_point/
>> Returns a list of entry points
>>
>> This returns an XML document (the doctype doesn't yet exist).  It is
>> basically a list of URLs.
>>
>> <dsn>/entry_point/<id>
>> This returns XML describing a segment, i.e., its id, start, stop, and
>> orientation.  The doctype doesn't yet exist.
>>
>>
>> <dsn>/feature/
>> Returns a list of all features.  (You might not want to do this,
>> and the server could simply say "not implemented.")
>>
>> <dsn>/feature/<id>
>> Returns the GFF for the feature named 'id'
>>
>> Each feature in 1.5 already has a unique identifier.  This makes the
>> feature a full-fledged citizen of the web by making it directly
>> accessible.  (Under DAS 1.5 it is accessible as a side effect of a
>> 'features' command, but I don't want to confuse a feature's name with
>> a search command, especially since many searches can return the same
>> feature, and because the results of a search should be a list, not a
>> single result.)
>>
>>
>> <dsn>/features?segment=RANGE;type=TYPE;category=....
>> Returns a list of features matching the given search criteria.
>>
>> The input is identical to the existing 'features' command.  The result
>> is a list of feature URLs.  This is a POST interface.
>>
>>
>> <dsn>/sequence?segment=RANGE[;segment=RANGE]*
>> Returns the sequence in the given segment(s), as XML of
>> doctype "http://www.biodas.org/dtd/dassequence.dtd".
>>
>> This is identical to the existing 'sequence' command and is a POST
>> interface.
>>
>>
>> <dsn>/type/
>> Returns a list of all types.  (You might not want to do this,
>> and the server could simply say "not implemented.")
>>
>> <dsn>/type/<id>
>> Returns an XML document of doctype "DASTYPE", which is like
>> the existing "http://www.biodas.org/dtd/dastypes.dtd" except
>> there's only one type.
>>
>> <dsn>/types?segment=RANGE;type=TYPE
>> Return a list of URIs for types matching the search criteria.
>>
>> The input is identical to the existing 'types' command.  The result is
>> a list of URLs.  This is a POST interface.
>>
>> ============================
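>>
>> To make the URL scheme concrete, here is a minimal client sketch in
>> Python.  The host name, DSN name, region, and the <feature_url>
>> element name are all assumptions for illustration; only the URL
>> layout comes from the listing above.
>>
>>   # Hypothetical DAS 2.0 client walkthrough (standard library only).
>>   import urllib.request
>>   import xml.etree.ElementTree as ET
>>
>>   dsn = "http://das.example.org/das/human_build30"
>>
>>   # Search for features.  The proposal describes 'features' as a
>>   # POST interface, so the query goes in the request body.
>>   query = b"segment=chr22:1000000,2000000;type=exon"
>>   with urllib.request.urlopen(dsn + "/features", data=query) as f:
>>       result = ET.parse(f)
>>
>>   # The result is a list of feature URLs; each feature is then
>>   # directly addressable with a plain GET.  (The <feature_url>
>>   # element name is made up -- that doctype does not exist yet.)
>>   for elem in result.iter("feature_url"):
>>       with urllib.request.urlopen(elem.text) as f:
>>           print(f.read().decode())     # the GFF for one feature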
>>
>> Unlike the existing spec, and unlike the proposed RFC 13, the features
>> and types are objects in their own right.  This has several effects.
>>
>> Linkability
>>
>> Since each feature has a URL, features are directly addressable.
>> This helps address RFC 3 "InterService links in DAS/2" (see
>> http://www.biodas.org/RFCs/rfc003.txt ) because each object is
>> accessible through a URL and can be addressed by anything else which
>> understands URLs.
>>
>> One such relevant technology is the Resource Description Framework
>> (RDF) (see http://www.w3.org/TR/REC-rdf-syntax/ ).  This lets third
>> parties add their own associations between URLs.  For example, I could
>> publish my own RDF database which comments on the quality of features
>> in someone else's database.
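>>
>> For instance, such a third-party statement might look something like
>> the following RDF/XML.  The feature URL and the 'review' vocabulary
>> are invented for illustration.
>>
>>   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>            xmlns:rev="http://example.org/review/">
>>     <rdf:Description
>>        rdf:about="http://das.example.org/das/human_build30/feature/exon.1234">
>>       <rev:quality>confirmed</rev:quality>
>>       <rev:comment>Exon boundary supported by cDNA alignment.</rev:comment>
>>     </rdf:Description>
>>   </rdf:RDF>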
>>
>> I do not know enough about RDF.  I conjecture that I could suggest an
>> alternative stylesheet (RFC 8, "DAS Visualization Server",
>> http://www.biodas.org/RFCs/rfc008.txt) via an appropriate link to the
>> <dsn>/stylesheet .
>>
>> I further conjecture that RDF appropriately handles group
>> normalization from RFC 10 (http://www.biodas.org/RFCs/rfc010.txt).
>>
>> Ontologies
>>
>> Web ontologies, like DAML+OIL, are built on top of RDF.  Because types
>> are also directly accessible, this lets us (or others!) build
>> ontologies on top of the feature types.  This addresses RFC 4
>> "Annotation ontologies for DAS/2" at
>> http://www.biodas.org/RFCs/rfc004.txt .
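>>
>> As a rough, hypothetical sketch, a DAML+OIL statement could place a
>> DAS type under a broader class.  The type URL and the ontology
>> namespace below are invented.
>>
>>   <daml:Class
>>       xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
>>       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>       xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>       rdf:about="http://das.example.org/das/human_build30/type/exon">
>>     <rdfs:subClassOf
>>         rdf:resource="http://example.org/ontology#TranscriptRegion"/>
>>   </daml:Class>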
>>
>>
>> Independent requests
>>
>> Perhaps the biggest disadvantage to this scheme is that any search
>> (like 'features') requires an additional 'GET' to get information
>> about every feature that matched.  If there are 1,000 matches, then
>> there are 1,000 additional requests.  Compare that to the current
>> scheme where all the data about the matches is returned in one shot.
>>
>> I do not believe this should be a problem.  The HTTP/1.1 spec supports
>> persistent connections ("keep-alive"), so the connection to the server
>> does not need to be re-established for each request.  A client can
>> pipeline requests to the server while also
>> receiving responses from earlier queries, so there shouldn't be a
>> pause in bandwidth usage while making each request.  In addition, the
>> overhead for making a request and the extra headers for each
>> independent response shouldn't require much extra data to be sent.
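>>
>> As a sketch of the client side (Python's standard http.client; the
>> host and feature IDs are made up), many records can be fetched over
>> one persistent connection.  Note this sketch only reuses the
>> connection serially; true pipelining needs a lower-level client.
>>
>>   # Fetch many feature records over a single persistent HTTP/1.1
>>   # connection instead of reconnecting for every request.
>>   import http.client
>>
>>   conn = http.client.HTTPConnection("das.example.org")
>>   paths = ["/das/human_build30/feature/exon.%d" % i
>>            for i in range(1000, 1100)]     # hypothetical feature IDs
>>
>>   records = []
>>   for path in paths:
>>       conn.request("GET", path)       # reuses the open connection
>>       resp = conn.getresponse()
>>       records.append(resp.read())     # drain before the next request
>>   conn.close()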
>>
>> The extra per-request overhead should pay for itself quickly once
>> someone does multiple queries.  Suppose a second query also has 1,000
>> matches, with 500 matches overlapping with the first query.  Under the
>> existing DAS 1.5 spec, all the data must be sent again.  Under this
>> proposal, only the 500 new features need to be fetched.
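>>
>> A client-side cache keyed on the feature URL captures this.  A
>> minimal, hypothetical sketch:
>>
>>   import urllib.request
>>
>>   _cache = {}   # feature URL -> GFF record
>>
>>   def get_feature(url):
>>       # Overlapping hits from later searches are served from the
>>       # cache; only genuinely new feature URLs touch the network.
>>       if url not in _cache:
>>           with urllib.request.urlopen(url) as f:
>>               _cache[url] = f.read()
>>       return _cache[url]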
>>
>> One other issue mentioned in the SOAP proposals and in my REST
>> advocacy was the ability to stream through a feature table.  Suppose
>> the feature table is large.  People would like to see partial results
>> and not wait until all the data is received.  E.g., this would allow
>> them to cancel a download if they can see it contains the wrong
>> information.
>>
>> If the results are sent in one block, this requires that the parsing
>> toolkit support a streaming interface.  It is unlikely that most SOAP
>> toolkits will support this mode.  It's also trickier to develop
>> software using a streaming API (like SAX) compared to a bulk API (like
>> DOM).  This new spec gets around that problem by sending a list of
>> URLs instead of the full data.  The individual records are small and
>> can be fetched one at a time and parsed with whatever means are
>> appropriate.  This makes it easier to develop software which can
>> multitask between reading/parsing input and handling the user
>> interface.
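>>
>> Sketched in Python (the function and names are assumptions), the
>> result list becomes a lazy stream with no streaming-XML machinery:
>>
>>   import urllib.request
>>
>>   def iter_features(feature_urls):
>>       # Yield one record at a time; the caller can update a
>>       # progress display, or cancel, between items -- no SAX-style
>>       # parse of one huge document is needed.
>>       for url in feature_urls:
>>           with urllib.request.urlopen(url) as f:
>>               yield f.read()        # one small GFF record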
>>
>> Caching
>>
>> RFC 5 "DAS Caching" (http://www.biodas.org/RFCs/rfc005.txt) wants a
>> way to cache data.  I believe most of the data requests will be for
>> feature data.  Because features are independently named and accessed
>> through that name using an HTTP GET, normal HTTP caching systems like
>> the Squid proxy can be used, along with standard and well-defined
>> mechanisms to control cache behaviour.
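>>
>> For example, standard HTTP validation works unchanged on a feature
>> URL (a hypothetical sketch; the URL and date are made up, and the
>> server chooses what cache headers to emit):
>>
>>   import urllib.error
>>   import urllib.request
>>
>>   url = "http://das.example.org/das/human_build30/feature/exon.1234"
>>   req = urllib.request.Request(url)
>>   # Revalidate a previously cached copy; a Squid proxy in between
>>   # does the same thing transparently for all clients behind it.
>>   req.add_header("If-Modified-Since", "Wed, 15 Jan 2003 00:00:00 GMT")
>>   try:
>>       body = urllib.request.urlopen(req).read()  # 200: fresh copy
>>   except urllib.error.HTTPError as e:
>>       if e.code != 304:
>>           raise                 # 304 means the cached copy is valid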
>>
>> The caching proposal also considers P2P systems like Gnutella as a way
>> to distribute data.  One possible scheme for this is to define a
>> mapping from URLs to a Gnutella resource.  In this case, replace 'URL'
>> above with 'URI'.
>>
>>
>>
>> Andrew Dalke
>> dalke@dalkescientific.com
>
> -- 
> Brian Gilman <gilmanb@genome.wi.mit.edu>
> Group Leader Medical & Population Genetics Dept.
> MIT/Whitehead Inst. Center for Genome Research
> One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
> phone +1 617  252 1069 / fax +1 617 252 1902
>
>
> _______________________________________________
> DAS mailing list
> DAS@biodas.org
> http://biodas.org/mailman/listinfo/das
>
--

----------------------------------------------
David Block -- Genome Informatics Developer
dblock@gnf.org
http://radio.weblogs.com/0104507
(858)812-1513