[DAS] RFC: REST advocacy

Andrew Dalke dalke@dalkescientific.com
Wed, 15 Jan 2003 00:45:52 -0700


                       REST Advocacy

While others have suggested using SOAP, UDDI, WSDL, etc. for DAS 2.0,
  RFC  0: http://www.biojava.org/thomasd/DAS/spec-new.html
  RFC  2: http://www.biodas.org/RFCs/rfc002.txt
  RFC 11: http://www.biodas.org/RFCs/rfc011.txt
  RFC 13: http://www.biodas.org/RFCs/king_das2/index.html

I propose herein an alternate view.

I had to use SOAP for a project for one of my clients, a Big Pharma.
Getting all the parts to work together was a bear, especially
once we wanted to get WSDL in place.  I tried to read the spec and
available examples, but it wasn't much help.  Nor was the O'Reilly
book on SOAP.  We finally got it working, but it felt more like
lucky guesswork than true understanding.

The public web diary of a friend of mine, Andrew Kuchling, helps
describe my current view:

http://www.amk.ca/diary/2002/sep.html
> Indeed. The best thing to do, for a small developer, is to lend
> support to those standards that are simple enough to be
> understandable. Use RELAX NG instead of the wretchedly overcomplicated
> XML Schema. Use XML-RPC instead of SOAP, and if you need a more
> complex interface than XML-RPC can handle, skip SOAP and design it
> following the REST principles.
>
> Eventually all of this SOAP + WSDL + UDDI + <Lord knows what else>
> junk will collapse under its own weight, I think. I just hope it won't
> take the underlying technology of XML with it.

Another example comes from Fredrik Lundh, who wrote one of the SOAP
libraries available for Python:

http://effbot.org/zone/rest-vs-rpc.htm
> So when we started working on the design for a large image
> distribution and processing system, we already had a simple and
> scalable design, and the tools to support it. Just send XML
> documents representing objects back and forth over HTTP, and use the
> lightweight DOM structure to hold parsed versions of them inside the
> application. Add some glue code to let application code access the
> DOM structures as ordinary Python objects, and you have a complete
> and scalable system.
> 
> The result was a much nicer specification (very few buzzwords) that
> anyone can understand, far less code, and most importantly, a much
> more robust design.
   ...
> XML-RPC gives you a lot of power, and anyone can understand how it
> works, and understand what the limitations are (Dave W. might not
> know the limitations, but that's another story ;-)
> 
> SOAP is something completely different; lots of additional
> complexity, but very few additional benefits. Some people love
> complexity (especially if they see a chance to make a living out of
> it, like Don Box). But I don't. Wouldn't use Python if I did.

"REST" is a way of organizing web services around URIs and other
web technologies instead of using RPC systems like SOAP.  Paul Prescod
is one advocate for the REST architecture and some of his essays are
available at http://www.prescod.net/rest/ . He wrote an overview of
REST at http://www.xml.com/pub/a/2002/02/20/rest.html .

There is a REST wiki at http://conveyor.com/RESTwiki/moin.cgi .
See also http://www.xfront.com/REST-Web-Services.html .

One fundamental idea promoted by REST advocates is that HTTP is not
simply a way to get bits from here to there but instead is an
application protocol, with the methods GET, POST, PUT, and DELETE,
just like a file system has a few fundamental actions.  (Some
applications may need a few more actions, which is the idea behind
DAV, see http://www.webdav.org/ .  DAV is a REST architecture that
extends HTTP/1.1 to add support for metadata properties, locking, and
namespaces.)
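
To make the "application protocol" idea concrete, here is a minimal
sketch in Python of a server-side dispatcher that treats the four
methods as the whole interface.  Everything in it (the ResourceStore
class, the paths, the XML bodies) is made up for illustration and is
not part of DAS or of any real library.

```python
class ResourceStore:
    """Toy in-memory server: four HTTP methods acting on URI paths.

    Hypothetical illustration only; not a DAS or library API.
    """

    def __init__(self):
        self.resources = {}   # path -> stored document
        self.next_id = 1      # used to name resources created by POST

    def handle(self, method, path, body=None):
        """Return a (status_code, payload) pair for one request."""
        if method == "GET":
            # Read-only: never changes server state.
            if path in self.resources:
                return 200, self.resources[path]
            return 404, None
        if method == "PUT":
            # Create or replace the resource at exactly this path.
            created = path not in self.resources
            self.resources[path] = body
            return (201 if created else 204), None
        if method == "DELETE":
            if path not in self.resources:
                return 404, None
            del self.resources[path]
            return 204, None
        if method == "POST":
            # Ask the collection at `path` to create a subordinate
            # resource; the server picks the new path.
            new_path = "%s/%d" % (path.rstrip("/"), self.next_id)
            self.next_id += 1
            self.resources[new_path] = body
            return 201, new_path
        return 405, None      # method not allowed


store = ResourceStore()
print(store.handle("PUT", "/feature/exon1", "<feature/>"))
print(store.handle("GET", "/feature/exon1"))
print(store.handle("DELETE", "/feature/exon1"))
```

The point of the sketch is that a client never needs to learn a
per-service list of procedure names; the verbs are fixed and only the
URIs and documents vary.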


Let's consider RFC 11, "SOAP as the standard transport encapsulation
for DAS/2 messages."  It starts by giving some background to SOAP.

> SOAP [1,2] is a simple messaging system, whereby all messages are
> encoded as XML documents.  It supports a variety of messaging
> models, and is independent of underlying transport protocol, but for
> DAS, we will presumably be using a standard request-response
> paradigm.  At least initially, transport will be over HTTP or HTTPS.

I can say for certain that SOAP is not simple, at least not 1.2. 
Try reading the spec for it:
   Part 0: Primer   http://www.w3.org/TR/soap12-part0/
   Part 1: Messaging Framework   http://www.w3.org/TR/soap12-part1/
   Part 2: Adjuncts   http://www.w3.org/TR/soap12-part2/
with support for things like routing and signing of different parts
of the message stream.   Follow that up with the specs for WSDL,
UDDI, and XML Schemas.  Blarg!

I'm not sure what is meant by "messaging models."

As mentioned above, HTTP is not simply a transport protocol.  It
offers other actions besides "send data then get data back."  It can
be used as a transport protocol, but then again, so can SOAP.

Is there need to use any "transport protocol" other than HTTP/HTTPS?
If not, then is that an important consideration?

The RFC then lists various advantages of SOAP over the existing DAS
1.5 protocol.

>  - Unlike the current DAS model, requests will be XML encoded, as
>    well as responses.  This gives much more scope for extending the
>    request format, and makes it easier to support a powerful query
>    language in the requests (indeed, it would be easy to embed
>    XQueryX in SOAP messages).

The current request format is the CGI-style
"application/x-www-form-urlencoded" or "multipart/form-data" format.
This format is not an intrinsic part of HTTP, which can carry any kind
of request body.  After all, that's how SOAP and XML-RPC send XML to
the server, and how HTTP allows a "PUT" request.

Therefore, I argue that this is not an aspect of SOAP but is simply
one of HTTP.
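
As a small sketch of that point, the Python standard library can build
an HTTP request whose body is an XML document with no SOAP envelope
involved.  The URL and the query document below are invented for the
example; DAS defines neither.  The request is only constructed, not
sent.

```python
import urllib.request

# Hypothetical endpoint and query document, for illustration only.
URL = "http://das.example.org/das2/query"
QUERY = b"<query><segment id='chr1' start='1' stop='1000'/></query>"


def build_xml_request(url, xml_bytes):
    """Build (but do not send) an HTTP POST whose body is plain XML."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_xml_request(URL, QUERY)
print(request.get_method(), request.full_url)
# urllib stores header names capitalized as "Content-type".
print(request.get_header("Content-type"))
```

Passing the request to urllib.request.urlopen() would send it; the
server would see an XML body exactly as it would with SOAP, minus the
envelope.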

The support for XQueryX should not bias the choice towards XQueryX.
It's just as easy to embed any text string in any message, and I don't
think XQueryX is .... pretty.  See the example on the XQueryX page at
http://www.w3.org/TR/xqueryx


>  - Message components must be namespace-qualified, guaranteeing
>    extensibility.

The ability to return XML is a property of HTTP and not specifically
of SOAP.  Nor is extensibility unique to XML, though I do not advocate
some other data representation language.

>  - Basic exception-reporting semantics are defined.

As they are for HTTP.  See
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

Indeed, since SOAP is a layer on top of HTTP that means I already need
to check for HTTP errors.  Why then require two sources for error
messages?
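
Here is a rough sketch of what the two sources look like to client
code.  It assumes a SOAP 1.1 style response, where a fault is a Fault
element inside the envelope's Body with an unqualified faultstring
child.  The function name and the sample fault are my own inventions
for the example.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
FAULT_PATH = "{%s}Body/{%s}Fault" % (SOAP_ENV, SOAP_ENV)


def find_error(status, body):
    """Return a description of the failure, or None on success.

    Two layers must be consulted: the SOAP fault inside the body
    and the HTTP status code around it.
    """
    fault = None
    if body:
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            root = None       # e.g. an HTML error page from a proxy
        if root is not None:
            fault = root.find(FAULT_PATH)
    if fault is not None:
        text = fault.findtext("faultstring", default="(no faultstring)")
        return "SOAP fault: %s" % text
    if status >= 400:
        return "HTTP error %d" % status
    return None


# Hypothetical fault document, for illustration only.
SAMPLE_FAULT = (
    b'<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<env:Body><env:Fault>"
    b"<faultcode>env:Client</faultcode>"
    b"<faultstring>Unknown segment</faultstring>"
    b"</env:Fault></env:Body></env:Envelope>"
)

print(find_error(500, SAMPLE_FAULT))
print(find_error(404, b""))
print(find_error(200, b"<ok/>"))
```

With plain HTTP only the final status check is needed; the XML
parsing exists purely to dig out the second error channel.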

>  - There is full support for pipelines of actors processing a given
>    message.  This makes technologies like smart caching and proxying
>    easy to retrofit onto protocols.

Even though I've seen the term "actor" used elsewhere, I don't
understand that term, nor "pipelines of actors."

Smart caching and proxying are already available using HTTP.  See
  http://www.prescod.net/rest/rest_vs_soap_overview/

] The final core goal of REST is "compatibility with
] intermediaries". The most popular intermediaries are various kinds
] of Web proxies. Some proxies cache information to improve
] performance. Others enforce security policies. Another important
] kind of intermediary is a gateway, which encapsulates non-Web
] systems.

Compare that to a SOAP request where a proxy cannot automatically tell
if an RPC call is a pure read-request (like a GET) or not.  Hence, it
cannot easily tell if it's cacheable or not!
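
A deliberately simplified sketch of the decision an intermediary can
make by looking at the request line alone.  Real caches also consult
headers such as Cache-Control and Expires; the function name and the
sample requests are hypothetical.

```python
# Methods HTTP defines as safe (read-only).  A simplification: real
# caches also look at response headers before storing anything.
SAFE_METHODS = {"GET", "HEAD"}


def proxy_may_cache(method):
    """True if the method alone tells a proxy the request is a read."""
    return method.upper() in SAFE_METHODS


# Hypothetical requests.  Both SOAP calls are POSTs to one endpoint,
# so the proxy cannot tell the read from the write without parsing
# and understanding the XML body.
requests = [
    ("GET", "/das2/features?segment=chr1", None),
    ("POST", "/soap", "<getFeatures segment='chr1'/>"),
    ("POST", "/soap", "<deleteFeature id='f1'/>"),
]

for method, path, body in requests:
    print(method, path, "cacheable" if proxy_may_cache(method) else "opaque")
```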


>  - There are a large (and increasing) number of toolkits which make
>    developing SOAP applications easy.

And a large number for doing XML-RPC.  And a much larger number for
doing HTTP.

It's true that the two packages I used for SOAP and Python weren't
too hard to use.

>  - SOAP-Encoding provides a standard format for marshaling arbitrary
>    data structures. (but see below for issues with this).

There are a huge number of ways to marshal arbitrary data structures,
even standardized ways.  In addition, (as I understand things) SOAP
Encoding doesn't require the rest of SOAP, like envelopes, routing,
etc.

The part "below" says that the SOAP toolkits are DOM-based and the RFC
author would rather use a SAX/event-based one for processing large
data sets.  (Also mentioned in RFC 0.)  However, as pointed out by
Richard Salz (developer of ZSI, a leading Python SOAP implementation,
and from his bio a long-time network protocol developer)

http://www.xml.com/pub/a/2002/07/17/salz.html?page=last
} Note that even though the individual processing is fairly simple,
} the overall process is fairly complex and requires multiple passes
} over the header elements. In a streaming environment -- think SAX,
} not DOM -- that won't work. In fact, it's my bet that headers will
} spell the end of SAX-style SOAP processors. For example, a digital
} signature of a SOAP message naturally belongs in the header. In
} order to generate the signature, you need to generate a hash of the
} message content. How can you do that without buffering?

Hence I believe SOAP toolkits will not migrate to a purely
event-driven style and his concern will not be addressed.

In any case, the RFC says the concern is with the memory and
set-up/tear-down costs.  I disagree.  If 50 MB of data was requested,
then using, say, a 100 MB DOM shouldn't be a concern.  I say the
concern should be to allow people to see the data while it is being
downloaded, instead of having to wait for a complete download.  I do
not think the existing SOAP toolkits allow this, nor will they soon.
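
As a sketch of what "seeing the data while it is being downloaded"
could look like over plain HTTP, the example below uses the pull
parser in Python's standard library (xml.etree.ElementTree's
XMLPullParser, a modern-Python API) to hand out each record as soon as
its closing tag has arrived.  The <features> document and the chunk
boundaries are invented for the example; in practice the chunks would
come from reading the HTTP response in pieces.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, split the way a network read might split it.
CHUNKS = [
    b"<features><feature id='f1' sta",
    b"rt='10'/><feature id='f2' start='20'/>",
    b"<feature id='f3' start='30'/></features>",
]


def stream_features(chunks):
    """Yield each feature id once its element has been fully parsed."""
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        # Hand out every element completed so far, then release it.
        for _event, elem in parser.read_events():
            if elem.tag == "feature":
                yield elem.get("id")
                elem.clear()

    for chunk in chunks:
        parser.feed(chunk)    # parse incrementally; no full document needed
        yield from drain()
    parser.close()            # end of stream: flush anything still pending
    yield from drain()


for feature_id in stream_features(CHUNKS):
    print("got feature", feature_id)
```

The client can start displaying features before the last chunk
arrives, which is the property I care about more than the peak memory
use.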

For one final read, only somewhat related, see
 http://www.adtmag.com/article.asp?id=6965
which advocates the "bohemian" RELAX NG schema over XML Schema.



				Andrew Dalke
				dalke@dalkescientific.com