[MOBY-l] MOBY and the "REST/SOAP" debate

Andrew D. Farmer adf at ncgr.org
Thu Oct 3 19:32:13 UTC 2002


Hi all-

I wanted to try to get some discussion going here with respect to some
interesting debates that have been going on in various places around the
basic theme of web services. Here, I'll try to provide an overview of the
issues (based on my own limited understanding of them at this point), and
give some links to useful starting points for further details. Despite
the debate being at times maddeningly subtle and hair-splitting, I do think it
raises points that are quite relevant to MOBY, especially in
terms of our choice of the best technologies to fit our needs. I know Mark
has already been quite skeptical of UDDI, and it's interesting to see that
many others are voicing concerns about some of the other components of the
"standard" web services approach (or indeed, questioning whether in fact
the whole idea of "web services" should be thought of as distinct from
"the web", and whether we need these new standards, or to re-think our
use of the old standards).

My thoughts on the subject are still rather nebulous, but I have the sense
that a lot of the fundamental issues are very much related
to those involved in the different approaches to service description that are
embodied in the "static service" vs.  "dynamic service" approaches which I
have described with respect to ISYS.

The basic focus of the debate seems to center on the SOAP protocol, and the
question of how it relates to the basic design of the web architecture.
The criticisms of SOAP largely stem from a view that it represents
"yet another" CORBA/DCOM/RMI/RPC architecture, that is (ab)using HTTP as a
convenient way of being firewall-friendly and universally available.
Their criticism of SOAP as yet another RPC mechanism runs deeper than the
"heaviness" or lack of openness of the older approaches, which most folks
seem to agree will be alleviated by the use of XML-derived standards.
Instead, they seem to claim that the very nature of an RPC-based approach is
at odds with the fundamental architectural principles of the web that made it
"successful"- or at least enabled it to have a more significant impact than
any distributed processing model has ever done... To quote Paul Prescod
(from "Second Generation Web Services" at
http://www.xml.com/pub/a/2002/02/06/rest.html):
"These technologies achieved only limited success before they adapted for the Web.
Some believe that the problem was that Microsoft and the OMG supporters could not
get along. I disagree. There is a deeper issue. RPC models are great for closed-world
problems. A closed world problem is one where you know all of the users, you can share
a data model with them, and you can all communicate directly as to your needs.
Evolution is comparatively easy in such an environment: you just tell everybody that the
RPC API is going to change on such and such a date and perhaps you have some
changeover period to avoid downtime. When you want to integrate a new system you do
so by building a point-to-point integration.

On the other hand, when your user base is too large to communicate coherently you
need a different strategy. You need a pre-arranged framework that allows for evolution
on both the client and server sides. You need to depend less on a shared, global
understanding of the rights and responsibilities of a participant. You need to put in hooks
where compliant clients and servers can innovate without contacting you. You need to
leave in explicit mechanisms for interoperating with systems that do not have the same
API. RPC protocols are usually poorly suited for this kind of evolution. Changing
interfaces tends to be extremely difficult. Integrating services typically takes
complicated software "glue".

I believe this is the reason no enterprise has ever successfully unified all of their systems with DCOM, CORBA, or RMI.

Now we come to the crux of the problem: SOAP RPC is DCOM for the Internet...."

Now, while the rhetoric along these lines tends to get a bit thick, I think
they have some good insights into some fundamental (and non-obvious)
differences between this paradigm and alternative approaches to the same
"web services" problems that are more consistent with the architecture of
the web (and are therefore claimed to be more ripe for the same explosive
and innovative growth seen by the web).

The SOAP critics rally around yet another acronym: REST. Unlike the myriad
of other acronyms floating around "web services", however, REST is not
a proposed standard, but rather an "architectural style". It stands for
REpresentational State Transfer, and was coined in the PhD dissertation of
Roy Fielding (co-founder and director of the Apache Software Foundation, and
one of the leading lights of the W3C- for example, he coauthored the HTTP
specs). The meaning of this phrase is probably not worth explaining here
(see the references if you're really interested in getting a view of web
applications as finite state machines), but its "disciples" argue that it
represents the cornerstones of the web architecture, and that the main reason
for the web's "success" lies in its embodiment of REST principles. So, what are
these principles? The goals of the REST "style" are stated as:

	-scalability of component interactions
	-generality of interfaces
	-independent deployment of components
	-intermediary components to reduce latency, enforce security and
		encapsulate legacy systems

Phrased in this way, it may or may not sound like an obvious fit for our
situation, but my take is that the essential theme here is a system that is as
decentralized as possible (in the sense of not having a central authority
prescribing how components do their business), while at the same time allowing
interesting
and unforeseen interactions to evolve between components, and making it
possible for infrastructural components (caches, proxies, gateways) to do their
job without needing to know the details of every component.

It is claimed that the web architecture achieves these goals
by relying on these "core components":

	-The URI as a universal addressing scheme for resources.
	-HTTP as an ultra-generic stateless protocol for accessing and manipulating
		resources.
	-Representation of resources as self-descriptive and linked
		hypertext (mostly HTML now, but transitioning to XML is perceived
		as fundamentally necessary)
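
To make these components a bit more concrete, here is a minimal sketch (in
Python) of what it would mean for a piece of data to "be on the web" in this
sense: the record is named by a URI, dereferenced with a plain, stateless HTTP
GET, and comes back as a self-descriptive representation that links to related
resources by URI. The URI, the XML vocabulary and the <annotations> element
are all made up for illustration; this is not an existing service.

import urllib.request
import xml.etree.ElementTree as ET

# Purely hypothetical URI and XML vocabulary, for illustration only.
uri = "http://genome.example.org/sequences/AC012345"

# The URI names the resource; a generic, stateless HTTP GET dereferences it.
with urllib.request.urlopen(uri) as response:
    doc = ET.fromstring(response.read())

# The representation is self-descriptive XML; an imagined <annotations href="..."/>
# element links to a related resource by URI, so any URI-aware tool can follow it.
annotations_uri = doc.find("annotations").get("href")
print(annotations_uri)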

Another way it is often articulated is that the "web" has been designed as
a way of representing data-centric "resources" (anything designated by a URI,
and ideally dereferenceable to give back some representation of that resource, e.g.
an XML document). The set of "operations" on those "resources" has been designed
to be extremely limited; the four basic HTTP methods (GET, POST, PUT, DELETE) are
seen to correspond to the basic generic operations on a piece of data:
retrieve, update, create, destroy. The idea seems to be that by focusing on the
data (and developing rich XML vocabularies for representing it), and keeping
the "operations" restricted to basic data manipulation operations, one has a
much better chance of enabling integration/interoperation in a wildly
decentralized, "open world" situation, than by trying to get people to agree on
proper "behavioral" semantics.
(Note that by embedding transitions to other resources into the
representations given back, the creator of the resource is essentially
defining a sort of operation set into the document.)
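
As a rough illustration of that method-to-operation mapping (again with a
made-up URI and payload, and following the loose pairing described above),
the same resource could in principle be created, read, updated and destroyed
using nothing but the generic HTTP verbs; the sketch below assumes Python's
standard urllib:

import urllib.request

uri = "http://genome.example.org/sequences/AC012345"     # hypothetical resource
record = b"<sequence id='AC012345'>ACGTACGT</sequence>"  # illustrative payload

def send(method, url, body=None):
    # One generic helper covers all four operations, because the "interface"
    # is just the uniform HTTP methods rather than a service-specific API.
    req = urllib.request.Request(url, data=body, method=method,
                                 headers={"Content-Type": "text/xml"})
    return urllib.request.urlopen(req)

send("PUT", uri, record)     # create the resource
send("GET", uri)             # retrieve its current representation
send("POST", uri, record)    # update it (in the loose mapping above)
send("DELETE", uri)          # destroy it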

The problem they see with SOAP, as I understand it, is that it
is essentially a "framework" for developing application-specific
protocols that live in their own shadowy
world, invisible to components that are outside the agreed-upon standards
of the "subprotocol". The classic example seems to be the use of a
SOAP-RPC call for the equivalent of an HTTP "GET". Using the SOAP approach,
one would define some "special message" (getSequence) that takes an identifier
parameter (in the namespace of the service provider) and returns an XML
document describing the data (both styles are sketched in code after the
list below). Some of the "ill-effects" of this seemingly straightforward
approach are:

	Since there is no URI for the returned document, it effectively
	does not belong to the web. This implies that you can't use it
	in webby ways, e.g.: bookmark it; create links to it in documents;
	use it in URI-based standards such as RDF, XLink, XInclude, etc.
	In effect, only SOAP-enabled technologies can get at the data, and
	further than that, only those who have through some process learned
	about your custom SOAP messaging semantics. Furthermore, it's not
	clear how you would "guide" consumers of the data to
	related information (e.g. the annotations of the sequence) accessible
	via other SOAP messages by encoding these "related SOAP messages"
	into the returned XML data, as you could by including URI links
	to the related info in your document.  The only way to "discover" the
	existence of the data is via the rather complicated journey through
	UDDI/WSDL/SOAP or some point-to-point agreement between requester and
	provider.

	The infrastructural components of the web have no insight into
	the SOAP messaging semantics. They only know it is an HTTP POST
	to a certain URL. Thus, even though it's "really only a GET",
	it won't be seen that way by caches and proxies.
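
To see why the intermediaries are blind to the SOAP version of this exchange,
compare the two requests below. This is only a hedged sketch: the getSequence
operation, the endpoint URL and the resource URI are all invented for
illustration. The SOAP call is an HTTP POST whose real meaning lives inside
the XML envelope, so a cache or proxy cannot tell it is only a read; the plain
GET carries the same intent in the one place every generic web component
already understands.

import urllib.request

# --- SOAP-RPC style: one POST to a single endpoint; the "operation" is hidden
#     inside the envelope (getSequence and the endpoint URL are hypothetical).
soap_envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getSequence xmlns="http://provider.example.org/seq">
      <identifier>AC012345</identifier>
    </getSequence>
  </soap:Body>
</soap:Envelope>"""

soap_req = urllib.request.Request(
    "http://provider.example.org/soap/endpoint",
    data=soap_envelope.encode("utf-8"),
    method="POST",
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": '"getSequence"'})
# A proxy or cache sees only "POST /soap/endpoint": there is no URI for the
# sequence itself, so there is nothing to bookmark, link to, or cache.

# --- REST style: the sequence has its own URI, and the read is just a GET.
rest_req = urllib.request.Request(
    "http://provider.example.org/sequences/AC012345",  # hypothetical resource URI
    method="GET")
# The same generic intermediaries can cache this response, and any URI-based
# standard (RDF, XLink, or a plain hyperlink) can point at the record directly.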

Well, I'm probably not doing a very good job at articulating the arguments
(especially given my own naive understanding of "the web", I'm hoping more
experienced webheads will weigh in!);
perhaps I'll just refer you to the resources at the end of this letter.

I should note, however, that it seems important to distinguish between
"the REST approach" and "the way any given website happens to work": just as
the critics claim that SOAP uses HTTP without being RESTful, you'll find a
lot of discussion of REST principles that are clearly not enforced or observed
by most websites.

Finally, I just wanted to throw out a few thoughts on how this whole business
relates to MOBY.

First off, one of the key points in the discussion is that there is general
agreement that no matter what approach you take, you're going to have to
develop a common "data" language, so it seems reasonable to pursue that
thread regardless of how we proceed on the separate question of how best
to represent "services" on the data.

Second, in keeping with the general themes of "keep it simple" and "keep
the entry bar low", and with a general approach of incremental evolution from
how service providers are doing things now, it seems quite appealing to
think about developing a system around the core concepts of the web that
everyone has basically already bought into, rather than introducing
new-fangled, relatively untried and still-evolving technologies.

Finally, the basic tenets of the REST approach seem to lean towards an
extreme skepticism about the extent to which you can (or should even
try to) get consensus on "operational" sorts of issues; on the other hand,
it is acknowledged (by some REST proponents) that if this agreement
can be reached within a reasonably like-minded community, it can
be effective, and that it is more straightforward for "traditional desktop
programmers" (as opposed to "network programmers") to think in these terms.
This same distinction seems very much related to the whole question
of "static vs dynamic services" that we've been talking about (i.e.
static service representation amounting to a very RPC-like view,
vs. dynamic service representation being a more ultra-generic and
encapsulated, though self-descriptive, approach). Just as I've been
trying to explore the idea of how these two approaches might be
connected, I think it's worth thinking about whether there is a similar
way of bridging the gap embodied by the RPC interface vs resource-centric
views of SOAP and REST.

If you made it this far, thanks for your patience; I wish I could have
organized my own thoughts a little better, but it seemed appropriate
to get this out on the table for discussion...




Various references:

Some good starting points:
reasonably high-level articles:
http://www.xml.com/pub/a/2002/02/06/rest.html
http://www.xml.com/pub/a/2002/02/20/rest.html
http://www.xml.com/pub/a/2002/07/10/rest.html
http://www.xml.com/pub/a/2001/10/03/webservices.html
http://www.xml.com/pub/a/2002/05/08/deviant.html



Sites with plenty of info:
Many essays by one of the main REST proponents:
http://www.prescod.net/rest/
in particular,
http://www.prescod.net/rest/standardization.html
http://www.prescod.net/rest/rest_vs_soap_overview/


"Wiki pages":
http://conveyor.com/RESTwiki/moin.cgi

Some REST "tutorials" and skeptical views on migrating to web services:
http://www.xfront.com


Some dedicated discussion lists:
http://groups.yahoo.com/group/rest-discuss/
http://groups.yahoo.com/group/rest-explore/

Andrew Farmer
adf at ncgr.org
(505) 995-4464
Database Administrator/Software Developer
National Center for Genome Resources