[MOBY-l] messaging discussion

Lincoln Stein lstein at cshl.org
Wed Feb 26 21:14:33 UTC 2003


Here is a more complete discussion of messaging options that are open to MOBY.  
Sorry for the last-minute nature of this!!!

Lincoln


-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org			                  Cold Spring Harbor, NY
========================================================================
-------------- next part --------------
MOBY PROJECT: TECHNICAL REPORT ON WEB MESSAGING LAYER

Date: February 24, 2003
Author: Lincoln Stein
Version: early

This report concerns the messaging layer of the Moby project, that
point at which semantic information is exchanged between the
data consumer (i.e. the biologist) and the data provider (i.e. the
model organism system administrator).

There are a wide variety of possible messaging systems.  Here are a
number of prominent ones listed in rough chronological order.

      1) Custom messaging system using raw TCP/IP
      2) Custom messaging system using BEEP (an IETF application
      protocol framework)
      3) ASN.1 exchange
      4) Microsoft DCOM
      5) REST
      6) CORBA
      7) XML-RPC
      8) SOAP
      9) .NET

For the purposes of this assessment, I ignored 1, 2, 3 and 4.  I
rejected custom messaging because it is a fallback position.  We
should only reinvent the wheel if we are absolutely certain that none
of the others will meet our requirements.  Exchange of messages via
ASN.1 streams and Microsoft DCOM can rightly be treated as legacy
solutions that have found important places in niche applications but
are not accepted as solutions for enabling the exchange of semantic
information across administrative domains.  SOAP arises from, and
supersedes, XML-RPC, and so the two are folded together.  I will not
consider .NET because its marketing literature gives me a headache.  This
leaves REST, CORBA, and SOAP.

I will now consider these in chronological order.

--------------------------------------------------------------------------

REST
----

REST stands for REpresentational State Transfer, and is a term coined
by Roy Fielding in his doctoral dissertation to describe a style of
information architecture that had already become the de facto standard
for the World Wide Web.  According to Fielding, REST is suited for
scalable applications in which relatively large hypermedia
representations of information resources are exchanged within the
"anarchic network."

Key Features:

a) Resources are identified using stable addresses, the URI.

b) Resources are never exchanged themselves.  Instead representations
of resources are exchanged.  A particular resource, such as a database
entry, may be represented in multiple ways, e.g. as an HTML file or a
PostScript file.  Resources are viewed as changing with time.

c) REST has many nouns, but very few verbs.  Its verbs follow the CRUD
paradigm, and consist of PUT, GET, POST and DELETE.  Its nouns are an
extensible set of hypermedia representations.

d) REST is stateless, and places the burden of maintaining session
information squarely on the client.

e) REST is close to the transport layer, and allows but does not
require applications to be concerned with performance issues
such as caching, parsing, and rendering latency.

Discussion:

Probably the most distinctive aspect of REST is its use of URIs to address
each resource.  For example, in a DAS application, a URI can be used
to identify a particular segment of a genome:

   http://my.site/das/d-melanogaster/r3.1/2R

This identifies the genome of Drosophila melanogaster, assembly
release version 3.1, chromosome arm 2R.  To fetch the list of features
from this region, one would issue a GET request on the following URL:

  GET http://my.site/das/d-melanogaster/r3.1/2R/features

To address an individual feature named "exon00001", one refers to this
URL:

  GET http://my.site/das/d-melanogaster/r3.1/2R/features/exon00001

To add a new feature to the chromosome, one issues a PUT:

  PUT http://my.site/das/d-melanogaster/r3.1/2R/features/exon00002

Updates and deletes are handled similarly.
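
As a rough illustration, the requests above can be sketched in Python
using only the standard library.  The host my.site and the paths are
the placeholders used above; the request body is an assumption, since
this report does not define a payload format.  The requests are only
constructed here, not sent.

```python
from urllib.request import Request

BASE = "http://my.site/das/d-melanogaster/r3.1/2R"  # placeholder host from the text


def feature_request(verb, name=None, body=None):
    """Build (but do not send) a request for the feature list or one feature."""
    url = BASE + "/features" + ("/" + name if name else "")
    return Request(url, data=body, method=verb)


# Read the whole feature list, then a single feature.
list_req = feature_request("GET")
one_req = feature_request("GET", "exon00001")

# Create a new feature; the body format is hypothetical.
put_req = feature_request("PUT", "exon00002", body=b"<feature type='exon'/>")

# Deletion reuses the same address with a different verb.
del_req = feature_request("DELETE", "exon00002")

for req in (list_req, one_req, put_req, del_req):
    print(req.get_method(), req.full_url)
```

The address stays the same across operations; only the verb changes.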

REST is elegant because it allows very generic software to be written.
For example, caching code does not have to know anything about the
contents of the data it caches, and fetching code can simply hand off
the data it receives to the appropriate helper application.  However,
it is unclear to me how REST can be used to handle transformative
tasks.  For example, do we transform genes into GO_terms this way?

  GET http://my.site/das/d-melanogaster/genes/notch/GO_terms

If we want to pass parameters to the request, do we do it with a query 
string?

  GET http://my.site/das/d-melanogaster/genes/notch/GO_terms?descendents=true
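
If a query string is the answer, the mechanics at least are simple.  A
minimal sketch, assuming the hypothetical GO_terms resource and the
descendents parameter from the example above (neither is an existing
DAS path):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resource from the example above; not an existing DAS path.
resource = "http://my.site/das/d-melanogaster/genes/notch/GO_terms"


def with_params(url, **params):
    """Append request parameters to a resource URL as a query string."""
    return url + "?" + urlencode(params) if params else url


url = with_params(resource, descendents="true")
print(url)

# The server side can recover the parameters without touching the path.
parts = urlsplit(url)
print(parts.path, parse_qs(parts.query))
```

Whether the parameterized URL names a new resource or a filtered view
of the same one is exactly the open question.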

Who is using REST: 

In one sense, everyone is.  In another sense, no one.  Aside from
WebDAV, there are very few "pure REST" applications out there.  There
are many almost-REST services, but a variety of common practices, such 
as the use of cookies, interferes with REST by confusing the semantics 
of stateless information transfer operations.

DAS/1 is fairly close to a REST application, but it does some things
that are discouraged by the REST design, such as using POST to mean
GET.

Software support:

There is significant infrastructural support for REST services.  It's
the web!

==================================================================

CORBA
-----

CORBA is a Remote Procedure Call/Remote Method Call protocol which
uses binary-encoded objects and an object request, lookup and
serialization infrastructure called an ORB.

Key Features:

  - An RPC/RMC-based API that makes remote procedure calls look
  more-or-less like local ones.

  - Bindings to many popular languages, including Java, C++ and Perl.

  - A language-independent interface description language (IDL) to
  describe objects and their methods.

  - A directory service for identifying services and returning their
  locations.

Another key feature of CORBA is its ability to support legacy
applications in C++, Java or Perl.  In theory, one can take existing
library code written to support a local application, CORBA-ize it, and
turn it into a network service.  Client code written to access the
local library can then operate on the remote service with few
source code changes.  In practice, I have found this process a little
less than transparent because of the large amount of CORBA
initialization magic that must be performed before a remote request
can be made.

Discussion:

The Life Sciences committee of the Object Management Group (OMG) has
been hard at work for several years developing IDLs for the life
sciences.  However, due to the rapid change of the field, the IDLs
that are being ratified now have little relationship to the MOBY use
cases, and therefore are not as valuable as one would hope.

A CORBA DAS service would provide an interface like the following:

  data_source = ORB.new_das_source('urn:lsid:biodas.org:provider/das');
  data_source.setSource('urn:lsid:www.taxonomy.org:taxa/dmelanogaster');
  data_source.setVersion('r3.1');
  segment     = data_source.getSequence('urn:lsid:my.site:chromosomes/2R');
  featureset  = segment.getFeatureSet();
  exon        = featureset.getExon('urn:lsid:my.site:features/exon000001');

The process of fetching an exon feature becomes a set of method calls.
Objects are identified by an arbitrary naming system that is unrelated
to the Web's URI system.  For fun, I've used the LSID system, but in
fact any opaque identifier would do here.
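
To make the calling style concrete, here is a rough Python sketch of
the same call sequence.  It uses in-memory stand-in classes rather
than a real ORB, so no CORBA library is involved; the class names, the
stored data and the feature coordinates are all made up for
illustration.

```python
class FeatureSet:
    def __init__(self, features):
        self._features = features  # maps opaque identifier -> feature record

    def getExon(self, feature_id):
        return self._features[feature_id]


class Segment:
    def __init__(self, features):
        self._features = features

    def getFeatureSet(self):
        return FeatureSet(self._features)


class DataSource:
    """In-memory stand-in for the remote object; a real ORB would return a proxy."""

    def __init__(self, segments):
        self._segments = segments
        self.source = None
        self.version = None

    def setSource(self, source_id):
        self.source = source_id

    def setVersion(self, version):
        self.version = version

    def getSequence(self, segment_id):
        return Segment(self._segments[segment_id])


# Hypothetical data, keyed by the opaque identifiers used in the text.
SEGMENT_ID = "urn:lsid:my.site:chromosomes/2R"
EXON_ID = "urn:lsid:my.site:features/exon000001"
SEGMENTS = {SEGMENT_ID: {EXON_ID: {"type": "exon", "start": 100, "stop": 250}}}

data_source = DataSource(SEGMENTS)
data_source.setSource("urn:lsid:www.taxonomy.org:taxa/dmelanogaster")
data_source.setVersion("r3.1")
segment = data_source.getSequence(SEGMENT_ID)
featureset = segment.getFeatureSet()
exon = featureset.getExon(EXON_ID)
print(exon)
```

Note that the client code cannot tell from the call syntax which of
these steps would cross the network in a real deployment.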

Who is using CORBA:

At one point, CORBA was going to be the saviour of bioinformatics.  It
was heavily promoted by the EBI and by a number of biotech/biopharm
companies.  It has found a niche in certain LAN applications, but has
not achieved any significant use for public servers.  I do not have a
good sampling of opinions as to why it has failed, but Ewan Birney, an 
early and very strong proponent of CORBA, cites "performance
problems" as a major factor.

CORBA never had the support of Microsoft, and no longer has the support
of IBM or Sun, who initially promoted CORBA heavily.

Software Support:

The software support is spotty.  An ORB library is required for CORBA
to function.  Most Linux machines have a preinstalled ORB that comes
with Gnome (which is CORBA-reliant), Java comes with an ORB, and
Netscape Navigator used to come with a preinstalled ORB (and may
still, although I haven't checked).  Windows XP does not seem to have
an ORB installed.  Perhaps this is not surprising, in light of
Microsoft's commitment to the incompatible .NET architecture.
	
==================================================================

SOAP
----

SOAP initially stood for Simple Object Access Protocol.  It is no
longer simple, and so the name stands on its own.  It is a Remote
Procedure Call/Remote Method Call protocol which uses XML for its
messages.

Key Features:

  - An RPC/RMC-based API that makes remote procedure calls look
  more-or-less like local ones.

  - Bindings to many popular languages, including Java, C++ and Perl.

  - A language-independent service description language called WSDL.

  - A choice of XML-based data definition languages, the most popular
  being XML Schema (XSD).

  - Support for a directory service called UDDI.

  - A lot of industry support, books, etc.

Discussion:

SOAP is positioned very much in the same niche as CORBA.  In theory,
one can take legacy applications, flip a compiler switch, and have
them act as SOAP clients and servers.  This is because each language
provides bindings that map its fundamental data types and method call
conventions into language-independent XML encodings.

If we were to repeat the DAS example, it would look identical to the
CORBA example except that the very first line would refer to some
SOAPy constructor.
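
To give a feel for what a binding produces under the hood, here is a
rough sketch of a SOAP 1.1 request envelope built by hand with
Python's standard library.  The getSequence operation and the
urn:example:das namespace are assumptions made for illustration; a
real toolkit would generate the envelope from the method call.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
DAS_NS = "urn:example:das"  # hypothetical service namespace


def build_call(method, **args):
    """Wrap a method name and its arguments in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{DAS_NS}}}{method}")
    for name, value in args.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_call("getSequence", id="urn:lsid:my.site:chromosomes/2R")
print(message)
```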

I have tried SOAP in my own applications and find that it works fine
for simple to moderately complex applications.  Because of its
transparency, programmers can easily be tricked into performing
foolish operations.  For example, in a local application it makes
sense to create lots of large complex objects and then invoke method
calls on them.  In SOAP, every method call requires the object to be
marshalled (serialized along with all its subobjects), transmitted
across the wire, and unserialized by the server.  The whole process is
repeated on the way back.  The application is slow, and the programmer
doesn't know why.
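
The cost can be illustrated with a toy model.  This is a sketch, not a
SOAP implementation: JSON stands in for XML marshalling, the transport
class only counts bytes, and the gene data and method names are made
up.

```python
import json


class CountingTransport:
    """Stand-in for the wire: serializes every argument and tallies the bytes."""

    def __init__(self):
        self.calls = 0
        self.bytes_sent = 0

    def call(self, method, payload):
        self.calls += 1
        self.bytes_sent += len(json.dumps({"method": method, "args": payload}).encode())


# A "large complex object": one gene with many subobjects (made-up data).
gene = {"name": "notch", "exons": [{"start": i, "stop": i + 50} for i in range(0, 50000, 100)]}

# Naive style: three fine-grained calls, each one marshals the whole object.
chatty = CountingTransport()
for method in ("getName", "getExonCount", "getLength"):
    chatty.call(method, gene)

# Coarse-grained style: send only an identifier and ask for a summary once.
lean = CountingTransport()
lean.call("summarize", {"gene_id": "notch"})

print("chatty:", chatty.calls, "calls,", chatty.bytes_sent, "bytes")
print("lean:  ", lean.calls, "calls,", lean.bytes_sent, "bytes")
```

Both styles look equally innocent in the client source, which is the
trap.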

SOAP does not work well in applications that transfer large amounts of
data and require that latency be minimized.  For example, the
genome-size data streams that DAS generates will cause SOAP 1.1 and
earlier libraries to croak.  SOAP 1.2 fixes this by allowing for incremental
event-based parsing of messages, but this destroys the procedure-call
API by exposing the developer to the innards of the object marshalling
and unmarshalling.
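
The trade-off can be seen in a small sketch that uses Python's generic
event-based XML parser rather than any SOAP library.  The feature
element and its attributes are invented for illustration, and an
in-memory buffer stands in for the network stream.

```python
import io
import xml.etree.ElementTree as ET


def make_stream(n):
    """Build a made-up feature document in memory; stands in for a network stream."""
    rows = "".join(f'<feature id="exon{i:05d}" start="{i * 100}"/>' for i in range(n))
    return io.BytesIO(f"<features>{rows}</features>".encode())


def count_features(stream):
    """Handle each feature as soon as its end tag arrives, then discard it."""
    seen = 0
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "feature":
            seen += 1       # application logic would go here
            elem.clear()    # release the element's attributes and children
    return seen


print(count_features(make_stream(10000)))
```

The caller now deals in parse events and element tags instead of
method calls on objects, which is the loss of abstraction described
above.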

Who is Using SOAP:

Everybody is talking about SOAP but few people currently use it in
production.  This applies both to biological and non-biological
domains.  My greatest success with it has been a database application
that tracks the merges and splits in gene names.  The operations in
this application are lightweight and require very little data
transfer, and a server written in Perl communicates very nicely with
clients written in Java and C.  However, the application remains a
proof of principle.  In production I connect to the database over a
socket using the database's SQL API because I do not have confidence
in the Perl SOAP library.

Software Support:

SOAP is receiving strong developer support from IBM and Sun.  However,
the level of support is not even across languages.  It is very good
for Java and C#, pretty good for C++, good for Perl (although I don't
trust the library to be bug-free), and poor for Python.


------------------------------

Bibliography (spotty):

* REST

REST+SOAP
http://www.intertwingly.net/stories/2002/07/20/restSoap.html

Roy Fielding's Dissertation:
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

FrontPage RESTwiki
http://internet.conveyor.com/RESTwiki/moin.cgi/FrontPage

Roots of the REST/SOAP Debate
http://www.prescod.net/rest/rest_vs_soap_overview/

* CORBA

Client/Server Programming with Java and CORBA, Orfali and Harkey

* SOAP

Programming Web Services with SOAP, Snell, Tidwell and Kulchenko



