[Bioperl-l] EMBL SOAP Server + perl and Java clients

Alan Robinson alan@ebi.ac.uk
Fri, 13 Jul 2001 12:02:41 +0100 (GMT Daylight Time)


Greetings,

If you're not interested in remote access to bioinformatics databases and
apps - STOP READING NOW!


Recently, I've been evaluating the ease of use, performance and utility of
CORBA vs. SOAP servers and clients, particularly because SOAP's tunnelling
through HTTP can "smuggle" RPC requests through firewalls, whereas CORBA
over IIOP often doesn't work across the Internet because its protocols and
ports are blocked at too many firewalls.

For those who are interested, I'm making available: (1) a prototype SOAP
web service that allows a user to query the EMBL nucleotide sequence
database by primary accession number or identifier and return the DNA
sequence, (2) some initial conclusions, and (3) three simple demo clients
(two Java and one Perl):

  http://corba.ebi.ac.uk:7777/clients/SimpleBioSequence/Client_Apache.java
  http://corba.ebi.ac.uk:7777/clients/SimpleBioSequence/Client_GLUE.java
  http://corba.ebi.ac.uk:7777/clients/SimpleBioSequence/Client_Lite.pl

For Client_Apache.java, you'll need to install the Apache SOAP toolkit -
there are instructions in the client.

For Client_GLUE.java, you'll need to install the GLUE toolkit - (again)
there are instructions in the client.

For Client_Lite.pl, you'll need to install the SOAP::Lite toolkit - (yet
again) there are instructions in the client.

NOTE!: The Apache SOAP implementation cannot use WSDL files, so making
remote invocations is a pain: you have to create "Call" objects and build
lists of parameters by hand. GLUE, however, has a WSDL compiler that
generates the helper classes you need to interact with the server in an
OO fashion through proxy objects (compare the code to see what I mean!).
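To make the contrast concrete, here is a minimal Python sketch; it is not either toolkit's API. `call_style` mimics the Apache SOAP approach of spelling out the endpoint, target URI, method name and parameter list by hand, while `SimpleBioSequenceProxy` mimics the kind of typed proxy a WSDL compiler such as GLUE's might generate. The transport is an injected callable so nothing touches the network, and the parameter name `id` is an assumption, since the service description does not name the parameter.

```python
from typing import Callable, List, Tuple

# A transport sends (endpoint, target URI, method, [(name, value), ...]) and
# returns the decoded result. It is injected so the sketch runs offline; a real
# transport would POST a SOAP envelope over HTTP.
Params = List[Tuple[str, str]]
Transport = Callable[[str, str, str, Params], str]

ENDPOINT = "http://corba.ebi.ac.uk:7777/soap/servlet/rpcrouter/"
TARGET_URI = "urn:simplebiosequence-service"
PARAM_NAME = "id"  # hypothetical: the real parameter name is not documented


def call_style(transport: Transport, identifier: str) -> str:
    """'Call object' style: the caller assembles every detail by hand."""
    params: Params = [(PARAM_NAME, identifier)]
    return transport(ENDPOINT, TARGET_URI, "get_biosequence", params)


class SimpleBioSequenceProxy:
    """Proxy style: roughly what generated helper classes give you."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get_biosequence(self, identifier: str) -> str:
        return self._transport(ENDPOINT, TARGET_URI, "get_biosequence",
                               [(PARAM_NAME, identifier)])
```

With the proxy, client code shrinks to `SimpleBioSequenceProxy(transport).get_biosequence("ACC0001")`, where `ACC0001` is a placeholder accession; both styles send exactly the same call.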

Currently for client use, I'd recommend GLUE for Java and SOAP::Lite for
Perl. (Andrew Dalke may have something to say about Python clients?)

N.B. Firewalls often block ports rather than protocols, so there's still
every chance that SOAP will be blocked unless it goes via port 80. To find
out whether you can use my server on its non-standard port (7777), point a
web browser at http://corba.ebi.ac.uk:7777/soap/servlet/rpcrouter/ - if
all is well, you'll see the message "Sorry, I don't speak via HTTP GET- you
have to use HTTP POST to talk to me."
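The same browser check can be scripted. This is a minimal Python sketch using only the standard library: it issues an HTTP GET to the rpcrouter URL and looks for the "I don't speak via HTTP GET" message. Treating a connection error or timeout as "port probably blocked" is an assumption; it could equally mean the server is down.

```python
import urllib.error
import urllib.request

RPCROUTER_URL = "http://corba.ebi.ac.uk:7777/soap/servlet/rpcrouter/"
MARKER = "I don't speak via HTTP GET"


def looks_like_rpcrouter(body: str) -> bool:
    """True if a GET response body contains the router's 'use POST' message."""
    return MARKER in body


def can_reach(url: str = RPCROUTER_URL, timeout: float = 10.0) -> bool:
    """GET the router URL; network errors are read as 'port probably blocked'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The router may answer GET with an error status but still send its message.
        body = err.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return False
    return looks_like_rpcrouter(body)
```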

If there is interest in the EBI providing SOAP services, I'd like to know
(along with what might be on your wish list); if so, we may move the
service onto a proper web server.


Server details:

- SOAP Endpoint URL: http://corba.ebi.ac.uk:7777/soap/servlet/rpcrouter/

- Service name: "urn:simplebiosequence-service"

- WSDL: http://corba.ebi.ac.uk:7777/wsdl/SimpleBioSequence_Service.wsdl

- I'm using Apache SOAP and Tomcat (it seems to interoperate with
  SOAP::Lite & GLUE for simple things - can anyone confirm?)

- The servlet is actually a client to the EMBL CORBA server on our LAN.

- Homepage: 
    http://industry.ebi.ac.uk/~alan/soap/servers/SimpleBioSequenceService

- Method name: get_biosequence [takes a single 'xsd:string' input
                                parameter, the identifier, and returns
                                the DNA sequence as a single
                                'xsd:string']
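For readers without any of the three toolkits, here is a minimal Python sketch of what goes over the wire: a SOAP 1.1 RPC-style POST to the endpoint above, calling `get_biosequence` in the `urn:simplebiosequence-service` namespace. Several details are assumptions rather than taken from the WSDL: the parameter element name `id`, the empty `SOAPAction` header, and a response whose string result sits in the first child (assumed to be `<return>`) of a `...Response` wrapper element.

```python
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ENDPOINT = "http://corba.ebi.ac.uk:7777/soap/servlet/rpcrouter/"
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_envelope(identifier: str, param_name: str = "id") -> bytes:
    """SOAP 1.1 RPC/encoded request; param_name 'id' is a hypothetical default."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:get_biosequence xmlns:ns1="urn:simplebiosequence-service"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <{param_name} xsi:type="xsd:string">{escape(identifier)}</{param_name}>
    </ns1:get_biosequence>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>""".encode("utf-8")


def make_request(identifier: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        ENDPOINT,
        data=build_envelope(identifier),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        method="POST",
    )


def parse_return(response_xml: bytes) -> str:
    """Extract the string result, raising on a SOAP Fault."""
    body = ET.fromstring(response_xml).find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("no SOAP Body in response")
    fault = body.find(f"{{{SOAP_ENV}}}Fault")
    if fault is not None:
        raise RuntimeError(fault.findtext("faultstring", default="SOAP fault"))
    # Assumed shape: <ns1:get_biosequenceResponse><return>SEQUENCE</return>...
    return body[0][0].text or ""
```

Sending the request is then `parse_return(urllib.request.urlopen(make_request("ACC0001")).read())`, with `ACC0001` standing in for a real accession.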

My initial conclusions (if I've got something factually incorrect, please
do let me know!) --

1) Writing WSDL is far more difficult than writing IDL. I had to
   auto-generate the WSDL with the IBM WSDK from Java classes that had
   themselves been generated by an IDL compiler, and then hand-edit it
   [I believe IONA has an IDL-to-WSDL tool in its commercial package].
   I'll be interested to see how (if?) the OMG's BSA IDL may be mapped to
   WSDL. GLUE provides wsdl2java and java2wsdl tools - I still need to test
   whether these work with CORBA skeletons (the IBM tools barfed).

2) It's trivial to create a SOAP servlet for Tomcat using the POA
   implementation class of a CORBA server (the CORBA and SOAP servers both
   use exactly the same class!).

3) I cannot make any meaningful judgement about SOAP performance versus
   CORBA, since the largest overhead for the client appears to be starting
   up the JVM (but the two seem comparable).

4) An initial conclusion is that I would much rather write and access
   complex, fine-grained data via a CORBA server than via a SOAP one (SOAP
   only returns structs, not objects). For coarse-grained access (e.g.
   returning the feature table of an entry or the results of an analysis)
   and simple "query/retrieval" services (e.g. sequence retrieval), SOAP
   might be convenient, but you may still be left to parse any XML that's
   handed to you.

5) Creating CORBA and SOAP clients takes comparable effort (and is easy).
   If CORBA seems to need slightly more code, it's because CORBA keeps the
   IOR and the IDL separate, whilst in SOAP the equivalent pieces of
   information are stored together in the WSDL file (I'm not sure if this
   is a good idea, but it is convenient for the user!)

6) The amount of work needed to implement and host a server is fairly
   comparable in CORBA and SOAP (and both can use the same implementation
   class for the actual service).

7) I can see the attraction of "loose-coupling" in SOAP-based systems
   (i.e. a web service apparently doesn't have to conform to a WSDL in the
   same way that a CORBA service must conform to an IDL, so you needn't
   have stubs and skeletons). On the other hand, it's probably giving you
   enough rope to hang yourself with.

8) "CORBA on the LAN + SOAP over the web" is a combination I'll be
   considering further for providing external access to EBI services,
   especially coarse-grained access to analysis services and databases
   (with simple and/or self-contained data types). CORBA gives you
   standardised interfaces to services on your network & SOAP gives you a
   means to punch through firewalls. If I want fine-grained access to
   large and complex data types, I'll use CORBA.

9) If you're working only on a LAN without firewalls, I reckon CORBA is
   easier & cleaner. (Should you need to serve it outside your domain,
   it's fairly trivial to write a SOAP server on top of a CORBA server -
   see above.)

10) Despite its verbosity, WSDL may have a significant advantage over
    IDL because data types are defined using XML schemas, so one may add
    constraints, e.g. this integer must be between 5 and 10, or this
    string must conform to a given regular expression pattern, such as
    "\d{3}-[A-Z]{2}".

11) I need volunteers with known CORBA-unfriendly firewalls to test if the
    SOAP client really will work.
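Points 2, 6 and 9 rest on one idea: a single implementation class sits behind both front ends. The following is a minimal Python sketch of that shape, not the actual servlet: `SequenceServiceImpl` is a hypothetical stand-in for the POA implementation class (a dict replaces the EMBL back end), and `RpcRouter` is a toy dispatcher playing the role of the SOAP rpcrouter servlet by mapping a service URN and method name onto the shared object.

```python
from typing import Callable, Dict, Iterable, Tuple


class SequenceServiceImpl:
    """Hypothetical shared implementation class (stand-in for the POA servant)."""

    def __init__(self, sequences: Dict[str, str]) -> None:
        self._sequences = sequences

    def get_biosequence(self, identifier: str) -> str:
        try:
            return self._sequences[identifier]
        except KeyError:
            raise LookupError(f"no entry for {identifier!r}") from None


class RpcRouter:
    """Toy SOAP-style front end: (service URN, method name) -> bound method."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Callable[..., object]] = {}

    def deploy(self, urn: str, impl: object, methods: Iterable[str]) -> None:
        for name in methods:
            self._routes[(urn, name)] = getattr(impl, name)

    def invoke(self, urn: str, method: str, *args: object) -> object:
        return self._routes[(urn, method)](*args)
```

Usage: create `impl = SequenceServiceImpl({"ACC0001": "acgtacgt"})`, then `RpcRouter().deploy("urn:simplebiosequence-service", impl, ["get_biosequence"])`. A CORBA front end would register the very same `impl` object with its POA, so neither front end duplicates the service logic.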
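Point 4's caveat, that SOAP hands you structs (or raw XML) rather than objects, means the client has to rebuild any objects itself. A minimal Python sketch, assuming a hypothetical feature-table payload in which each `<item>` struct has `key` and `location` members; the element names are illustrative, not the EBI service's format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    key: str       # e.g. "CDS" or "source"
    location: str  # EMBL-style location string, kept as plain text here


def parse_features(xml_text: str) -> List[Feature]:
    """Rebuild client-side objects from struct-like <item> elements."""
    root = ET.fromstring(xml_text)
    return [
        Feature(key=item.findtext("key", default=""),
                location=item.findtext("location", default=""))
        for item in root.iter("item")
    ]
```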
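Point 5 observes that one WSDL file carries what CORBA splits between the IDL (the interface) and the IOR (where to find the object). Both can be read out of a single document. A minimal Python sketch for WSDL 1.1 with SOAP bindings; the sample in the usage below is a cut-down, hypothetical stand-in, not the real SimpleBioSequence_Service.wsdl.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

WSDL = "http://schemas.xmlsoap.org/wsdl/"
WSDL_SOAP = "http://schemas.xmlsoap.org/wsdl/soap/"


def summarize_wsdl(wsdl_text: str) -> Tuple[List[str], List[str]]:
    """Return (operation names, endpoint URLs) from a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text)
    # The interface part (IDL-like): operations declared in each portType.
    operations = [
        op.get("name", "")
        for op in root.findall(f"{{{WSDL}}}portType/{{{WSDL}}}operation")
    ]
    # The location part (IOR-like): soap:address elements under service/port.
    endpoints = [
        addr.get("location", "")
        for addr in root.iter(f"{{{WSDL_SOAP}}}address")
    ]
    return operations, endpoints
```

For a document declaring a `get_biosequence` operation and a `soap:address` of http://corba.ebi.ac.uk:7777/soap/servlet/rpcrouter/, this returns both in one call.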
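To illustrate point 10, here are the two example constraints written as XML Schema facets (in the comment) with a minimal Python sketch of the checks they imply. Treating "between 5 and 10" as inclusive is an assumption (minInclusive/maxInclusive; minExclusive/maxExclusive would exclude the end points), and the type names `SmallInt` and `Code` are hypothetical. XML Schema patterns always match the whole value, so the sketch uses `re.fullmatch` rather than `re.search`.

```python
import re

# Hypothetical schema types for the two constraints in point 10:
#
#   <xsd:simpleType name="SmallInt">
#     <xsd:restriction base="xsd:integer">
#       <xsd:minInclusive value="5"/>
#       <xsd:maxInclusive value="10"/>
#     </xsd:restriction>
#   </xsd:simpleType>
#
#   <xsd:simpleType name="Code">
#     <xsd:restriction base="xsd:string">
#       <xsd:pattern value="\d{3}-[A-Z]{2}"/>
#     </xsd:restriction>
#   </xsd:simpleType>

CODE_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def valid_small_int(value: int, low: int = 5, high: int = 10) -> bool:
    """Equivalent of the minInclusive/maxInclusive facets."""
    return low <= value <= high


def valid_code(value: str) -> bool:
    """Equivalent of the pattern facet; XSD patterns are implicitly anchored."""
    return CODE_PATTERN.fullmatch(value) is not None
```

With an IDL `long` or `string`, checks like these would have to live in documentation or in hand-written code on both sides; in WSDL they can travel with the type definition.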


Summary:

If it weren't for the damn firewall/IIOP issue, I'd consider CORBA better
in terms of ease of use, functionality and extensibility. However, with
SOAP & WSDL we can overcome that limitation and allow (coarse-grained)
access to underlying (CORBA) services via HTTP to people outside our LAN.


Alan.

PS If you're interested in knowing more about SOAP services, have a look
at http://www.ibm.com/developerWorks/webservices/ -- If you read the
article on "Web services versus CORBA", *PLEASE* also read the "Discuss"
forum. For an alternative view, visit http://www.theserverside.com/ and
read "Web services are the doomed fad of 2001?" (under hottest threads).

--
============================================================
Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
European Bioinformatics Institute     Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
============================================================