[MOBY-dev] BioMOBY Asynchronous Services

Sat Jun 11 18:07:41 UTC 2005

Here is my "action" from Vancouver...

With best regards, and cheers, (and sorry for a long email),
Martin

Asynchronous BioMoby services
=============================

In Vancouver, we started a discussion about how to call BioMoby
services asynchronously. I promised to summarize the experience we
have gained with this task in other Web Services (Soaplab etc.) -
and this email does it.

Meanwhile, however, David González Pisano (and other people) from the
Spanish National Institute of Bioinformatics (INB) created a detailed
proposal for introducing asynchronicity into BioMoby, based on a
different (more BioMoby-specific) approach. I was also present at a
meeting (incidentally, a completely different, non-Moby-related
meeting at EBI) where this proposal was explained.

Therefore, I can make a rudimentary comparison of the two proposed
methods. But first, what are they? (I skip the introduction on why we
need asynchronous behavior - this is already very well covered in
David's proposal).

Almost any asynchronous invocation consists of:

1) Starting request:

   a) A client sends a request, (somehow) specifying that this
      request should be treated asynchronously.

   b) A service responds with an identifier assigned to this request
      (also called a 'job handler' or 'ticket').

2) A client uses this identifier to send a polling request. A service
   responds with a status indication. Polling requests are sent until
   the status indicates that the job has finished.

3) Getting result request. Still using the same identifier, a client
   asks for a result and the service provides it. (Sometimes this kind
   of request can be part of the last polling request from the
   previous step.)
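
The three steps above can be sketched in a few lines of Python. This
is only an illustration against an in-memory stand-in, not a BioMoby
client: the class FakeAsyncService, its method names and the status
strings "RUNNING"/"COMPLETED" are all assumptions made for the example.

```python
import itertools
import time


class FakeAsyncService:
    """In-memory stand-in for a remote service (hypothetical, illustration only)."""

    def __init__(self, polls_until_done=3):
        self._ids = itertools.count(1)
        self._jobs = {}  # ticket -> [polls still needed, input data]
        self._polls_until_done = polls_until_done

    def start(self, data):
        # Step 1: accept the request and hand back a ticket.
        ticket = "job-%d" % next(self._ids)
        self._jobs[ticket] = [self._polls_until_done, data]
        return ticket

    def status(self, ticket):
        # Step 2: report the status; each poll moves the fake job forward.
        job = self._jobs[ticket]
        if job[0] > 0:
            job[0] -= 1
            return "RUNNING"
        return "COMPLETED"

    def result(self, ticket):
        # Step 3: return the result, but only once the job has finished.
        remaining, data = self._jobs[ticket]
        if remaining > 0:
            raise RuntimeError("job %s has not finished yet" % ticket)
        return data.upper()


def call_async(service, data, poll_interval=0.0):
    """Run the start / poll / get-result sequence and count the polls."""
    ticket = service.start(data)
    polls = 0
    while service.status(ticket) != "COMPLETED":
        polls += 1
        time.sleep(poll_interval)
    return service.result(ticket), polls
```

A real client would use a non-zero poll_interval; zero is used here
only so that the sketch runs instantly.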

The INB's proposal achieves the above by adding specific attributes
into the existing mobyData XML tag, and letting the service return an
empty result until a real result is available.

My proposal suggests having several methods (whose names are derived
by adding fixed suffixes); each of them would return a specific result
(still defined as a Moby object), and one of them would return the
real result.

Both proposals are similar (almost identical) in their features; they
differ only in how these new features are encoded in the protocol
used. So how do they differ?

1) I must start with the one that the INB's proposal claims as its
advantage. The proposal says that it is "able to support asynchronous
calls without changing the underlying software". I do not buy it,
however. I think that *both* proposals would require changing clients
that want to take advantage of the new features, and changing
services in order to provide the new features. But also, in *both*
proposals, there is no need to change anything if you still use a
synchronous call. The old clients will not break, the old services
will continue to work. So this does not constitute any difference
between the two proposals (IMHO, of course).

2) A real difference - and this one is in favor of the INB's proposal
- is that specifying asynchronicity by attributes of the mobyData tag
allows a client to ask for different treatment of individual mobyData
parts. A client can send two mobyData tags in one request, asking for
the first one to be treated synchronously and the second one
asynchronously. This cannot be achieved by having several method
names.
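
As an illustration of such a mixed request, the Python sketch below
builds a message with two mobyData parts and marks only the second one
as asynchronous. The attribute name "async" is purely a placeholder
(the INB proposal defines its own attributes, which are not reproduced
here), and the message skeleton is simplified, with namespaces
omitted.

```python
import xml.etree.ElementTree as ET


def build_request(parts):
    """Build a simplified request; parts is a list of (queryID, want_async)."""
    root = ET.Element("MOBY")
    content = ET.SubElement(root, "mobyContent")
    for query_id, want_async in parts:
        data = ET.SubElement(content, "mobyData", queryID=query_id)
        if want_async:
            # "async" is a placeholder attribute name, not the INB one.
            data.set("async", "true")
    return root


def async_query_ids(root):
    """Return the queryIDs of the parts that asked for asynchronous treatment."""
    return [d.get("queryID") for d in root.iter("mobyData")
            if d.get("async") == "true"]


request = build_request([("q1", False), ("q2", True)])
print(ET.tostring(request, encoding="unicode"))
```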

3) Another real difference - this time in favor of my proposal - is
related to the existing standards. BioMoby uses a very proprietary way
of defining its data types and, consequently, what its input and
output data look like. But for the transport messages, BioMoby's
behavior has so far been quite standard: the same service always
transforms data A into data B. And I think that this is a good thing.
Introducing new attributes but keeping the same method name means that
a service would return different things each time. But if we have
fixed method names, we again know that service/method1 will always
transform data A into data C, and service/method2 will always
transform data C into data B.

   The reason why I take this as an advantage is that there are a lot
of general tools for Web services that could still be used for BioMoby
services if we keep the behavior standard. Having more methods is
perfectly easy to define in WSDL (which may still be a bit less useful
for BioMoby data types - because they are by default non-standard -
but at the same time good enough for many other tools). I would feel a
bit uncomfortable inventing a proprietary solution if there is a
standard one. Unless, of course, you feel that point 2) above is
important enough to justify a proprietary solution.
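
To make the "fixed signature per method name" point concrete, here is
a small Python sketch that derives a signature table from a service
name. The suffixes and the JobHandler/NotificationEvent type names are
the ones listed at the end of this email; the table layout and the
function name signatures() are only assumptions for the illustration.

```python
# Each suffix maps to a fixed (input type, output type) pair.
# "{input}" and "{output}" stand for the service's own registered types.
SUFFIXES = {
    "": ("{input}", "{output}"),
    "_async": ("{input}", "JobHandler"),
    "_status": ("JobHandler", "NotificationEvent"),
    "_result": ("JobHandler", "{output}"),
}


def signatures(service, input_type, output_type):
    """Return {method name: (input type, output type)} for one service."""
    table = {}
    for suffix, (arg, ret) in SUFFIXES.items():
        table[service + suffix] = (
            arg.format(input=input_type, output=output_type),
            ret.format(input=input_type, output=output_type),
        )
    return table


for name, (arg, ret) in signatures("ABC", "RealInput", "RealResult").items():
    print("%s %s (%s)" % (ret, name, arg))
```

Because each name always maps to one input type and one output type, a
generic tool can describe or check every method without knowing
anything about asynchronicity.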

4) A (perhaps slightly disputable) difference concerns the objects
returned from the starting and polling requests. The INB's proposal
does not return any real object because the status is in the mobyData
attributes. My proposal defines several real BioMoby data types
(BioMoby objects) that can carry more complex information than just
attributes. They can even be extended by inheritance like any other
BioMoby object. The only difference compared to other BioMoby data
types (objects) is that you will never discover any service returning
them - because a service provider does not register his/her service
with such output types. This allows, btw, the same objects to be used
for a complex notification (if we have anything like that in the
future) - see a few comments on it in my proposal.

5) A small difference - in favor of the INB's proposal - is that their
proposal saves one network request, because the last polling request
can become the request returning the real result.

And that's it - I have not found any other differences (so far).
Still, it may be useful to summarize what *neither* of the two
proposals solves:

1) A client does not know *in advance* whether a service supports
asynchronicity or not - without some investigation. The BioMoby
strategy is to put/find this information in the service metadata using
LSID resolution. I would say that such information is so crucial that
it should be more easily accessible - by allowing it in the service
registration object, so clients can find/get it directly together with
the service URL when they ask the registry.
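
A minimal Python sketch of what this could look like from the client
side follows. Everything in it is hypothetical: the ServiceEntry
record, the supports_async field and the example URLs are invented for
the illustration and do not correspond to the current registry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical registration record as a registry might return it."""
    name: str
    url: str
    supports_async: bool = False  # hypothetical flag, not a current field


def find_async_services(registry):
    """Pick the services that advertise asynchronous support."""
    return [entry for entry in registry if entry.supports_async]


registry = [
    ServiceEntry("ABC", "http://example.org/moby/ABC", supports_async=True),
    ServiceEntry("XYZ", "http://example.org/moby/XYZ"),
]
print([entry.name for entry in find_async_services(registry)])
```

The point is only that the flag arrives together with the service URL,
so no extra metadata lookup is needed before choosing how to call the
service.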

2) A client must treat the full sequence of corresponding requests
(starting, polling and result requests) as an "atomic" operation -
which means that if any of these requests fails, the client needs to
start again from the beginning. This solves the problem of the same
service being replicated without the replicas always being able to
take over from each other in the middle of a job.
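
The "atomic" treatment can be sketched in Python as a retry wrapper
that restarts the whole sequence whenever any single request fails.
The function name run_atomically, the three callables it takes, the
use of ConnectionError as the failure signal and the max_attempts
limit are all assumptions made for this sketch.

```python
import time


def run_atomically(start, poll, fetch, data, max_attempts=3, poll_interval=0.0):
    """Run start/poll/fetch as one unit; on any failure begin again.

    start(data) returns a ticket, poll(ticket) returns True when the job
    is done, fetch(ticket) returns the result. Returns (result, attempt).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            ticket = start(data)  # a fresh ticket on every attempt
            while not poll(ticket):
                time.sleep(poll_interval)
            return fetch(ticket), attempt
        except ConnectionError as err:
            # Any failed request invalidates the whole sequence.
            last_error = err
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_error
```

Note that the old ticket is never reused after a failure; the client
does not assume that another replica knows anything about it.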

3) Introducing asynchronicity opens the door to longer persistence of
results. It is easy to imagine that once you have a job identifier
(ticket, or whatever) you can theoretically ask for the same result
several times without creating it again from the beginning. Of course,
only if a service supports it. A client could also have the option of
saying "please remove the result, I am no longer interested in it"
(again, both proposals can accommodate this feature, either by
having an additional (cleaning, resetting) attribute or an additional
method). But BioMoby needs to document (or reject) such persistent
behavior.

Well, perhaps now I should say what my proposal is :-)

a) It is based on an OMG standard "LSAE - Life Sciences Analysis
Engine" (http://www.omg.org/cgi-bin/doc?dtc/05-04-01 and
http://www.omg.org/cgi-bin/doc?dtc/05-04-08).

b) If we have a service ABC and its service provider wishes to support
asynchronous calls then the service will have the following methods
(with the following input/output data types):

RealResult ABC (RealInput input)

   This is a standard synchronous invocation, and the RealResult and
   RealInput are normal BioMoby data types. The service ABC is
   registered with them as input and output.

JobHandler ABC_async (RealInput input)

   This is a "starting request". The JobHandler contains a ticket (job
   identifier) but may also contain other things (like a suggestion of
   how often it is worth polling for results). This method call fails
   if the service does not support asynchronous behavior. This does
   not break the old clients because they would never use this method.

NotificationEvent ABC_status (JobHandler handler)

   This is a "polling request". NotificationEvent contains the status
   of the invoked job (the call can fail if the handler is not known or
   has expired). The way it represents a status can vary (from just a
   status, to a status including progress, where progress can be a
   simple heart-beating progress, percent progress, step progress or
   time progress). To get a feel for what it can contain, please look
   at the picture taken from the LSAE model:
   http://www.ebi.ac.uk/~senger/LSAE-pictures.html. I would convert
   them into proper BioMoby data types (objects) if/when it is clearer
   that this proposal has a chance of being accepted.

RealResult ABC_result (JobHandler handler)

   This is a "result request" returning a real result. It can fail if
   the handler does not exist or if the method is called at an
   inappropriate moment or in the wrong order.

We can also add a method ABC_clean (JobHandler handler) if we decide
to support persistence.
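
To show how the methods fit together on the service side, here is a
minimal in-memory Python sketch. The method names follow the list
above; everything else is an assumption for the illustration: the
fields of JobHandler and NotificationEvent, the two exception classes,
the status strings, the finish() hook that stands in for the real
computation completing, and the string reversal used as the "real"
work.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class JobHandler:
    ticket: str
    poll_interval_seconds: float = 1.0  # polling hint; field name is assumed


@dataclass(frozen=True)
class NotificationEvent:
    status: str  # "RUNNING" or "COMPLETED"; the values are assumed


class UnknownJob(Exception):
    """The handler is not known (or has been cleaned)."""


class ResultNotReady(Exception):
    """ABC_result was called before the job finished."""


class ServiceABC:
    def __init__(self):
        self._jobs = {}  # ticket -> {"input": data, "done": bool}

    def ABC(self, data):
        # Standard synchronous invocation (toy computation).
        return data[::-1]

    def ABC_async(self, data):
        # Starting request: register the job and return its handler.
        handler = JobHandler(ticket=uuid.uuid4().hex)
        self._jobs[handler.ticket] = {"input": data, "done": False}
        return handler

    def ABC_status(self, handler):
        # Polling request.
        job = self._job(handler)
        return NotificationEvent("COMPLETED" if job["done"] else "RUNNING")

    def ABC_result(self, handler):
        # Result request; fails if called too early.
        job = self._job(handler)
        if not job["done"]:
            raise ResultNotReady(handler.ticket)
        return self.ABC(job["input"])

    def ABC_clean(self, handler):
        # Optional: forget a persisted job.
        self._job(handler)
        del self._jobs[handler.ticket]

    def finish(self, handler):
        # Hook for the sketch only: marks the job as completed.
        self._job(handler)["done"] = True

    def _job(self, handler):
        try:
            return self._jobs[handler.ticket]
        except KeyError:
            raise UnknownJob(handler.ticket) from None
```

In this sketch the result stays available until ABC_clean is called,
which is one possible reading of "persistence"; a real service would
also need an expiry policy.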

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger