[MOBY-dev] data by reference - a request for comments

Tue Jul 1 19:04:40 UTC 2008

Hi everybody,
	I have added some personal comments inline about the RFC:

Martin Senger wrote:
> Hi all,
> 
> Yesterday, Mark, Eddie and I spent some time evaluating what was proposed
> during our last meeting about sending data by reference. Here are some
> thoughts that may crystallize into a request for comments.
> 
> *What is the purpose of sending data by reference?*
> Well, the first purpose (A) is obvious: we want to be able to deliver huge
> data from a service. So the service returns only a reference instead of the
> real data, and the client can fetch the real data using some memory-friendly
> protocol (usually a simple HTTP GET).
> 
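To make purpose (A) concrete, here is a minimal Python sketch of the client side, assuming the reference is an ordinary URL that the standard library can open (http, ftp or file). It streams the referenced data to disk in fixed-size chunks, so memory use stays bounded however large the result is. The helper name `fetch_reference` and the default chunk size are illustrative choices, not part of the proposal.

```python
import urllib.request


def fetch_reference(url: str, dest_path: str, chunk_size: int = 64 * 1024) -> int:
    """Stream the data behind `url` into `dest_path`; return the bytes written.

    Hypothetical helper: at most `chunk_size` bytes are held in memory at a
    time, so arbitrarily large results can be retrieved.
    """
    written = 0
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

For example, `fetch_reference("http://example.org/results/1234", "result.dat")` would save a (hypothetical) result; the Moby message itself only has to carry the short URL.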
> But it appeared that this was not the only purpose. The second purpose (B) is
> to be able to pass around already existing references (such as URLs of
> EMBL or NCBI records). The existence of purpose B makes the problem a
> bit harder, but it is a valid purpose. So the first question is:
> 
> *Do you agree that we should pursue both purposes in this request?*
> *The machinery*
> a) A service claims (at registration time) that it can provide data by
> reference.
> 

It can be declared in the service's RDF as a predicate, together with the 
understood reference protocols (which should be described elsewhere in an RDF 
ontology).

> b) A client asks for references by including an "acceptRefs"
> attribute in the mobyData tag. The attribute lists one or more protocol names
> that the client can accept.
> 
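A rough sketch of point b) in Python, assuming a simplified message with XML namespaces omitted and `acceptRefs` holding a space-separated list of protocol names (the RFC does not fix a syntax for the list). `build_request` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_request(query_id: str, accepted_protocols: list[str]) -> str:
    """Build a simplified Moby request whose mobyData asks for references.

    Assumptions: namespaces are omitted, and acceptRefs is a space-separated
    list of protocol names; neither detail is fixed by the proposal.
    """
    moby = ET.Element("MOBY")
    content = ET.SubElement(moby, "mobyContent")
    ET.SubElement(content, "mobyData", {
        "queryID": query_id,
        "acceptRefs": " ".join(accepted_protocols),
    })
    # The actual query articles (Simple/Collection elements) would go inside mobyData.
    return ET.tostring(moby, encoding="unicode")
```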

I suggest using a serviceNotes child element (called, for instance, acceptRefs 
:-), listing what the client can understand) instead of adding a new attribute 
to mobyData, in both directions of the MOBY dialog. The client uses the 
acceptRefs element in the message sent to the service to say what it can 
understand, and the service answers with an acceptRefs element explaining what 
it has done. To keep the contract, the service must answer with a subset of the 
reference protocols the client understands.

If the service cannot fulfill the request because the client does not 
understand any of the protocols the service is able to handle (and the service 
refuses to send that much information inline), it can fail using a new 
exception code related to message size.

If the service uses references that the client does not understand, then the 
client can flag the message or the service as corrupted or offending.
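The serviceNotes variant above could be checked on the client roughly like this. It is a sketch under assumptions: the `<acceptRefs>` element is taken to hold space-separated protocol names, and the function names are hypothetical. The "contract" is simply that the protocols the service reports using form a subset of those the client offered; anything outside that set is what the client would flag.

```python
import xml.etree.ElementTree as ET


def accepted_protocols(service_notes: ET.Element) -> set[str]:
    """Read protocol names from a hypothetical <acceptRefs> child of <serviceNotes>.

    Assumed layout: <serviceNotes><acceptRefs>http ftp</acceptRefs></serviceNotes>
    """
    refs = service_notes.find("acceptRefs")
    if refs is None or not refs.text:
        return set()
    return set(refs.text.split())


def contract_kept(request_notes: ET.Element, response_notes: ET.Element) -> bool:
    """True if the service only used protocols the client said it understands."""
    return accepted_protocols(response_notes) <= accepted_protocols(request_notes)
```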

> c) A service *can* obey such a request and send one or more *primitive
> data* as references (the focus on primitive types is new: originally we
> thought about allowing references at any level, but now, mainly because of
> purpose B, we no longer propose it). It can use any of the protocols
> mentioned in the client's "acceptRefs" attribute. It can send references
> only if at least one protocol matches.
> 
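Point c), combined with the message-size failure suggested above, amounts to a small decision on the service side. A minimal Python sketch, where `MessageTooLargeError`, `Reply`, `max_inline` and the protocol-to-URL map are all hypothetical names: small results go inline (the service *can*, not must, use references), larger ones go by reference when at least one protocol matches, and otherwise the service fails.

```python
from dataclasses import dataclass
from typing import Optional


class MessageTooLargeError(Exception):
    """Hypothetical stand-in for the proposed 'message size' exception code."""


@dataclass
class Reply:
    inline_data: Optional[bytes] = None
    reference: Optional[str] = None


def build_reply(data: bytes, client_protocols: set[str],
                service_urls: dict[str, str], max_inline: int) -> Reply:
    """Decide whether to answer inline or by reference.

    service_urls maps each protocol the service can serve (e.g. "http") to a
    URL where this particular result can be fetched.
    """
    if len(data) <= max_inline:
        # Small enough: the service may simply ignore acceptRefs.
        return Reply(inline_data=data)
    common = sorted(client_protocols & set(service_urls))
    if common:
        # At least one protocol matches, so a reference is allowed.
        return Reply(reference=service_urls[common[0]])
    # Too big to inline and no protocol the client understands.
    raise MessageTooLargeError(f"{len(data)} bytes exceed inline limit {max_inline}")
```

Picking the alphabetically first common protocol is arbitrary; a real service might prefer whichever protocol is cheapest for it to serve.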
> *How does a client know what protocols a service supports?*
> 
> This is a fundamental question that goes closely with "use existing
> standards rather than inventing your own". An ideal solution is perhaps
> this: a service returns not a reference to the data itself but a reference
> to a WSDL document that lists all supported protocols, including the
> endpoints for this particular data. It is a nice idea, but it breaks
> purpose B: we cannot use existing references without first wrapping them in
> a WSDL document. WSDL is strong because it actually gives us an *interface*
> describing how to get the data, but it is weak because the references cannot
> be used as *indexes* (e.g. for further caching). Also, it does not solve the
> client side: the "acceptRefs" attribute still needs to use a list of
> protocol names (and not a WSDL document, because clients cannot make WSDL
> documents visible to the world).
> 

A passive, static way is publishing the service's reference capabilities in 
the RDF. An active, dynamic way is creating a new service port (as we did with 
asynchronous services) that provides live metadata about the service. It could 
even be linked or related to LSID metadata servers, but that is beyond my 
knowledge.

> After going back and forth, we concluded (and it is now our proposal for the
> request for comments) that the service returns a reference to the data, and
> clients can deduce which protocol to use by looking at the protocol part of
> the returned URL. We are aware that this is fine for the usual protocols,
> such as HTTP and FTP, but it cannot serve data via, for example, SOAP. But,
> as Eddie pointed out, if somebody wants SOAP for data, she can return the
> data directly in the Moby message.
> 
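Under this proposal the client needs nothing more than the URL scheme to pick a protocol. A minimal sketch with Python's `urllib.parse` (the helper names are hypothetical), which also rejects a reference whose scheme the client never listed in acceptRefs:

```python
from urllib.parse import urlsplit


def reference_protocol(reference: str) -> str:
    """Return the protocol of a returned reference, taken from its URL scheme."""
    scheme = urlsplit(reference).scheme  # urlsplit already lower-cases the scheme
    if not scheme:
        raise ValueError(f"not an absolute URL: {reference!r}")
    return scheme


def check_reference(reference: str, accepted: set[str]) -> str:
    """Raise if the service used a protocol the client never advertised."""
    protocol = reference_protocol(reference)
    if protocol not in accepted:
        raise ValueError(f"reference uses unrequested protocol {protocol!r}")
    return protocol
```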
> *The remaining questions*
> Dmitry suggested using WSRF. We think that he meant something else: it
> could be used instead of the whole Moby message, but that is not what we
> are looking for. We want to replace just the data part with references,
> while keeping the original Moby message as it is used now. So we have
> concluded: no WSRF.
> 
> *How can a client tell a service that she is sending a reference instead of
> data?* This could be useful for chaining services. We have not talked about
> it. Ideas welcome.
> 
> The machinery described above may not make it possible to find out in
> advance which protocols a service is able to provide. That depends on what a
> service can register in the Moby Central registry. It can be just a boolean
> flag ("I can provide references"), a list of supported protocols (names), or
> nothing at all. The last option has the advantage that no change in the
> registry is needed. *Can we live with this simple option?*
> 
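The three registration options differ in what a client can know before calling the service. A sketch, assuming a hypothetical registry entry that is `None` (nothing registered), a boolean flag, or a set of protocol names; a result of `None` means "only asking the service will tell":

```python
from typing import Optional, Union

# Hypothetical registry entry for a service under each option in the RFC:
#   None                     -> nothing registered (no registry change)
#   True / False             -> boolean "I can provide references" flag
#   frozenset({"http", ...}) -> the supported protocol names
RefCapability = Optional[Union[bool, frozenset]]


def may_return_refs(capability: RefCapability, client_protocols: set) -> Optional[bool]:
    """Can the client know in advance whether references may come back?

    Returns True/False when the registry entry settles it, or None when the
    client can only find out by sending acceptRefs and inspecting the reply.
    """
    if capability is None:
        return None
    if isinstance(capability, bool):
        # A flag says references are possible, but not with which protocols.
        return None if capability else False
    return bool(capability & client_protocols)
```

With the simplest option every service falls into the first branch, which is the trade-off the question above asks about.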
> I am not sure I covered everything, but it is better to send it now and wait
> for your comments.
> 
> Cheers,
> Martin
> 

	Best Regards,
		José María

-- 
"There is no reason why anybody would want a computer in their home" -
	Ken Olson, founder of DEC 1977
"640K ought to be enough for anybody" - Bill Gates, 1981
"Nobody will ever outgrow a 20Mb hard drive." - ???

"Premature optimization is the root of all evil." - Donald Knuth

José María Fernández González
Tel: (+34) 91 732 80 00 / 91 224 69 00 (ext 3061)
e-mail: jmfernandez at cnio.es		Fax: (+34) 91 224 69 76
Unidad del Instituto Nacional de Bioinformática
Structural Biology and Biocomputing
Centro Nacional de Investigaciones Oncológicas
C/. Melchor Fernández Almagro, 3
28029 Madrid (Spain)
