[MOBY-dev] data by reference - a request for comments

Pieter Neerincx pieter.neerincx at gmail.com
Wed Jul 23 15:06:38 UTC 2008


Hi,

> Martin Senger wrote:
> A service *can* obey such request and send one or more *primitive  
> data* as references (the focus on primitive types is new; originally
> we thought about allowing references on any level, but now, mainly
> because of purpose B, we do not propose it anymore).

That doesn't sound very appealing to me. I use quite large data  
structures and can have tens of thousands of them in a collection  
(think of oligos for microarrays). If I can only replace the
primitives with references, I would still have to send a BioMoby XML  
structure with hundreds of thousands of references replacing the  
primitives. A client or the next service in a workflow would have to  
resolve all those references with each reference pointing to only a  
relatively small amount of data. That would be a mess and just cause a  
huge amount of overhead.

So, I would really love to see the ability to replace a complete  
collection or even the entire mobyData block with a reference. Having
said that, I really like the idea of purpose "B" too, where I can embed
a link to existing EMBL/GenBank/etc. records on EBI/NCBI/etc. servers.
The only risk would be that a client or the next service in a workflow
happily resolves those links as fast and efficiently as it can, putting
too much load on those servers. A colleague once managed to get our
entire campus disconnected from the NCBI with a simple script resolving
PubMed URLs... it was really easy :). With references it's always
possible to reference something on someone else's servers, but with  
purpose "B" it's more likely to happen.
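
To make the load concern concrete: a client that resolves references could throttle itself per host. Below is a minimal Python sketch of that idea, not part of any BioMoby toolkit. The class name PoliteResolver, the min_interval parameter and the injected fetch callable are all hypothetical.

```python
import time
from urllib.parse import urlparse


class PoliteResolver:
    """Enforce a minimum delay between two requests to the same host."""

    def __init__(self, fetch, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                # callable: url -> content (assumed)
        self.min_interval = min_interval  # seconds between hits on one host
        self.clock = clock
        self.sleep = sleep
        self._last = {}                   # host -> time of the last request

    def resolve(self, url):
        host = urlparse(url).netloc
        last = self._last.get(host)
        if last is not None:
            wait = self.min_interval - (self.clock() - last)
            if wait > 0:
                self.sleep(wait)          # back off before hitting the host again
        self._last[host] = self.clock()
        return self.fetch(url)
```

The delay is tracked per host, so references to different servers are not slowed down by each other. A sensible value for min_interval depends on the usage policy of the server being referenced.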


> Martin Senger wrote:
> I do not like using the RDF for mandatory features. The RDF, at  
> least in the
> current moby, is optional and should not carry anything that is  
> significant
> for the service and its behaviour. The data by reference should be  
> part of
> the main BioMoby API (or whatever we call it).

I agree with Martin partially on this one: pass by reference should be  
part of the main BioMoby API and not something optional as it is  
essential for successful client-service communication. But that  
doesn't necessarily exclude using RDF. We could also promote the RDF  
stuff from optional to required. I don't want to open a can of worms  
here and I'm not advocating making the RDF features mandatory. I'm  
just saying it might be an option...

> Dmitry Repchevsky wrote:
> If we are going to stay with SOAP (and especially move to doc/lit)  
> the right way is to provide streaming through SOAP Attachments (SwA)  
> or even better using MTOM.

I don't think attachments of any kind would be a good idea. Although  
theoretically a nice solution for streaming data and keeping  
everything together, there are practical problems:
* Most firewalls block attachments over 5 MB in size. One of the great
things about web services compared to good ol' CORBA is that we can
tunnel over any protocol. So most of us use HTTP(S) to bypass our
institute's paranoid sys admins and skip the part where you'd have to
bug them to open up another firewall port. I'm not sure if this
problem persists with the newer MTOM, but it's a disaster with MIME or
DIME attachments.
* I'm not sure about the newer MTOM, but the combination of SOAP and
MIME attachments is a disaster if you're using Perl. The required
modules are dying projects and you'd have to patch them manually to
make it work. If we went with attachments we would either have to drop
Perl support for BioMoby or one of us would have to write a SOAP::Lite
+ MIME::Tools alternative...

I agree with Dmitry that if we are going to stick with a standard it
has to be a de facto standard, i.e. one that is well supported by
toolkits/implementations. A standard that only exists on paper is
useless. Most of us want to build or use BioMoby services and are not
that interested in writing toolkits. I never managed to become a big
WSRF fan. It's needlessly complicated and causes lots of overhead.

Actually, I like what Mark wrote initially best:

> We'll write-up the formal specification of how to do this soon, but  
> briefly the idea is that we will use the xlink XML attribute in a  
> Moby Object XML  <.... xlink=''/>.  When that tag is present, it is  
> assumed that the content of that node is available at the URI in  
> that reference.  During service registration a provider will  
> indicate the various transport protocols they provide for creating  
> references (e.g. http, ftp) and this will be discoverable during a  
> registry query.  If you don't indicate a protocol, then you are  
> saying that you do not support pass-by-reference, and therefore all  
> existing services are supported.  When accessing a service, you  
> indicate to the service provider that you want data to be passed by  
> reference by adding an attribute in the mobyData block <mobyData  
> acceptRefs="http ftp">.  The service provider then has the option of  
> providing you references for any data they wish.  The xlink  
> attribute can appear at any level in your Moby object, such that  
> some data may be passed in the object itself, while other data from  
> the same object may be passed by reference.
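
A minimal Python sketch of how a client might inline such references. It assumes the attribute is literally named xlink, as in Mark's text, and that the referenced content is plain text, so it only covers primitives. The element names in the sample message and the fetch callable are placeholders, not the formal specification.

```python
import xml.etree.ElementTree as ET


def inline_references(element, fetch):
    """Replace the content of every node that carries an xlink attribute.

    `fetch` is a caller-supplied callable (uri -> str); how it retrieves
    the data (http, ftp, ...) is deliberately left open here.
    """
    for node in element.iter():
        uri = node.get("xlink")
        if uri:
            node.text = fetch(uri)    # content that lives at the URI
            del node.attrib["xlink"]  # the node is now passed by value
    return element


# Hypothetical message: one primitive by value, one by reference.
message = ET.fromstring(
    '<mobyData queryID="q1">'
    '<Simple articleName="seq">'
    '<DNASequence id="X1">'
    '<Integer articleName="Length">4</Integer>'
    '<String articleName="SequenceString" xlink="http://example.org/seq/X1"/>'
    '</DNASequence>'
    '</Simple>'
    '</mobyData>'
)
```

Nodes without the attribute are left untouched, which is what lets by-value and by-reference data be mixed in one object. For a reference placed on a whole collection or on mobyData, the fetched body would have to be parsed as XML and grafted in as child elements rather than assigned as text.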


For the BioMoby data structures this only requires allowing an
optional xlink attribute and acceptRefs in the request from the
client. Maybe it's nicer to have acceptRefs as a child element of
mobyData or maybe at another level in the XML, but not as an
attribute, because attributes are intended for single items. In the
example above acceptRefs contains a space-separated list and of
course we can parse that, but it's ugly XML. Anyway, what Mark wrote
above is very flexible, lightweight and relatively backwards
compatible. If I have to choose between a simple <... xlink=""> attribute
and the bulky WSA header from Dmitry's example, as in:

   <mobyws:resource>
     <wsa:EndpointReference xmlns:wsa="http://www.w3.org/2005/08/addressing">
       <wsa:Address>http://myserver.com/MyService?asyncId=ID</wsa:Address>
       <wsa:ReferenceParameters>
         <mobyws:ServiceInvocationId xmlns:mobyws="http://biomoby.org/">
           ID
         </mobyws:ServiceInvocationId>
       </wsa:ReferenceParameters>
       <wsa:Metadata>
         mobyws:result_queryId00/image/jpeg/content
       </wsa:Metadata>
     </wsa:EndpointReference>
   </mobyws:resource>

I'd choose the xlink attribute!
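
On the acceptRefs point above, a minimal Python sketch of what a service would have to do for either form. The attribute form is the one from Mark's text. The child-element form, with an acceptRefs element holding protocol children, is purely hypothetical naming for illustration.

```python
import xml.etree.ElementTree as ET


def accepted_protocols(moby_data):
    """Return the protocols a client accepts references for."""
    attr = moby_data.get("acceptRefs")
    if attr is not None:
        # Attribute form: the list is hidden in one string, split it by hand.
        return attr.split()
    # Hypothetical child-element form: one element per protocol.
    return [p.text.strip()
            for p in moby_data.findall("acceptRefs/protocol")
            if p.text]


as_attribute = ET.fromstring('<mobyData acceptRefs="http ftp"/>')
as_children = ET.fromstring(
    '<mobyData>'
    '<acceptRefs><protocol>http</protocol><protocol>ftp</protocol></acceptRefs>'
    '</mobyData>'
)
```

Both forms are easy to read in code. The difference is that with child elements the list structure is visible to any XML tool, while with the attribute it only exists once the string has been split.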

Cheers,

Pi

On 3 Jul 2008, at 11:57 AM, Dmitry Repchevsky wrote:

> Hello,
>
>> For all I know WSRF has already been surpassed by WS-RT. http://www.ibm.com/developerworks/grid/library/gr-wsrfwsrt/index.html?ca=drs-
>> There are quite a number of people who are disenchanted by WSRF.  
>> See http://blog.harbulot.com/post/2006/11/22/Experiences-with-WSRF  
>> for one example.
>>
>> Machiel
> This is what I am talking about... Martin is right that we have had
> enough problems with WSRF... now we have a "better" one (WS-RT)...
> But (!) is there an implementation? It looks like this is a
> perfect technology that exists only on paper...
> Should we implement all the standards ourselves just to be able to
> use them? If IBM claims to have a new/cool standard (by the way,
> "standard": who is behind it?), why can't I find a (free)
> implementation of it?
>
> That it uses WS-Transfer makes more sense to me, because it can
> describe a way to represent a moby object in the moby message tree
> (for WSRF), and it is at least a W3C standard.
> Claiming something to be a standard is always a bold claim...
>
> Dmitry

-------------------------------------------------------------
Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034

Dreijenlaan 3
6703 HA Wageningen
The Netherlands

phone:  +31 (0)317-483 060
mobile: +31 (0)6-143 66 783
e-mail: pieter.neerincx at gmail.com
skype:  pieter.online
------------------------------------------------------------




