[MOBY-dev] BioMOBY Asynchronous Service Call Proposal v2.2/3 - The location of queryIDs

Wed Sep 27 15:15:33 UTC 2006

Hi Johan et al.,

Thanks for the updated proposal and sorry for the late response. I've  
been very busy and also wanted to think this over for a while...

On 6-Sep-2006, at 4:13 PM, Johan Karlsson wrote:

> Pieter,
>
> Thank you for a well-written letter (and sorry for the delay in answering).
>> There is only one thing that I don't like about the current proposal:
>> the location of queryID. For our current synchronous services it's an
>> attribute of the mobyData element. In the current async services
>> proposal the queryID jumps around the XML taking several identities:
>> * in <GetResourceProperty>status_queryID01</GetResourceProperty> it
>> is part of raw text.
>> * in <lsae:status_queryID01><!-- LSAE block --></
>> lsae:status_queryID01> it is part of an element name in the lsae
>> namespace.
>> 	(By the way: Should this element really be in the lsae namespace? I
>> don't think our status_queryIDxx elements are part of the LSAE  
>> specs...)
>>
> True, we need a better namespace; moby?

moby would be fine I guess...

>> It would be
>> much more convenient if the result from an asynchronous service
>> invocation would contain both the ServiceInvocationID *AND* the
>> associated queryIDs. In that case I only have to parse the service
>> response to create GetResourceProperty requests. Therefore I propose
>> to supply the queryIDs as wsa:ReferenceParameters just like the
>> ServiceInvocationID.
>>
> I am not sure that I understand the problem completely... The clients
> must internally store, somehow, the connection between the input
> (identified by the queryID) and the output? The jobs could, potentially,
> take a very long time to finish and without knowing the input, getting
> the output would not be so interesting.

Not necessarily. Somehow you have to keep track of what conditions
were used in an experiment, so you know how you got to the results
and can make sure you can reproduce them. Either the client stores the
inputs and parameters used for service execution and combines them with
the results, or the service output contains all the information about
how the results were produced. This can be a combination of echoing
inputs back and/or providing provisioning blocks and/or service notes
to create an "e-labjournal". In the latter case you can forget about
the XML of the original service invocation and you only need the
output. Both approaches work, but in the latter case you keep the
information about an experiment together and hence you cannot "lose"
the experimental conditions.

> Anyway, it is not so complicated for the client to handle the queryIDs
> (see some of the example client code on the prototype page). Maybe you
> are describing a different situation from the one in the example? Can
> you give some examples where it would be necessary to return the
> queryIDs? Again, I'm not sure I understand.
>
> http://bioinfo.pcm.uam.es/prototype/

I'm sure it's not "rocket science" to store the queryIDs on the  
client, but it's simply not necessary. IMHO it's easier to return the
queryIDs to the client than to store them on the client.
That's all.
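
For illustration, a minimal Python sketch of a client reading the batch ID and
the queryIDs straight from the EPR a service returns on invocation, so there is
no client-side table to maintain. The EPR layout, the element names
`moby:ServiceInvocation` and `moby:Job`, and the BioMOBY namespace URI are
assumptions made for this example, not part of the proposal; `wsa` is the
WS-Addressing 1.0 namespace.

```python
import xml.etree.ElementTree as ET

# WS-Addressing 1.0 namespace plus an assumed BioMOBY namespace URI.
NS = {
    "wsa": "http://www.w3.org/2005/08/addressing",
    "moby": "http://www.biomoby.org/moby",
}

# Hypothetical EPR as a service might return it on invocation: one batch ID
# plus one reference parameter per queryID.
SAMPLE_EPR = """<wsa:EndpointReference xmlns:wsa="http://www.w3.org/2005/08/addressing"
                       xmlns:moby="http://www.biomoby.org/moby">
  <wsa:Address>http://example.org/moby/async/MyService</wsa:Address>
  <wsa:ReferenceParameters>
    <moby:ServiceInvocation>batch-42</moby:ServiceInvocation>
    <moby:Job>q1</moby:Job>
    <moby:Job>q2</moby:Job>
  </wsa:ReferenceParameters>
</wsa:EndpointReference>"""


def parse_epr(epr_xml):
    """Return (invocation_id, [queryIDs]) from the EPR's reference parameters."""
    root = ET.fromstring(epr_xml)
    params = root.find("wsa:ReferenceParameters", NS)
    invocation = params.findtext("moby:ServiceInvocation", namespaces=NS)
    query_ids = [job.text for job in params.findall("moby:Job", NS)]
    return invocation, query_ids


print(parse_epr(SAMPLE_EPR))  # prints ('batch-42', ['q1', 'q2'])
```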

>
>> 2.
>> WSRF contains an *optional* method to request a resource properties
>> document. With this method a client can figure out which resource
>> properties are available and hence what it can request. Although this
>> method is optional and the current proposal doesn't mention it, I
>> think it would be good to keep the option open to supply such a method.
>> WSRF does not put any limitations on how a service generates and
>> provides such a document, so you can generate it dynamically or it
>> can be a static thing. If we would want to supply such a resource
>> properties document in the future it would be the easiest if it can
>> be a static one. However in the current proposal the queryIDs are
>> part of the resource properties (status_queryIDxx and
>> result_queryIDxx). This means that the available resource properties
>> depend on the number of queries/jobs that were sent to a service and
>> hence we cannot use a static resource properties document. It would
>> be more convenient if we can strip the queryIDs from the resource
>> properties and provide them as wsa:ReferenceParameters. In that case
>> there are only two resource properties (status and result) and we can
>> describe those in a static resource properties document.
>
> At least until now, we have tried to include only exactly what is needed
> and to avoid many potentially useful, but perhaps more complicated, WSRF
> methods.

I totally agree and I'm happy you have chosen such an approach.

I'm not arguing to add the GetResourcePropertyDocument method right  
now, but I feel it would be good to keep the option open to do so in  
the future. If the queryIDs are moved to the SOAP header as  
ReferenceParameters it is much easier to add such functionality  
later, because the ResourceProperties are static and hence the  
ResourcePropertiesDocument can be static. With the current proposal  
the ResourceProperties are dynamic and a service provider would have  
to generate ResourcePropertiesDocuments dynamically for each service  
invocation. Again, this won't be "rocket science", but I think it's
unnecessary overhead.
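
To make the static/dynamic difference concrete, a small Python sketch that
renders a resource properties document declaration in the style of the WSRF
disk-drive example. The `status_<queryID>`/`result_<queryID>` pattern follows
the current proposal; `ServiceInvocationProperties` and the `tns` prefix are
hypothetical. With the queryIDs in the EPR the output never changes, so it
could be written once; with the queryIDs in the property names it differs per
invocation.

```python
# Hypothetical naming, for illustration only.

def dynamic_property_names(query_ids):
    """Current proposal: the property set grows with the submitted jobs."""
    names = []
    for qid in query_ids:
        names += [f"status_{qid}", f"result_{qid}"]
    return names


def static_property_names(query_ids):
    """QueryIDs in the EPR: the same two properties (query_ids is ignored)."""
    return ["status", "result"]


def properties_document(names):
    """Render a minimal resource properties document declaration (XSD fragment)."""
    refs = "\n".join(f'      <xsd:element ref="tns:{n}"/>' for n in names)
    return (
        '<xsd:element name="ServiceInvocationProperties">\n'
        "  <xsd:complexType>\n"
        "    <xsd:sequence>\n"
        f"{refs}\n"
        "    </xsd:sequence>\n"
        "  </xsd:complexType>\n"
        "</xsd:element>"
    )


# Identical for every invocation:
print(properties_document(static_property_names(["q1", "q2"])))
# Has to be regenerated for every invocation:
print(properties_document(dynamic_property_names(["q1", "q2"])))
```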

> Yes, the WSRF method GetResourcePropertyDocument could be useful, but it
> is possible to manage without it, since the clients would always be able
> to construct the property QNames as long as they keep track of the
> queryIDs. But of course, if there is a great demand for this optional
> WSRF method, we could add it to the documentation.
>
>> Therefore I propose a translocation of BioMOBY queryIDs from the
>> resource properties to wsa:ReferenceParameters. As far as I
>> understand, with all the specifications involved this would be legal,
>> but please correct me if I am wrong. Below I included some examples
>> of what the XML might look like when the queryIDs are moved to the
>> SOAP header as wsa:ReferenceParameters. Let me know what you  
>> think....
> The problem (?) is that the EPR is supposed to be opaque, or in
> particular, the ReferenceParameter (<moby:ServiceInvocation>) should be
> "assumed to be opaque" to the clients.
>
> "Reference parameters are also provided by the issuer of the endpoint
> reference and are otherwise assumed to be opaque to consuming applications."
>
> (quoting from the WS-Addressing standard that WSRF builds upon)
>
> At least my interpretation of this is that clients are not supposed to
> understand or parse or manipulate the reference-parameter but instead
> just echo it back (if I am confused please correct me)?

I think you are right, but in that case I think moving the queryIDs  
to the EPR is more opaque than the current proposal. The EPR from the  
SOAP header together with ResourceProperties contains the information  
required for retrieving status or results. In the current proposal  
the client can echo back the EPR, but it still has to create the  
ResourceProperties *dynamically*.

If the queryIDs move to the EPR and are returned by a service on  
invocation, the client can echo back the EPR (containing both the  
batch ID and the job IDs) and only needs to append one *static*  
ResourceProperty (either the one to get the status or the one to get  
the results). Hence the client doesn't have to create anything  
dynamically, requires less logic and you will only request one  
resource property at a time, so we even would not need the  
GetMultipleResourceProperties method anymore. (I assume it doesn't  
make sense to request the status and the results at the same time.)
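
A minimal Python sketch of that client step, assuming a hypothetical EPR layout
(`moby:ServiceInvocation` for the batch ID, one `moby:Job` per queryID, and an
assumed BioMOBY namespace URI). Every reference parameter is copied verbatim
into the SOAP header and marked with `wsa:IsReferenceParameter="true"`, as the
WS-Addressing SOAP binding requires; only one static property QName goes into
the `GetResourceProperty` body. The SOAP 1.1, WS-Addressing 1.0 and
WS-ResourceProperties 1.2 namespace URIs are the published ones; everything
BioMOBY-specific is illustrative.

```python
import copy
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
WSA = "http://www.w3.org/2005/08/addressing"
WSRF_RP = "http://docs.oasis-open.org/wsrf/rp-2"
MOBY = "http://www.biomoby.org/moby"  # assumed BioMOBY namespace URI

for prefix, uri in [("soap", SOAP), ("wsa", WSA), ("wsrf-rp", WSRF_RP), ("moby", MOBY)]:
    ET.register_namespace(prefix, uri)

# Hypothetical EPR returned on invocation (batch ID plus one moby:Job per queryID).
SAMPLE_EPR = f"""<wsa:EndpointReference xmlns:wsa="{WSA}" xmlns:moby="{MOBY}">
  <wsa:Address>http://example.org/moby/async/MyService</wsa:Address>
  <wsa:ReferenceParameters>
    <moby:ServiceInvocation>batch-42</moby:ServiceInvocation>
    <moby:Job>q1</moby:Job>
    <moby:Job>q2</moby:Job>
  </wsa:ReferenceParameters>
</wsa:EndpointReference>"""


def get_resource_property_request(epr, property_name):
    """Build a GetResourceProperty envelope that echoes the EPR without interpreting it."""
    envelope = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP}}}Header")
    ET.SubElement(header, f"{{{WSA}}}To").text = epr.findtext(f"{{{WSA}}}Address")
    params = epr.find(f"{{{WSA}}}ReferenceParameters")
    for param in (list(params) if params is not None else []):
        block = copy.deepcopy(param)  # copied verbatim: opaque to the client
        block.set(f"{{{WSA}}}IsReferenceParameter", "true")
        header.append(block)
    body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
    # Static property ("status" or "result"). The moby prefix gets declared on
    # the envelope when serialized, because the copied header blocks use it.
    rp = ET.SubElement(body, f"{{{WSRF_RP}}}GetResourceProperty")
    rp.text = f"moby:{property_name}"
    return envelope


request = get_resource_property_request(ET.fromstring(SAMPLE_EPR), "status")
print(ET.tostring(request, encoding="unicode"))
```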

If a client wants to manipulate the EPR by stripping out some
queryIDs to retrieve a ResourceProperty only for a subset of  
queryIDs, they *can* do that. This doesn't make life more complex for  
the service and as far as I understand the specs it is legal. It  
should not be required to manipulate the EPR and it isn't. If clients  
lack the logic to request resources for a specific queryID from a  
batch-job, they can always echo back the whole EPR as  
ReferenceParameter and get the resource for all the queryIDs of the  
batch. So moving the queryIDs to the EPR requires less logic on the  
client side and therefore I think it is a more elegant solution.
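
The optional subsetting step could look like this in Python, assuming a
hypothetical EPR layout in which each queryID is a `moby:Job` reference
parameter next to a `moby:ServiceInvocation` batch ID (element names and the
BioMOBY namespace URI are assumptions). The original EPR is left untouched, so
a client can still fall back to echoing it back whole.

```python
import copy
import xml.etree.ElementTree as ET

WSA = "http://www.w3.org/2005/08/addressing"
MOBY = "http://www.biomoby.org/moby"  # assumed BioMOBY namespace URI

# Hypothetical EPR returned on invocation.
SAMPLE_EPR = f"""<wsa:EndpointReference xmlns:wsa="{WSA}" xmlns:moby="{MOBY}">
  <wsa:Address>http://example.org/moby/async/MyService</wsa:Address>
  <wsa:ReferenceParameters>
    <moby:ServiceInvocation>batch-42</moby:ServiceInvocation>
    <moby:Job>q1</moby:Job>
    <moby:Job>q2</moby:Job>
  </wsa:ReferenceParameters>
</wsa:EndpointReference>"""


def restrict_to_queries(epr, wanted):
    """Return a copy of the EPR that keeps only the moby:Job parameters in `wanted`.

    The batch ID and any other reference parameters are kept as they are.
    """
    trimmed = copy.deepcopy(epr)
    params = trimmed.find(f"{{{WSA}}}ReferenceParameters")
    if params is not None:
        # findall() returns a list, so removing while looping over it is safe.
        for job in params.findall(f"{{{MOBY}}}Job"):
            if job.text not in wanted:
                params.remove(job)
    return trimmed


epr = ET.fromstring(SAMPLE_EPR)
subset = restrict_to_queries(epr, {"q2"})
print([job.text for job in subset.iter(f"{{{MOBY}}}Job")])  # prints ['q2']
```

A client without this logic simply echoes the unmodified EPR and gets the
resource property for every queryID of the batch.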

> Yes, the
> reference-parameter can be given as XML but this XML should not be
> modified by the clients (I assume that you mean that the clients should
> just include the <moby:Job> tags that they need to find status or
> results for particular jobs in the batch-call). The issuer of the
> endpoint reference naturally must handle the EPR but the clients should
> not try to understand the EPR.
>
> Also, conceptually, the EPR refers to a specific resource (in this case
> what we call a "batch-call", many jobs). If we manipulate the EPR we
> "change" its original reference. We tried to clearly define in the
> proposal what the EPR referred to (what the "resource" was). Manipulating
> the EPR in some way confuses what it refers to.

I disagree. If the queryIDs move to the EPR, the EPR will simply  
consist of multiple parts. It remains clear what the EPR refers to.  
The EPR as a whole remains a reference to a specific service  
invocation. It's perfectly normal according to the specs to have  
multiple ReferenceParameters in an EPR, so I fail to see how this  
will be confusing....

>
> -------------------
>
> Regarding "dynamic" property names (status_{queryID}): the official WSRF
> specification mandates that all properties of a resource MUST be
> described by an XML Schema, but this is not strictly enforced in the
> library we used for the Perl examples (WSRF::Lite) (or at least, in the
> examples of WSRF::Lite that I have seen there is no such XML schema file).
>
> Just to give an idea of what I am talking about (non-BioMOBY...):
>
> <!-- Resource property element declarations -->
> <xsd:element name="NumberOfBlocks" type="xsd:integer"/>
> <xsd:element name="BlockSize" type="xsd:integer" />
> <xsd:element name="Manufacturer" type="xsd:string" />
> <xsd:element name="StorageCapability" type="xsd:string" />
>
> <!-- Resource properties document declaration -->
> <xsd:element name="GenericDiskDriveProperties">
>     <xsd:complexType>
>         <xsd:sequence>
>             <xsd:element ref="tns:NumberOfBlocks"/>
>             <xsd:element ref="tns:BlockSize" />
>             <xsd:element ref="tns:Manufacturer" />
>             <xsd:any minOccurs="0" maxOccurs="unbounded" />
>             <xsd:element ref="tns:StorageCapability" minOccurs="0" maxOccurs="unbounded" />
>         </xsd:sequence>
>     </xsd:complexType>
> </xsd:element>
>
> This resource has four properties (tns:NumberOfBlocks, tns:BlockSize,
> tns:Manufacturer and finally tns:StorageCapability). The QNames of these
> four properties are predefined/fixed, unlike the ones we need
> ("status_q1", "status_q2", etc.).
>
> We would need the resource properties schema to allow open content
> (using an xsd:any element). This means that the list of valid QNames for
> the resource properties is "open". See "3.3.1.1 Establishing a List of
> Valid Resource Properties" in "WSRF Application Notes"
> (http://docs.oasis-open.org/wsrf/wsrf-application_notes-1.2-cd-02.pdf)
> for more information.

I understand it is possible to use a schema that allows open content,  
but it's not necessary. So why make life more complicated?

With kind regards,

Pieter

>
> Kind regards,
> Johan Karlsson
>
> -- 
> Johan Karlsson
> Instituto Nacional de Bioinformática (INB)
> Integrated Bioinformatics Node (GNV-5)
> Dpto. de Arquitectura de Computadores
> Campus Universitario de Teatinos, despacho 2.3.9a
> 29071 Málaga (Spain)
> +34 95 213 3387
>
>
>
>
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev

Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034
Dreijenlaan 3
6703 HA Wageningen
The Netherlands
phone: 0317-483 060
fax: 0317-483 584
mobile: 06-143 66 783
pieter.neerincx at wur.nl