[MOBY-dev] BioMOBY Asynchronous Service Call Proposal v2.2/3 - The location of queryIDs

Fri Sep 29 16:12:55 UTC 2006

Hi Pieter,

Thank you for your help and suggestions.

From what we can understand, moving the position of the queryIDs would 
not make any new functionality possible. It is really just a question of 
where to put the information; the same information is sent in both cases.

We would like to keep the proposal as it is and to have a standard that 
provides much-needed functionality in BioMOBY.

> Not necessarily. Somehow you have to keep track of what conditions  
> were used in an experiment, so you know how you got to the results  
> and you make sure you can reproduce it. Either the client stores the  
> inputs and parameters used for service execution and combines it with  
> the results or the service output contains all the information about  
> how the results were produced. This can be a combination of echoing  
> inputs back and/or providing provisioning blocks and/or service notes  
> to create an "e-labjournal". In the latter case you can forget about  
> the XML of the original service invocation and you only need the  
> output. Both approaches work, but in the latter case you keep the  
> information about an experiment together and hence you can not "lose"  
> the experimental conditions.
>   

Well, an "e-labjournal" is a nice idea but it is outside of the scope of 
this proposal. Of course, it is not necessary that the service saves the 
input (async services would temporarily keep the results depending on 
the policies of the service provider) because the client could keep 
track of the input, results, provisioning blocks and service notes.

>> Anyway, it is not so complicated
>> to handle the queryIDs for the client (see some of the example code of
>> the client at the prototype page). Maybe it is another situation that
>> you are describing than the one in the example? Can you give some
>> examples where it would be necessary to return the queryIDs? Again, not
>> sure if I understand.
>>
>> http://bioinfo.pcm.uam.es/prototype/
>>     
>
> I'm sure it's not "rocket science" to store the queryIDs on the  
> client, but it's simply not necessary. IMHO it's easier to return the  
> queryIDs to the client as compared to storing them on the client.  
> That's all.
>   

As you say, it is simply a difference of approach. Following either 
approach, it is possible to ask for status and results for jobs. We have 
chosen one way that is implemented in the Perl libraries and (hopefully) 
well specified in the proposal.

> I'm not arguing to add the GetResourcePropertyDocument method right  
> now, but I feel it would be good to keep the option open to do so in  
> the future. 

Nothing stops this from being added in the future with the current proposal.

> If the queryIDs move to the EPR and are returned by a service on  
> invocation, the client can echo back the EPR (containing both the  
> batch ID and the job IDs) and only needs to append one *static*  
> ResourceProperty (either the one to get the status or the one to get  
> the results). Hence the client doesn't have to create anything  
> dynamically, requires less logic and you will only request one  
> resource property at a time, so we even would not need the  
> GetMultipleResourceProperties method anymore. (I assume it doesn't  
> make sense to request the status and the results at the same time.)
>   

Well, no, it would not make much sense for the same job(s). Nothing 
stops a client from doing it, though (but asking for non-existent result 
properties would, of course, return a WSRF error).

However, some clients might want to request, in the same message, the 
status of one job "q1" (property status_q1) and the result of another job 
"q2" (property result_q2). This is probably not a very common situation, 
but a client could choose to ask only for status properties, only for 
result properties, or for a mix.
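
As a rough illustration of such a mixed request, here is a small Python 
sketch (not the Perl library code). It only builds the list of property 
names a client would ask for, following the status_<queryID> and 
result_<queryID> pattern above. The function name property_names is 
hypothetical, and how the names are wrapped into the actual WSRF message 
is left out.

```python
def property_names(status_ids=(), result_ids=()):
    """Build resource property names for one request.

    Naming follows the status_<queryID> / result_<queryID> pattern
    described in the text; this helper is only an illustration.
    """
    names = [f"status_{qid}" for qid in status_ids]
    names += [f"result_{qid}" for qid in result_ids]
    return names


# Status for job q1 and result for job q2 in the same message.
mixed = property_names(status_ids=["q1"], result_ids=["q2"])
print(mixed)  # ['status_q1', 'result_q2']
```

A client that asks only for status, or only for results, simply passes 
one of the two lists.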

> If a client wants to manipulate the EPR by stripping out some  
> queryIDs to retrieve a ResourceProperty only for a subset of  
> queryIDs, they *can* do that. This doesn't make life more complex for  
> the service and as far as I understand the specs it is legal. It  
> should not be required to manipulate the EPR and it isn't. If clients  
> lack the logic to request resources for a specific queryID from a  
> batch-job, they can always echo back the whole EPR as  
> ReferenceParameter and get the resource for all the queryIDs of the  
> batch. So moving the queryIDs to the EPR requires less logic on the  
> client side and therefore I think it is a more elegant solution.  

Well, such clients (that lack logic to edit the EPR) would (potentially) 
waste bandwidth or be very limited. If a client can only ask for the 
results of all jobs and the first job is finished after 5 minutes and 
the second job is finished after 10 hours, this client would not be very 
useful (either you have to wait 10 hours to get the result of the 
5-minute job or you retrieve the result of the 5-minute job twice).
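
To make the difference concrete, here is a minimal Python sketch of the 
per-queryID logic a client might use when polling. Everything in it is an 
assumption for illustration: the function name plan_poll, the 
finished/unfinished flags and the set of already fetched queryIDs are 
hypothetical and not part of the proposal or the Perl libraries.

```python
def plan_poll(finished_by_id, already_fetched):
    """Decide which properties to request in the next poll.

    finished_by_id:  dict mapping queryID -> True if the job is done
    already_fetched: set of queryIDs whose results were retrieved
    """
    props = []
    for query_id, finished in finished_by_id.items():
        if query_id in already_fetched:
            continue  # never download the same result twice
        prefix = "result_" if finished else "status_"
        props.append(prefix + query_id)
    return props


# q1 (the 5-minute job) is done, q2 (the 10-hour job) is still running.
print(plan_poll({"q1": True, "q2": False}, set()))
# After q1's result has been retrieved, only q2 is polled.
print(plan_poll({"q1": True, "q2": False}, {"q1"}))
```

With this kind of logic the result of the short job is retrieved once, as 
soon as it is ready, while the long job is only asked for its status.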

In summary, we do not think it is necessary to change the proposal: it 
can do what is needed and we have a working implementation.

Anyway, thank you for your suggestions.

Kind regards,
Johan

-- 
Johan Karlsson
Instituto Nacional de Bioinformática (INB)
Integrated Bioinformatics Node (GNV-5)
Dpto. de Arquitectura de Computadores
Campus Universitario de Teatinos, despacho 2.3.9a
29071 Málaga (Spain) 
+34 95 213 3387