[MOBY-dev] Re: [MOBY] BioMOBY 0.49 API comments

Mon Apr 21 21:23:12 UTC 2003

> I hope you don't mind that I am responding to the wider list with clips
> of your message - I know it is bad netiquette to pass "private"
> conversations on to non-participants :-)  I think in this case you won't
> mind...

Sure, I just didn't want to inundate the list with 6 pages of
mostly nits...

> > Presumably, when a Collection is given as a Primary, with a single simple type
> > as its contents, this simply means that the Collection is homogeneous in
> > containing only instances of that type, and there are no constraints on
> > the size; I think there is some need for an example of the use of mixed-type
> > collection parameters;
>
> To be honest, I can't think of one... but that doesn't mean that such
> things do not exist.  The precise definition of the collection is
> fine-tuned during service registration.  The Collection article is
> really meant to be interpreted as a "bag", whose content has no
> particular structure.  If structure is required, then it is no longer a
> collection, and should be broken out as named Simple articles.  If you
> require 500 individual articles (i.e. 500 **named parameters**, not just
> 500 objects)... well... too bad :-/   It must be registered as such. If
> you say that your collection has multiple datatypes, then you must as
> the service provider know what to do with them.  No order, no
> cardinality, no names.

I just wanted to make sure that I understood what such a construct
means and how it will be interpreted. The lack of internal structure makes
perfect sense, but I think what I was trying to ask was whether the
multiplicity of types in the "signature" of the service registration info
meant that the collection could contain a mixed set including all of those types
or whether the collection had to be homogeneous with respect to one of those
types. In DTD, it would be the difference between (type1|type2)* and
(type1*|type2*).
For example, a multiple sequence alignment or "formatdb" blast
database construction service could conceivably take a collection of either
nuc or protein sequences, but not a mixed set; on the other hand, a molecular
weight determining service might allow mixed collections.
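
To make the two readings concrete, here is a minimal sketch in Python.
It assumes each item in a collection is represented only by its type
name; the function names and the type names "NucSeq" and "ProtSeq" are
hypothetical and are not part of the MOBY API.

```python
def is_mixed_ok(item_types, allowed):
    """(type1|type2)*: every item is one of the allowed types."""
    return all(t in allowed for t in item_types)


def is_homogeneous_ok(item_types, allowed):
    """(type1*|type2*): all items share one single allowed type."""
    return is_mixed_ok(item_types, allowed) and len(set(item_types)) <= 1


# Hypothetical type names, for illustration only.
allowed = {"NucSeq", "ProtSeq"}
mixed = ["NucSeq", "ProtSeq", "NucSeq"]
uniform = ["ProtSeq", "ProtSeq"]
```

Under these assumptions a "formatdb"-style service would validate its
input with is_homogeneous_ok, while a molecular weight service could
accept anything that passes is_mixed_ok.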

Intuitively, it seems as though the former cases might best be handled as
two separate services (one for nuc collections, one for protein). The
latter case might be done by using a supertype rather than an explicit
enumeration of types (although this might presuppose that no one else
could later extend the supertype with additional subtypes)?

In any case, I think if you feel the added complexity is justified, then the
spec could be clearer on how they should be used, in particular how the service
matching functions will handle them; e.g. I suppose that if I ask for
services that take collections of type X, and there is a service registered
with an input collection of types X,Y,Z then this would be returned as a match
(assuming the intended behavior will be equivalent to that expected if there
were a supertype that encompassed X Y and Z).
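
The matching behaviour supposed above could be sketched as follows.
This is only an illustration under the "implicit supertype" reading:
the registry layout, the service names and find_services are all
hypothetical, and nothing here reflects how MOBY Central actually
matches services.

```python
def find_services(registry, wanted_type):
    """Return the services whose input collection admits wanted_type.

    registry maps a service name to the set of types registered for
    its input collection (a hypothetical layout).
    """
    return sorted(
        name for name, types in registry.items() if wanted_type in types
    )


# Hypothetical registrations.
registry = {
    "molWeight": {"X", "Y", "Z"},
    "alignX": {"X"},
    "other": {"Y"},
}
matches = find_services(registry, "X")
```

Here a query for collections of type X returns both the service
registered for X alone and the one registered for X,Y,Z.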

>
> > "articleName gives an optional mechanism for named inputs"; does this imply
> > that if articleName is not used, then ordering will be used?
>
> no.  articleName was chosen *in lieu of* cardinality in the message.

But if articleName is only optional, then in cases where it is not used,
presumably it should be guaranteed that order will be preserved. Since you
are using XML for serialization, it shouldn't be a problem to guarantee
the preservation of order, in any case...
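
As a small illustration of the point about XML and ordering, the
sketch below parses a message with Python's standard ElementTree
module and reads the child elements back in document order. The
element layout (three anonymous Simple children inside Input) is an
assumption made for the example, not the actual MOBY message format.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: three anonymous articles, no articleName.
doc = (
    "<Input>"
    "<Simple>first</Simple>"
    "<Simple>second</Simple>"
    "<Simple>third</Simple>"
    "</Input>"
)

root = ET.fromstring(doc)

# Iterating over an element yields its children in document order.
values = [child.text for child in root]
```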

>
>
> > what if some
> > are named and some are anonymous?
>
> too bad.... give me an example where some parameters are named and
> others are anonymous and I'll show you an example where I can name all
> of the parameters ;-)

I think all I was suggesting is to have the spec prohibit such constructs.
At any rate, I see your point. I would tend to think that since:
	1) named params can be used in all cases(?)
	2) naming params makes them more intelligible
	3) ordering is more brittle wrt extension/alteration
	4) less is more

perhaps the spec should just require names for inputs/outputs and not
mess with anonymous, ordered articles?

The only value that anonymous params seem to give is brevity of expression
(are there other benefits?),
but that doesn't really seem like a very compelling reason in this context.

>
>
>
> > If you were looking for services that would take (exactly)
> > two sequences as input (perhaps with different "roles"), could you do this?
>
> The two sequences are named Simple inputs.

I think the context of the question was not with respect to registration
of such a service, but querying for them; i.e., if I send a Service Query
Object with two <Sequence/> elements in the <Input/> block, does this
duplication get ignored or interpreted as specifying services that take
two (or more?) input sequences? I wonder how expressive our queries need
to be?
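
The two possible interpretations can be contrasted in a short sketch.
It assumes a query whose Input block holds two empty Sequence
elements, as in the question; the surrounding XML is simplified and
hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Simplified, hypothetical query fragment.
query = "<Input><Sequence/><Sequence/></Input>"
tags = [child.tag for child in ET.fromstring(query)]

# Reading 1: duplicates are ignored, only the kinds of input matter.
as_set = set(tags)

# Reading 2: multiplicity matters, so the query asks for two Sequences.
as_multiset = Counter(tags)
```

With the first reading the registry only needs set membership; the
second would require it to compare counts as well.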

>
>
> > expandObjects: "sub-objects" is confusing in context (sounds like subclasses);
> > you mean (I think) the notion of contained (has-a) objects- i.e. in
> > your example of AnnotatedSequenceObject, services accepting "annotations"
> > would also be returned? If this is indeed what you meant here, you may
> > also want to think about how ontology traversal will work here, since
> > has-a is not necessarily acyclic?
>
> yeah... fun!  I didn't actually intend for HASA to be acyclic... I
> wonder how much pain this is going to cause in practice.

Yes, it's hard to say at this point, but it's something to keep in mind...
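
If has-a can be cyclic, any traversal for expandObjects has to guard
against revisiting a type. A minimal sketch, assuming the HASA
relationships are available as a simple adjacency mapping; the mapping
below is invented for illustration and is not taken from the MOBY
object ontology.

```python
def reachable_types(hasa, start):
    """Collect start and every type reachable from it via HASA.

    The seen set ensures termination even when the graph has cycles.
    """
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already expanded, so a cycle cannot loop forever
        seen.add(current)
        stack.extend(hasa.get(current, ()))
    return seen


# Hypothetical HASA graph containing a cycle.
hasa = {
    "AnnotatedSequence": ["Sequence", "GO_Annotation"],
    "GO_Annotation": ["GO_Term", "AnnotatedSequence"],
}
found = reachable_types(hasa, "AnnotatedSequence")
```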

>
>
>
> > 	I'm not sure I completely understand the discussion of "authoritative
> > 	service"; sounds like it's a question of being the primary source or a
> > 	mirrored source of data? Could it be made clearer?
>
> I've tried to clarify this in the API text, but it is still nebulous.  I
> think the *intention* is clear, but the wording is slippery...

I'm not sure I understand the intention of the updated version either;
some examples might be really helpful.

>
>
>
> > 		Are the HASA <objectType> elements for primitives really going to
> > 		look like <objectType>STRING</objectType> as suggested by the
> > 		table later on in the document? this doesn't seem too helpful...
>
> I don't understand the objection.

Just that saying "Sequence HASA String and HASA Int" seems less
perspicuous than "Sequence HASA SequenceString and HASA Length"...
(even worse is "GO_Term HASA String" vs. "GO_Term HASA Name and HASA Definition",
etc.)

>
> > there is some inconsistency in the use of moby:elementName vs moby:articleName
>
> If you see these, please show me exactly where as it took a long time
> before I settled on "article", so these inconsistencies might appear all
> over the place!

just search for elementName- I counted 5 instances at last check

> > 	CRIB:
> > 		Invocation object: it's not clear to me how these supply additional
> > 			information above the association between the queryInput and
> > 			the queryResponse? maybe a more detailed example would be helpful
> > 			here...
>
> I need to put some things into the "moby by example" section at the
> bottom to clarify this.  The behaviour is discussed in detail in the
> message that I sent out from the Singapore Hackathon (Subj:  "Spewing
> forth from the Hackathon")
>
>
> > 		it might be useful to be able to supply some information about the
> > 		"nature" of the cross referencing between objects (typed relationships).
> > 		do you have anything along those lines envisioned?
>
> envisioned, yes, but it came up too late to be sensibly included in this
> version of the spec.  Heiko Schoof has done some thinking about this,
> and Dave Block and Hilmar Lapp at GNF/Novartis apparently have an
> ontology to describe these things... we will steal from them later.

Right. I think Suzi Lewis also mentioned a relationship ontology from
her group, but I haven't seen it in the GOBO stuff...

However, as discussed below, it's not clear (to me) how typed CRIB
relationships would differ in essence from other HASA relationships
specifically included in the ontology; i.e. what's special about CRIB
references that makes them different in kind from the sort of relationship
specified between a GO_Term and AnnotatedSequence via GO_Annotation?

>
>
> > 	PIB:
> > 		does this belong as a child of a MOBY object, or would it be more
> > 			appropriate at a higher level (queryResponse or even enclosing
> > 			these)? alternatively, if you want to be able to aggregate object
> > 			content from multiple providers, would you want to have provision
> > 			info enclosing the object content that came from that provider?
>
> No, it belongs as a child of the MOBY object.  Any piece of data has
> provision, and even sub-pieces may have different provision than the
> parent piece, or even the entire response.

Can you elaborate on this idea? At some level, to the client all of the
info in a response must be attributed to the responder; they may claim to
have gotten it from someone else, but provenance seems a little like
a Chinese puzzle; at any rate, the semantic web

> > "More complex hypothetical examples follow..."
> > 	-many primitives are lacking id attributes; also, some objects have
> > 		empty values, e.g. GO_Annotation namespace="" id=""; is this legal?
> > 		what are the consequences?
>
> I think it is legal.  I don't know, yet, what the consequences are.  It
> will not be legal if we move to an LSID-type scheme, however.  I know
> from experience that sometimes it is just impossible (in my imagination)
> to assign a namespace/id to a "thing" that exists only transiently

>
>
> > 	-some articleNames seem to have namespaces ("go:name") others do not
> > 		("Length"); if this is intentional, you may wish to comment on the
> > 		difference?
>
> I didn't really intend for that to have the meaning that you are
> ascribing to it, though this might be a useful behaviour if we can
> define it properly!
>
>
> > In general, it might be useful to have some exegesis on the intended "meaning"
> > or potential interpretation of some of these structures? e.g. GO_Annotation
> > standing alone seems nonsensical?
>
> To be honest, it doesn't bother me at all...

What does it annotate? Seems like an "association" with no gene product...

>
>
> > Similarly, it might be useful
> > to discuss the sense of some of the CRIB references (in particular, why
> > they are modeled as CRIBs while the GO associations get "first-class"
> > association)...
>
> I'm sorry, I don't understand the objection.  can you put it another
> way?

I think I was just wondering if there is a reason to represent some object/object
relationships in the CRIB block, while others (GO_Term/Sequence) get their own
"named class" (GO_Annotation). Maybe I'm misunderstanding, but

>
>
>
> > Comments about Input and Output SOAP messages:
> > How does base64 encoding help XML parsers? I'm probably being obtuse, but
> > I don't get it...
>
> yeah... this is a problem, at least in Perl.  Without looking up the
> SOAP::Lite documentation again, here's how I remember the discussion:
> SOAP::Lite uses XML::Parser to deal with the message.  Since we are
> passing XML in the payload, the entire payload gets parsed for
> validity.  The payload might be enormous if we are passing multiple
> query/responses.  Rather than parse the payload twice (once by
> SOAP::Lite, and once by your service), it speeds things up considerably
> (based on trial and error by Paul Kulchenko) if you base64 encode the
> payload when passing XML around.

This sounds kind of nasty. You'd think that since WSDL/SOAP is supposed
to support document style messaging, this sort of thing would be more
transparent....
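
For reference, the workaround described in the quoted text amounts to
something like the sketch below, shown in Python rather than Perl.
The payload content is made up; the point is only that the encoded
form contains no markup for the SOAP layer's parser to walk through.

```python
import base64

# Hypothetical XML payload that would travel inside the SOAP body.
payload = "<queryInput><Sequence namespace='' id=''>ACGT</Sequence></queryInput>"

# Sender: encode so the outer parser sees an opaque string, not XML.
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")

# Receiver: decode once, then parse the payload with its own parser.
decoded = base64.b64decode(encoded).decode("utf-8")
```

The trade-off is that base64 output is roughly a third larger than
its input, and the payload is no longer readable on the wire.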

>
>
> > "Note that, even if present, the CRIB and PIB blocks of input MOBY Objects
> > should be ignored..."
> > What justifies this? What if you had a service that wants to interpret the
> > graph connectivity of objects?
>
> I don't know if these elements are reliable enough to ever base a
> service on them... as they are 100% optional.

It does seem as though the question of what the client can/should strip out
of the input prior to service invocation is perhaps broader than
CRIB/PIB blocks- for example, if I have an AnnotatedSequence, and the service
I will invoke is actually operating on the GO_Terms and not the SequenceString,
it would be really good for everyone if I did not include the SequenceString
in the request.

>
>
> > OUTPUT:
> > "There are as many queryResponse elements..." : are responses required
> > to be in the same order as the inputs? According to
> > "Additional Requirements of Service Responses", no, but I'm not sure these
> > mechanisms are sufficient for mapping query and response; for example,
> > supposing the same item is passed twice, but with different secondary parameters,
> > how would one distinguish which response belonged to which request? It might
> > be sensible to add queryRequest/queryResponse identifiers to handle the
> > mapping explicitly?? If the rules for associating input with output are
> > kept, then will "empty" queryResponses need to at least have an Invocation
> > object to properly allow them to be matched up with the queryInput
> > that generated them?
>
> Ack!
>
> I have to stop now, my wife needs a ride to work.
>
> ...that last one has thrown me. This has always been an awkward part of
> MOBY, and became apparent during the prototype phase, but I thought I
> finally had all my bases covered }-P
>
> You're right, it needs a re-think, especially since I also say that
> namespace and id can be empty, so these are also not reliable xrefs back
> to a particular query... I guess the entire thing can be solved by
> returning responses in the same order as they were received, and putting
> the burden on the client...??

My instinct is to add identifiers to queryInput elements (client's problem
to generate these) and have queryResponses reference these.
Relying on order would seem to be problematic
in terms of supporting parallelism in the processing of query requests?
I think that a simple id referencing scheme between request and
response might allow you to do away with the Invocation object? Is there
an obvious reason why this simple approach wouldn't work?
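
A minimal sketch of the id referencing idea, under the assumption that
the client generates the identifiers and the service echoes them back.
The function names, the "q1"-style ids and the tuple layout are all
hypothetical; nothing here is part of the 0.49 API.

```python
import itertools


def tag_queries(queries):
    """Client side: give each queryInput a unique id."""
    counter = itertools.count(1)
    return {f"q{next(counter)}": query for query in queries}


def match_responses(tagged, responses):
    """Pair each (query_id, result) response with its query.

    Responses may arrive in any order, since the id carries the mapping.
    """
    return {qid: (tagged[qid], result) for qid, result in responses}


# The same item sent twice with different secondary parameters.
tagged = tag_queries([
    ("seq1", {"matrix": "A"}),
    ("seq1", {"matrix": "B"}),
])
# Hypothetical responses, returned out of order.
matched = match_responses(tagged, [("q2", "resultB"), ("q1", "resultA")])
```

Because the mapping travels with each response, the service is free to
process the queries in parallel and return them in any order.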

Thanks for taking the time to read through...

Andrew

PS. I haven't heard anything further about the hackathon arrangements. Who
is the contact person?