[MOBY-dev] Re: [MOBY] BioMOBY 0.49 API comments

Sat Apr 19 16:47:04 UTC 2003

On Thu, 2003-04-17 at 12:50, Andrew D. Farmer wrote:
> Hi Mark-
> I hope these comments aren't too late to be of any use; most are simple
> proof-reading type stuff, or requests/suggestions for clarification; a few are
> issues that may be worth considering more closely.

Hi Andrew, 

I hope you don't mind that I am responding to the wider list with clips
of your message - I know it is bad netiquette to pass "private"
conversations on to non-participants :-)  I think in this case you won't
mind...

I've taken many of your comments to heart - as you say, some were
typos, some were accidental mis-representations, some were purposeful
exclusions, while others were complete oversights!  It was great of you
to spend so much effort going through the API - thank you!!

I've responded to most of your comments directly in the API as changes
to the text (I'm still working on it... your message printed out as 6
pages, so it is taking a while ;-) ), but some of them I will clarify
here as they are points worth broader & possibly urgent discussion.

> Presumably, when a Collection is given as a Primary, with a single simple type
> as its contents, this simply means that the Collection is homogeneous in
> containing only instances of that type, and there are no constraints on
> the size; I think there is some need for example of use of mixed-type
> collection parameters; 

To be honest, I can't think of one... but that doesn't mean that such
things do not exist.  The precise definition of the collection is
fine-tuned during service registration.  The Collection article is
really meant to be interpreted as a "bag", whose content has no
particular structure.  If structure is required, then it is no longer a
collection, and should be broken out as named Simple articles.  If you
require 500 individual articles (i.e. 500 **named parameters**, not just
500 objects)... well... too bad :-/   It must be registered as such. If
you say that your collection has multiple datatypes, then you must as
the service provider know what to do with them.  No order, no
cardinality, no names.

> "articleName gives an optional mechanism for named inputs"; does this imply
> that if articleName is not used, then ordering will be used? 

no.  articleName was chosen *in lieu of* cardinality in the message.

> what if some
> are named and some are anonymous?

too bad.... give me an example where some parameters are named and
others are anonymous and I'll show you an example where I can name all
of the parameters ;-)

> If you were looking for services that would take (exactly)
> two sequences as input (perhaps with different "roles"), could you do this?

The two sequences are named Simple inputs.
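
As a rough illustration of that answer, here is a minimal Python sketch of
looking up services by their named Simple inputs.  Everything in it is
hypothetical: the SERVICES table, the service names, the article names and
the services_taking() helper are invented for the example, and nothing here
says the registry actually exposes such a query.

```python
# Hypothetical registered signatures: articleName -> object type.
# None of these names come from the MOBY API; they are placeholders.
SERVICES = {
    "align_pair": {"query": "Sequence", "subject": "Sequence"},
    "translate": {"input": "Sequence"},
    "annotate": {"sequence": "Sequence", "term": "GO_Term"},
}


def services_taking(services, object_type, count):
    """Names of services whose named inputs are exactly `count` articles
    of `object_type`.  Names carry the roles; no ordering is involved."""
    return sorted(
        name
        for name, inputs in services.items()
        if len(inputs) == count
        and all(t == object_type for t in inputs.values())
    )


print(services_taking(SERVICES, "Sequence", 2))
```

The two sequences are told apart by articleName ("query" vs "subject"
here), not by position, which is the point of naming them.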

> expandObjects: "sub-objects" is confusing in context (sounds like subclasses);
> you mean (I think) the notion of contained (has-a) objects- i.e. in
> your example of AnnotatedSequenceObject, services accepting "annotations"
> would also be returned? If this is indeed what you meant here, you may
> also want to think about how ontology traversal will work here, since
> has-a is not necessarily acyclic?

yeah... fun!  I didn't actually intend for HASA to be acyclic... I
wonder how much pain this is going to cause in practice.
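
For what it is worth, a cyclic HASA graph can still be walked safely as long
as the traversal remembers what it has already visited.  The Python sketch
below is only an illustration under assumptions: the HASA table and its type
names are made up, and the Annotation -> AnnotatedSequence edge is there
purely to create a cycle.

```python
from collections import deque

# Hypothetical HASA (has-a) relationships; illustrative type names only.
HASA = {
    "AnnotatedSequence": ["Sequence", "Annotation"],
    "Annotation": ["String", "AnnotatedSequence"],  # deliberate cycle
    "Sequence": ["String"],
    "String": [],
}


def contained_types(root, hasa):
    """Return every type reachable from `root` via HASA, each listed once."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in hasa.get(current, []):
            if child not in seen:  # the visited set is what breaks the cycle
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(contained_types("AnnotatedSequence", HASA))
```

Without the `seen` set the loop above would never terminate on this table,
which is roughly the pain being wondered about.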

> 	I'm not sure I completely understand the discussion of "authoritative
> 	service"; sounds like it's a question of being the primary source or a
> 	mirrored source of data? Could it be made clearer?

I've tried to clarify this in the API text, but it is still nebulous.  I
think the *intention* is clear, but the wording is slippery...

> 	retrieveObject:
> 
> 		don't know what the resolution is on the use of CDATA between you
> 		and Martin, but if you are escaping WSDL with CDATA, will you also
> 		want to escape XSD here?

I am still frustrated by the Perl/Java differences here... or at least,
the apparent differences based on what I expect the behaviour to be at
the Perl end compared to what the behaviour is at the Java end ( and
v.v.) compared to what I interpret as the intended function/behaviour of
the CDATA element...  The details are fuzzy now, as it was a bit of a
whirlwind at the hackathon.  I'm sure Martin, Brian and I will have it
out again at the I3C Hackathon next week...

> 		Are the HASA <objectType> elements for primitives really going to
> 		look like <objectType>STRING</objectType> as suggested by the
> 		table later on in the document? this doesn't seem too helpful...

I don't understand the objection.

> "very simple composition. It consists of three elements"
> 	-composition -> structure ; might be better, as composition suggests (to
> 	me) the aggregation of objects via HASA
> 	-use of "elements" is confusing here, as these items don't correspond to
> 		XML "elements"

I changed it in the text, but I wasn't in fact talking about XML at this
point.  The base object *may be* represented in XML, but that isn't the
only way.

> I guess this is one of the more
> confusing aspects of the MOBY object representation; an analogy that suggests
> itself (which may or may not be the way you are seeing it, but which might help
> clarify) is the resource/representation distinction on the web? That is, the
> URI identifies the "resource" but the "representation" that is returned can
> vary (e.g. different mime-types, or language encodings)?

I think that is exactly how I view things.  I have stolen your words and
updated the text :-)

> use of moby: namespace in this section is inconsistent; are you imagining that
> the XML may have other namespaces? could moby: be indicated as the default
> namespace?

Yes, it is purposely inconsistent since none of the examples include the
XML header.  Since moby: is the default namespace it doesn't necessarily
have to be there.

> there is some inconsistency in the use of moby:elementName vs moby:articleName

If you see these, please show me exactly where as it took a long time
before I settled on "article", so these inconsistencies might appear all
over the place!

> "The value of the Object element is ignored": by "value" you mean element
> "content model"? by whom is it ignored? doesn't
> it really depend on who is interpreting the document? MOBY itself presumably
> ignores the element content of all objects (including primitives)? Should
> non-primitive objects even be allowed to have text content (since mixing this
> with HASAs will produce mixed-content elements)?

badly worded.  I will update that as soon as I can think of a sensible
way to say what I mean :-)

> 	CRIB:
> 		Invocation object: it's not clear to me how these supply additional
> 			information above the association between the queryInput and
> 			the queryResponse? maybe a more detailed example would be helpful
> 			here...

I need to put some things into the "moby by example" section at the
bottom to clarify this.  The behaviour is discussed in detail in the
message that I sent out from the Singapore Hackathon (Subj:  "Spewing
forth from the Hackathon")

> 		it might be useful to be able to supply some information about the
> 		"nature" of the cross referencing between objects (typed relationships).
> 		do you have anything along those lines envisioned?

envisioned, yes, but it came up too late to be sensibly included in this
version of the spec.  Heiko Schoof has done some thinking about this,
and Dave Block and Hilmar Lapp at GNF/Novartis apparently have an
ontology to describe these things... we will steal from them later.

> 	PIB:
> 		does this belong as a child of a MOBY object, or would it be more
> 			appropriate at a higher level (queryResponse or even enclosing
> 			these)? alternatively, if you want to be able to aggregate object
> 			content from multiple providers, would you want to have provision
> 			info enclosing the object content that came from that provider?

No, it belongs as a child of the MOBY object.  Any piece of data has
provision, and even sub-pieces may have different provision than the
parent piece, or even the entire response.

> "RDF triple (subject1, predicate, subject2)": in RDF-speak, subject2 should be
> "object" (which may be a resource or a literal; if a resource, then may be used
> as a subject in other triples, but it isn't the subject of the predicate in
> this particular statement)

yeah, except I don't want to confuse the word "object" in RDF with the
word "Object" in MOBY-speak... so I used the wording above just to be
sure there was no confusion.  I might have added confusion by doing so.

> "More complex hypothetical examples follow..."
> 	-many primitives are lacking id attributes; also, some objects have
> 		empty values, e.g. GO_Annotation namespace="" id=""; is this legal?
> 		what are the consequences?

I think it is legal.  I don't know, yet, what the consequences are.  It
will not be legal if we move to an LSID-type scheme, however.  I know
from experience that sometimes it is just impossible (in my imagination)
to assign a namespace/id to a "thing" that exists only transiently.

> 	-some articleNames seem to have namespaces ("go:name") others do not
> 		("Length"); if this is intentional, you may wish to comment on the
> 		difference?

I didn't really intend for that to have the meaning that you are
ascribing to it, though this might be a useful behaviour if we can
define it properly!

> In general, it might be useful to have some exegesis on the intended "meaning"
> or potential interpretation of some of these structures? e.g. GO_Annotation
> standing alone seems nonsensical? 

To be honest, it doesn't bother me at all... 

> Similarly, it might be useful
> to discuss the sense of some of the CRIB references (in particular, why
> they are modeled as CRIBs while the GO associations get "first-class"
> association)...

I'm sorry, I don't understand the objection.  can you put it another
way?

> Comments about Input and Output SOAP messages:
> How does base64 encoding help XML parsers? I'm probably being obtuse, but
> I don't get it...

yeah... this is a problem, at least in Perl.  Without looking up the
SOAP::Lite documentation again, here's how I remember the discussion:
SOAP::Lite uses XML::Parser to deal with the message.  Since we are
passing XML in the payload, the entire payload gets parsed for
validity.  The payload might be enormous if we are passing multiple
query/responses.  Rather than parse the payload twice (once by
SOAP::Lite, and once by your service), it speeds things up considerably
(based on trial and error by Paul Kulchenko) if you base64 encode the
payload when passing XML around.
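
The idea can be sketched in Python (a stand-in only; this is not SOAP::Lite
or XML::Parser, and it measures nothing about speed).  The element names and
the envelope layout are placeholders rather than the MOBY or SOAP schema.
What it shows is that once the payload is base64 text, the outer parser sees
a single text node instead of a second XML tree.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload; element names are placeholders, not the MOBY schema.
payload = "<queryInput><Object namespace='' id=''/></queryInput>"

# Sender: encode the XML so the envelope carries it as opaque text.
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
envelope = "<Envelope><Body>" + encoded + "</Body></Envelope>"

# Transport layer: parses the envelope only; the payload is one text node.
body_text = ET.fromstring(envelope).find("Body").text

# Service: decodes, then parses the payload exactly once.
decoded = base64.b64decode(body_text).decode("utf-8")
root = ET.fromstring(decoded)
print(root.tag)
```

The base64 alphabet contains no "<" or "&", so the encoded payload needs no
further escaping inside the envelope.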

> "Note that, even if present, the CRIB and PIB blocks of input MOBY Objects
> should be ignored..."
> What justifies this? What if you had a service that wants to interpret the
> graph connectivity of objects?

I don't know if these elements are reliable enough to ever base a
service on them... as they are 100% optional.

> General comment on representation of input- have you given any thought
> to representing multiple occurrences of the same input data item in different
> queryInputs? For example, if the same sequence was passed to BLAST, with
> multiple invocations using different parameters (secondary) it would seem
> quite nice not to have to duplicate the sequence text in each queryInput;
> of course, this is _the_ extreme example, but there may be other cases in which
> a "graphier" approach might be beneficial. Another instance that springs to
> mind is a situation in which a service did pairwise operations, and one wanted
> to pass a set of objects, and have each distinct pair analysed, without needing
> to represent the data n(n-1)/2 times?

beyond the scope of this course ;-)  Let's get simple examples working
reliably (if possible) before we deal with such complex problems.

> OUTPUT:
> "There are as many queryResponse elements..." : are responses required
> to be in the same order as the inputs? According to
> "Additional Requirements of Service Responses", no, but I'm not sure these
> mechanisms are sufficient for mapping query and response; for example,
> supposing the same item is passed twice, but with different secondary parameters,
> how would one distinguish which response belonged to which request? It might
> be sensible to add queryRequest/queryResponse identifiers to handle the
> mapping explicitly?? If the rules for associating input with output are
> kept, then will "empty" queryResponses need to at least have an Invocation
> object to properly allow them to be matched up with the queryInput
> that generated them?

Ack!  

I have to stop now, my wife needs a ride to work.

...that last one has thrown me. This has always been an awkward part of
MOBY, and became apparent during the prototype phase, but I thought I
finally had all my bases covered }-P

You're right, it needs a re-think, especially since I also say that
namespace and id can be empty, so these are also not reliable xrefs back
to a particular query... I guess the entire thing can be solved by
returning responses in the same order as they were received, and putting
the burden on the client...??
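
A minimal Python sketch of that order-preserving option, under assumptions:
run_service(), pair_by_position() and the toy count_words() handler are
hypothetical names invented here, and the dictionaries merely stand in for
queryInput contents.  It uses Andrew's case of the same item sent twice with
different secondary parameters.

```python
def run_service(query_inputs, handler):
    """Service side: one response per input, in the order inputs arrived."""
    return [handler(q) for q in query_inputs]


def pair_by_position(query_inputs, responses):
    """Client side: match each input with the response at the same index."""
    if len(query_inputs) != len(responses):
        raise ValueError("expected exactly one response per query input")
    return list(zip(query_inputs, responses))


def count_words(query):
    """Toy handler: number of windows of the requested size."""
    return len(query["sequence"]) - query["word_size"] + 1


# Same sequence twice, differing only in a (hypothetical) secondary parameter.
queries = [
    {"sequence": "ACGT", "word_size": 3},
    {"sequence": "ACGT", "word_size": 2},
]

responses = run_service(queries, count_words)
for query, response in pair_by_position(queries, responses):
    print(query["word_size"], response)
```

Note that this only works if a response, even an empty one, is returned for
every input; otherwise the positions drift and the length check fails.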

M