[MOBY-l] MOBY at NCGR/CSHL- intro to ISYS and its conceptual relationship to MOBY
Mark Wilkinson
markw at illuminae.com
Sat Sep 28 13:34:33 UTC 2002
Hi Andrew,
Thanks for the overview of ISYS - as you point out, ISYS provided many
of the "founding principles" of MOBY, so it was great to see these
spelled out clearly on the list!
It's just after 3:00 AM, so I won't risk saying anything intelligent
right now :-) I'm mainly aiming to make some observations about what
you have written and ask questions as best I can in my insomnia... I'm
trying to get a clear picture of the similarities and differences
between ISYS and the (current instantiation of) MOBY, and since I'm not
sure I understand everything, please read all of my statements with an
inflection at the end ;-)
Andrew D. Farmer wrote:
>When ISYS starts, it "discovers" ServiceProviders through
>a simple plug-in strategy; it simply scans through a Components directory in
>which each component provides a basic structure for providing its resources,
>
This is the ISYS analogue of MOBY Central, ja? As I understand it, the
key differences are that there is no registry "thingy" per se, just a
directory. Service Providers are transiently "registered" as they are
started/stopped through their individual GUIs (or whatever), and
"registration" involves putting interface descriptions into that
directory. Is that right?
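
A minimal sketch of the directory-scan style of discovery described above, in Python rather than the Java that ISYS itself uses. The descriptor file name (component.json), its "service_providers" key, and the component names are all hypothetical; the thread does not say how an ISYS component actually declares its ServiceProviders.

```python
import json
import tempfile
from pathlib import Path


def discover_providers(components_dir):
    """Scan each sub-directory for a descriptor and collect provider class names."""
    providers = {}
    for component in sorted(Path(components_dir).iterdir()):
        descriptor = component / "component.json"  # hypothetical descriptor name
        if component.is_dir() and descriptor.is_file():
            info = json.loads(descriptor.read_text())
            providers[component.name] = info.get("service_providers", [])
    return providers


def make_demo_components(root):
    """Create two fake components so the scan has something to find."""
    demo = [("SeqViewer", ["SeqViewerProvider"]), ("BlastTool", ["BlastProvider"])]
    for name, classes in demo:
        comp = Path(root) / name
        comp.mkdir()
        (comp / "component.json").write_text(
            json.dumps({"service_providers": classes})
        )


# Small demonstration using a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    make_demo_components(demo_root)
    print(discover_providers(demo_root))
```

There is no registry process here at all: "registration" is nothing more than the presence of a descriptor in the scanned directory, which is the contrast with MOBY Central being drawn above.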
> gets the information about which classes represent ServiceProviders,
>
as opposed to...?? What else is in that directory? Which "components"
represent ServiceProviders, or which "classes"? (I'm just trying to
clarify the terminology) As I understand it, Components implement
Classes, and one of the Classes that can be implemented is
ServiceProvider... or am I completely up the creek?
>Services in ISYS:
>
>Services and Service brokering in ISYS come in two distinct flavors:
>"static" and "dynamic".
>
>The former is the more traditional approach by which specific well-structured
>interfaces are defined (e.g. a "RetrieveSequence" service with specified
>input types (Identifier) and output types (SequenceText));
>
so currently you would define MOBY services as being "static"... and
yet... not quite. This scenario seems to differ from MOBY in one
critical way - that our interfaces are "modularly" defined, rather than
being well-structured: A "RetrieveSequence" service type would not
exist in MOBY. Rather we would have a "Retrieve" service that had an
output type of "Sequence", and an ~arbitrary input type, depending on
what the service provider needed. When a service is required, its
interface is dynamically discovered (WSDL), but we avoid the wild and
woolly world of data types by defining the data formats and requiring
that one or more of these be passed... so a Client (or component)
doesn't have to be specifically designed for that interface.
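
A rough sketch of the "modular" registration described above, where a service is identified by a generic service type plus its input and output object types rather than by a purpose-built interface such as "RetrieveSequence". The class names, provider names, and object type names (GenbankID, PubMedID) are invented for illustration and are not taken from the MOBY Central API; WSDL retrieval is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    provider: str
    service_type: str   # generic type, e.g. "Retrieve"
    input_type: str     # object type the service accepts
    output_type: str    # object type the service returns


class Registry:
    """Toy stand-in for a central registry (not the real MOBY Central)."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def find(self, service_type=None, input_type=None, output_type=None):
        """Return registrations matching every criterion that was given."""
        def matches(e):
            return ((service_type is None or e.service_type == service_type)
                    and (input_type is None or e.input_type == input_type)
                    and (output_type is None or e.output_type == output_type))
        return [e for e in self._entries if matches(e)]


registry = Registry()
registry.register(Registration("provider-a", "Retrieve", "GenbankID", "Sequence"))
registry.register(Registration("provider-b", "Retrieve", "PubMedID", "Sequence"))
registry.register(Registration("provider-c", "Blast", "Sequence", "BlastHit"))

# A client asks for anything that retrieves Sequence objects,
# without caring which input type each provider happens to need.
for entry in registry.find(service_type="Retrieve", output_type="Sequence"):
    print(entry.provider, entry.input_type)
```

The client only needs to understand the agreed object types; it never has to be compiled against a particular service interface.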
>in this case, a component that wants to use that specific functionality
>will have been designed with knowledge of the service interface, and will
>simply ask ISYS to provide it with an implementation
>
This is also not strictly true in MOBY. For us, the "component" (which
I think is a "Client" in MOBY speak) does not need to be designed with
knowledge of the service interface per se. The interface -
input/output/URL - is dynamically discovered using WSDL, and in fact the
client can be completely agnostic about what type of service it is dealing
with... i.e. a client doesn't have to know what to *do* with a Sequence
in order to happily retrieve it and pass it on to the next service... so
service implementation is not a problem. *Representation* of data, on
the other hand, will have to be coded for each data type (or at least,
each of the basic parent data types)
> (either a user
>specifiable default, or a list of all known implementations) of that
>service. These service types are specified as interfaces (along with
>specifications of datatype interfaces for their input and output types)
>in a special package that acts as a kind of catalog of static
>services; in this way, it is somewhat analogous to a certain way of looking
>at the notion of a service "ontology" at a central MOBY registry
>
this is actually much closer to the myGrid service ontology definition,
than it is to the current MOBY view of a service ontology, as I said
above... I see the power of both approaches, and I still haven't come
to a conclusion about which is better. I think, that our approach is
more flexible w.r.t. client/service design, but we suffer badly from
having little or no machine-readable logic. We rely on a human to look
at the input and output types, the name/description of the service
type, and decide for themselves what exact transformation is happening in
between. e.g. if you give me a PubMed ID, and I offer to give you back
Sequence objects, and the service type is "Retrieve"... what does that
mean?? It could mean "Give me the sequences that were published in this
manuscript", or it could mean "give me all of the sequences that were
published by the author of this manuscript". In this sense, the current
MOBY situation is a total nightmare and needs some serious tightening
up!!!! We can't make the semantic bioweb without some additional
ontology(ies) sitting around somewhere to more clearly define our
services, rather than the current human-readable descriptions. At the
same time, access to these orthogonal relationships between data
(PubMed/ID --> Sequence) is, in my mind, one of the most wonderful
aspects of the MOBY system and I am loath to lose them altogether...
>We haven't actually used this mechanism of service specification
>that heavily in ISYS, having found the "dynamic service" paradigm rather
>more powerful in the context of the ISYS client-orientation.
>
okay, here we go :-) Let's figure out if this additional power can
work in the MOBY System - it sounds like we are already sitting
somewhere between your "static" and "dynamic" models in any case...
>components may provide implementations. There is no conceptual reason
>why it couldn't be a deeper hierarchy, we just never found it to be useful
>to abstract things out to higher levels.
>
To be honest, we haven't really thought clearly about a service
"hierarchy". We started to define basic service types ("Retrieve",
"Blast", "Alignment"), and they form a loose hierarchy ("Blast" might be
a child of "Alignment"), but there is no way to sensibly name a service
like the PubMed->Sequence that I suggest above except to call it a
"Retrieve"... which isn't very useful. So, I think we have found the
same problem as you have... as soon as you try to go beyond basic
service types the numbers become overwhelming, in particular when we
think of all of the orthogonal slices we could make through the data...
>Dynamic services differ from static services in several important ways.
>First, the interface for dynamic services is very generic, and totally
>encapsulates the "semantics" of the service. There is a distinction between
>"DynamicDataService" and "DynamicViewerService"- the former returns data,
>the latter provides a visualization (i.e. a "Client")
>
A MOBY client would, presumably, fit as one of your DynamicDataServices?
>, but other than that,
>dynamic services are almost totally opaque to the system in terms of
>what exactly they are doing (although they are required to provide a
>descriptive String so that the user gets a sense for what it is he/she is
>invoking).
>
so we have the same problem :-)
>Second, the inputs and outputs of a dynamic service are similarly
>opaque to the system;
>
like MOBY.
>When this occurs, ISYS simply passes around
>references to the data set to each of the registered ServiceProviders, and
>asks them to inspect the data and return the set of dynamic services that they
>provide that could be used on the dataset. This inspection can be as simple
>as looking for data of a certain type (e.g. identifiers in a certain namespace)
>or more complicated (e.g. looking at the lengths of the sequences provided, or
>the value of a species attribute). The main point is that the "service matching"
>is totally encapsulated in the ServiceProvider, and does not depend on some
>third party "matchmaker" like UDDI.
>
This is interesting, and very different from MOBY. It's much more "P2P"
than we are, and it does give you some abilities that we don't have.
e.g. we pass around only the name of the object when looking for
services, so service providers can't "inspect" the object until they
have already been selected. This can lead to hiccups such as the one
that Lincoln raised at my BOSC presentation where a service may say it
can use an object, receive the object as input to a service transaction,
and then discover that it can't really use it at all... This isn't a
*critical* problem, but it is, as I say, a hiccup that a Client needs to
be aware of. In your system, presumably, this cannot happen.
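
The difference between the two matching styles can be sketched as follows. This is only an illustration under stated assumptions: the provider class, the "FindORFs" service name, and the 20-residue threshold are all made up, and neither function reflects actual ISYS or MOBY code.

```python
class LongSequenceProvider:
    """Hypothetical provider whose service only works on sequences of 20+ residues."""

    accepted_type = "Sequence"

    def services_for(self, dataset):
        """Inspect the data and return the services that could act on it."""
        usable = [d for d in dataset
                  if d["type"] == self.accepted_type and len(d["value"]) >= 20]
        return ["FindORFs"] if usable else []


def match_by_type_name(provider, dataset):
    """Registry-style matching: only the object type name is compared."""
    return any(d["type"] == provider.accepted_type for d in dataset)


def match_by_inspection(provider, dataset):
    """ISYS-style matching: the provider looks at the data itself."""
    return bool(provider.services_for(dataset))


provider = LongSequenceProvider()
short = [{"type": "Sequence", "value": "ATGC"}]
long_ = [{"type": "Sequence", "value": "ATGC" * 10}]

# The short sequence matches on type name alone, but the provider would
# reject it after the fact; inspection catches that before invocation.
print(match_by_type_name(provider, short), match_by_inspection(provider, short))
print(match_by_type_name(provider, long_), match_by_inspection(provider, long_))
```

The first case is the "hiccup": a name-only match succeeds even though the provider cannot really use the object.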
> Of course, the fact that we're only passing
>around references to objects in memory makes this much easier than doing a
>similar trick on the network, but one can imagine an analogous mechanism for
>a MOBY-like system; for example, if a MOBY client simply sent out some simple
>representation of what data types were present in its input set, that would
>probably be sufficient for most providers to do a reasonable job of presenting
>their relevant services.
>
indeed... we'd have to pass around the base MOBY-Triple at least
(instance/namespace/id). The service provider would then know what type
of data would be contained in the object (from the instance), what
namespace it falls into, and moreover, what ID it has. This latter
point is the one that we are currently missing - the fact that it would
be ridiculous for a service provider to register each ID number that it
knows about in the Registry to ensure that it is never passed something
it can't deal with. Currently, service providers register only object
type (instance) and namespace (optional), and if they get something sent
to them that matches those criteria, then buyer beware!
I wonder, though, if the overhead of passing larger objects, multiple
times from P2P, and having the service inspect these each time, is worth
the pain for the gain? I guess it isn't so much larger a message than
other P2P broadcasts (so long as we broadcast only the triple, and not
the payload), but it's still more network traffic than we currently have.
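
A small sketch of broadcasting only the triple (instance/namespace/id) and letting each provider decide whether it can handle that specific ID. The Provider class, its fields, and the example identifiers are assumptions made for illustration; no network transport is modelled.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    instance: str    # object type, e.g. "Sequence"
    namespace: str   # e.g. "GenBank"
    id: str          # identifier within that namespace


class Provider:
    """Hypothetical provider that knows which IDs it actually holds."""

    def __init__(self, name, instance, namespace, known_ids):
        self.name = name
        self.instance = instance
        self.namespace = namespace
        self.known_ids = set(known_ids)

    def registered_for(self, triple):
        """What a registry can check: object type and namespace only."""
        return (triple.instance, triple.namespace) == (self.instance, self.namespace)

    def can_handle(self, triple):
        """What the provider can check once it is shown the ID as well."""
        return self.registered_for(triple) and triple.id in self.known_ids


def broadcast(triple, providers):
    """Send the triple (not the payload) to every provider; collect the takers."""
    return [p.name for p in providers if p.can_handle(triple)]


providers = [
    Provider("p1", "Sequence", "GenBank", ["A1", "A2"]),
    Provider("p2", "Sequence", "GenBank", ["B7"]),
]
query = Triple("Sequence", "GenBank", "B7")
print(broadcast(query, providers))
```

Both providers look identical at the type/namespace level; only the ID check separates them, and that check never requires the IDs to be listed in a registry.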
>(Note that there may be some fuzzy ground here between
>the notions of a "type" and a "value"; for example, if one uses the LSID
>structure (as I understand it), the "namespace" is a property of the "value"
>instead of the "type" of that data, but would probably be critical in matching
>retrieval services to the data;
>
in MOBY we consider namespace a data type, rather than a value.
>another example would be a "sequence", for
>which the "alphabet" used by the sequence could be encoded into a subtype or
>simply viewed as a property of the sequence "value".)
>
this is exactly the level of complexity that I was hoping to avoid at
the registry (discovery) level.
>Though I certainly agree that
>having services that are more self-descriptive will be valuable, I don't think
>we should rule out the possibility of exploring alternative approaches to
>the service brokering. For example, one could imagine "MOBY Central" as
>being nothing more than a registry of distributed "ServiceProviders"
>(and probably the registry of the "ontologies/vocabularies" of data types
>and service types or service descriptors).
>
So in this way it breaks the P2P paradigm in that you *must* connect to
MOBY Central first in order to discover service providers, rather than
discovering service providers through broadcasting over the P2P network?
... if so... what have we gained (other than the ability of the service
provider to inspect the data... which is in itself significant!)
>(Note that I'm deliberately trying to paint this picture without
>using SOAP/WSDL for the time being, although one could imagine using those
>as well....)
>
sure
>We can explore the pros and cons of the various approaches in subsequent
>discussions, but I hope this helps get people thinking in different
>ways about the problem.
>
let's start exploring them now :-)
Although I see the power gained by a more P2P architecture, I think a
couple of things (important things) are lost by going this route:
1) Simplicity of service provision
2) Semi/fully automated workflow discovery (finding a path from an
input to an output data type through multiple services)
The latter can probably be accomplished using the brokering approach you
describe, but it isn't as straightforward and (as best I can imagine in
my current state) would have to be accomplished by possibly endless
trial-and-error traversals of many dynamically discovered service paths.
The former point however, the one of simplicity, might be more important
at the end of the day as this will affect the acceptance/adoption of the
system by the people whom we need to make the whole thing work...
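
When every service's input and output types are known centrally, workflow discovery reduces to a path search over data types. The following is a minimal sketch using breadth-first search; the service names and type names are hypothetical, and real registrations would carry more than a single input and output type.

```python
from collections import deque


def find_workflow(services, start_type, goal_type):
    """Breadth-first search for the shortest chain of services.

    services: iterable of (name, input_type, output_type) tuples.
    Returns a list of service names, or None if no chain exists.
    """
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for name, in_type, out_type in services:
            if in_type == current and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None


# Hypothetical registrations.
SERVICES = [
    ("pubmed_to_sequence", "PubMedID", "Sequence"),
    ("blast", "Sequence", "BlastHit"),
    ("hit_to_alignment", "BlastHit", "Alignment"),
]

print(find_workflow(SERVICES, "PubMedID", "Alignment"))
```

With provider-side matching, the list of (input, output) edges is not available up front, so each step of such a search would need a fresh round of queries to the providers, which is the trial-and-error traversal worried about above.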
> One thing I would like to point out, however, is
>that the different approaches to service representation/service discovery
>are not necessarily mutually exclusive. For example, I have often found it
>useful in ISYS to define a "static service", but to allow that same service
>to be provided dynamically, by simply writing a little code that does a
>reasonable translation from the "self-descriptive" representation of the
>data to the representation prescribed by my own implementation. I'm beginning
>to wonder if a more "self-descriptive" and finer-grained approach to
>service-typing than the "fat interfaces with signatures" model might be useful
>to bridge these alternative approaches (possibly similar to WSDL, but I need
>to look at that more);
>
I'd like to pursue this idea further, as I'm not sure I am understanding
what you suggest as the "middle ground" here. Please expand on this...
>I
>think it is worth thinking about different models of self-documentation that
>might be more flexible and useful in a decentralized environment than those
>provided by interface signature definition.
>
Okay myGrid folks!! Jump in here!!! :-)
> They may be related to one another via inheritance (e.g.
>SequenceText extends "LinearObject", which is merely an abstraction of a
>thing that has "length");
>
I have been thinking about this type of relationship as well... I think
we could do with having a third ontology looking after "representable
as" definitions (I wouldn't be thrilled to put them in the same ontology
as the data types themselves). In this way, client programs could be
designed in a more generic way.
e.g. there could be a viewer for a "LinearObject" datatype, where
"Sequence" data, and "BlastHit" data are both viewable as LinearObjects.
...this is something that has been kicking around in the back of my head
and isn't well thought out yet since it is far from being the most
critical issue at the moment... so if that makes no sense please ignore
it :-)
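
One way to picture a separate "representable as" mapping, kept apart from the data type definitions themselves. This is a sketch only; the dictionary layout and the viewer name are invented, and nothing like this is stated to exist in MOBY.

```python
# Hypothetical third ontology: which data types can be shown as which views.
REPRESENTABLE_AS = {
    "Sequence": {"LinearObject"},
    "BlastHit": {"LinearObject"},
}

# Viewers are written once against the generic representation.
VIEWERS = {
    "LinearObject": "linear_viewer",
}


def viewers_for(data_type):
    """Return viewers for the type itself plus anything it is representable as."""
    candidates = {data_type} | REPRESENTABLE_AS.get(data_type, set())
    return sorted(VIEWERS[t] for t in candidates if t in VIEWERS)


print(viewers_for("Sequence"))
print(viewers_for("BlastHit"))
```

A client written against "LinearObject" then picks up any new data type as soon as someone adds an entry to the mapping, without the data type ontology changing.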
>At the next level up from this is the somewhat infamous (around here, anyway)
>IsysObject, which simply wraps an arbitrary set of IsysAttributes to indicate
>a level of "objective coherency", and provides querying mechanisms for
>getting at attribute content of interest. I should point out that what we
>were basically trying to achieve here was a somewhat more flexible way of
>representing multiple inheritance of IsysAttribute interfaces than requiring
>developers to statically define a class that would implement a specific set
>of these IsysAttribute types, so in some sense, it's not really all that
>much weirder than the notion of multiple interface inheritance.
>
right... we allow for (but have never yet used) an object that is
constructed in this way... although we still require that the object's
schema be registered as a new datatype, even if it is a composite
object! So we are a bit tighter here...
>("Premature optimization is the root of all evil"- Knuth)
>
love it!
>The main questions I think we need to explore
>have to do with the nature of the "data type ontology" that will be serving
>as a common language. My own feeling is that it is much more
>important (and more feasible) to develop a fine-grained vocabulary along the
>lines of the IsysAttribute catalog than to try to get agreement on the correct
>structure of higher-level objects.
>
You haven't yet convinced me of this, but I need to spend more time
reading about your IsysAttributes :-) Arbitrary collections of
"thingys", even if we have a fine-grained list of valid "thingys", seem
to be much less robust and more prone to misinterpretation than a properly
defined object... and since our objects are minimalist as it is, this
does not seem less feasible than the approach you are suggesting... In
addition, we gain some assurance about the composition of an object in a
backwards-compatible way by examining its parentage.
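
"Examining its parentage" can be sketched as a walk up a registered type hierarchy. The hierarchy below (Object, Sequence, DNASequence) is made up for illustration and is not the registered MOBY object ontology.

```python
# Hypothetical registered hierarchy: child type -> parent type.
PARENT = {
    "Object": None,
    "Sequence": "Object",
    "DNASequence": "Sequence",
}


def ancestry(data_type):
    """Return the type followed by each of its ancestors, nearest first."""
    chain = []
    current = data_type
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_a(data_type, ancestor):
    """True if data_type is the ancestor itself or descends from it."""
    return ancestor in ancestry(data_type)


# An older client that only understands "Sequence" can still accept a
# newer "DNASequence", because its parentage guarantees the base fields.
print(is_a("DNASequence", "Sequence"))
```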
>This seems to be somewhat at odds with the
>picture that is presented in the moby_classes.txt file (available from the
>biomoby cvs tree);
>
I should have removed that from the CVS - please ignore it as it does
not contain any "valid" data.
> I should do something similar
>for the IsysAttribute hierarchy as it currently stands, but I want to clean it
>up a bit, as I think it contains some needless complexity and certainly many
>artifacts). I also think we should give consideration to the
>"dynamic/compositional" style of growing more complex data types
>(which is more natural to XML than it is to Class definitions).
>
errrm.... I understood our object models to be exactly that - dynamic
in their composition... so long as new complex data types are
registered... Am I misunderstanding you?
>At any rate, I've already violated my intention not to overwhelm
>your attention span. Please let me know if you have any questions, comments,
>etc. Hope it helps...
>
It is now 8:15!! I managed to spend 5 1/4 hours writing this response
(interrupted by cat and wife, both wondering why I was still on the
computer at this time of night), and I am now well and truly knackered!
But it was great that you went into so much detail as it forced me to
go out and do some additional reading in order to understand why you
designed ISYS the way you did, and why you espouse (or at least, keep
an open mind about) this alternate approach to service description and
discovery. I'm certainly open minded about many of the ideas you bring
up, though not entirely sold on them yet ;-)
It's time we had another MOBY DIC meeting! Emma Lake will be frozen
over soon, so we should probably think about meeting elsewhere. I
should soon have a limited travel budget as our MOBY funding will start
to flow in a few weeks. Lukas, are you still interested in hosting the
meeting at Carnegie? (no pressure! I'm just asking because you brought
it up as a possibility last time we met...)
In any case, we can continue to discuss these things online for the
moment. I'd particularly like to get the input of the myGrid people, as
some of the issues raised here seem to fall right into their lap w.r.t.
plans and architecture.
It's nice to be discussing MOBY in the absence of underlying
implementation issues such as UDDI/SOAP - I think a discussion of the
behaviour of the system is badly needed right now before we get
ourselves locked into something we might regret later...
I'm going to bed!
good night all,
M