[MOBY-l] MOBY at NCGR/CSHL- intro to ISYS and its conceptual relationship to MOBY

Sat Sep 28 13:34:33 UTC 2002

Hi Andrew,

Thanks for the overview of ISYS - as you point out, ISYS provided many 
of the "founding principles" of MOBY, so it was great to see these 
spelled out clearly on the list!

It's just after 3:00 AM, so I won't risk saying anything intelligent 
right now :-)  I'm mainly aiming to make some observations about what 
you have written and ask questions as best I can in my insomnia... I'm 
trying to get a clear picture of the similarities and differences 
between ISYS and the (current instantiation of) MOBY, and since I'm not 
sure I understand everything, please read all of my statements with an 
inflection at the end ;-)

Andrew D. Farmer wrote:

>When ISYS starts, it "discovers" ServiceProviders through
>a simple plug-in strategy; it simply scans through a Components directory in
>which each component provides a basic structure for providing its resources,
>
This is the ISYS analogue of MOBY Central, ja?  As I understand it, the 
key differences are that there is no registry "thingy" per se, just a 
directory.  Service Providers are transiently "registered" as they are 
started/stopped through their individual GUIs (or whatever), and 
"registration" involves putting interface descriptions into that 
directory.  Is that right?

> gets the information about which classes represent ServiceProviders,
>
as opposed to...??  What else is in that directory?  Which "components" 
represent ServiceProviders,  or which "classes"?  (I'm just trying to 
clarify the terminology)  As I understand it, Components implement 
Classes, and one of the Classes that can be implemented is 
ServiceProvider... or am I completely up the creek?

>Services in ISYS:
>
>Services and Service brokering in ISYS come in two distinct flavors:
>"static" and "dynamic".
>
>The former is the more traditional approach by which specific well-structured
>interfaces are defined (e.g.  a "RetrieveSequence" service with specified
>input types (Identifier) and output types (SequenceText));
>
so currently you would define MOBY services as being "static"... and 
yet... not quite.  This scenario seems to differ from MOBY in one 
critical way - that our interfaces are "modularly" defined, rather than 
being well-structured:  A "RetrieveSequence" service type would not 
exist in MOBY.  Rather we would have a "Retrieve" service that had an 
output type of "Sequence", and an ~arbitrary input type, depending on 
what the service provider needed.  When a service is required, its 
interface is dynamically discovered (WSDL), but we avoid the wild and 
woolly world of data types by defining the data formats and requiring 
that one or more of these be passed... so a Client (or component) 
doesn't have to be specifically designed for that interface.

>in this case, a component that wants to use that specific functionality
>will have been designed with knowledge of the service interface, and will
>simply ask ISYS to provide it with an implementation
>
This is also not strictly true in MOBY.  For us, the "component" (which 
I think is a "Client" in MOBY speak) does not need to be designed with 
knowledge of the service interface per se.  The interface - 
input/output/URL - is dynamically discovered using WSDL, and in fact the 
client can be completely agnostic about what type of service it is 
dealing with... i.e. a client doesn't have to know what to *do* with a Sequence 
in order to happily retrieve it and pass it on to the next service... so 
service implementation is not a problem.  *Representation* of data, on 
the other hand, will have to be coded for each data type (or at least, 
each of the basic parent data types)

> (either a user
>specifiable default, or a list of all known implementations) of that
>service. These service types are specified as interfaces (along with
>specifications of datatype interfaces for their input and output types)
>in a special package that acts as a kind of catalog of static
>services; in this way, it is somewhat analogous to a certain way of looking
>at the notion of a service "ontology" at a central MOBY registry 
>
this is actually much closer to the myGrid service ontology definition, 
than it is to the current MOBY view of a service ontology, as I said 
above...  I see the power of both approaches, and I still haven't come 
to a conclusion about which is better.  I think that our approach is 
more flexible w.r.t. client/service design, but we suffer badly from 
having little or no machine-readable logic.  We rely on a human to look 
at the input and output types, the name/description of the service 
type, and decide for themselves what exact transformation is happening in 
between.  e.g.  if you give me a PubMed ID, and I offer to give you back 
Sequence objects, and the service type is "Retrieve"... what does that 
mean??  It could mean "Give me the sequences that were published in this 
manuscript", or it could mean "give me all of the sequences that were 
published by the author of this manuscript".  In this sense, the current 
MOBY situation is a total nightmare and needs some serious tightening 
up!!!!  We can't make the semantic bioweb without some additional 
ontology(ies) sitting around somewhere to more clearly define our 
services, rather than the current human-readable descriptions.  At the 
same time, access to these orthogonal relationships between data 
(PubMed/ID --> Sequence) is, in my mind, one of the most wonderful 
aspects of the MOBY system and I am loath to lose them altogether...
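
A rough sketch of the ambiguity described above, in Python.  The entry 
fields, the service names and the registry list are all hypothetical 
and are not the MOBY Central API; the point is only that two services 
with different meanings look identical to any software that matches on 
service type, input type and output type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str          # hypothetical service name
    service_type: str  # e.g. "Retrieve"
    input_type: str
    output_type: str
    description: str   # human-readable only; software cannot reason over it


REGISTRY = [
    ServiceEntry("seqs_in_paper", "Retrieve", "PubMedID", "Sequence",
                 "Sequences published in this manuscript"),
    ServiceEntry("seqs_by_author", "Retrieve", "PubMedID", "Sequence",
                 "All sequences published by the author of this manuscript"),
]


def find(registry, service_type, input_type, output_type):
    """Match on the only machine-readable fields that are registered."""
    wanted = (service_type, input_type, output_type)
    return [e for e in registry
            if (e.service_type, e.input_type, e.output_type) == wanted]


matches = find(REGISTRY, "Retrieve", "PubMedID", "Sequence")
# Both entries come back; only the free-text description separates them.
print([m.name for m in matches])
```

Telling the two apart automatically would need some extra, 
machine-readable field (for example a term from an additional 
ontology), which is the tightening-up argued for above.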

>We haven't actually used this mechanism of service specification
>that heavily in ISYS, having found the "dynamic service" paradigm rather
>more powerful in the context of the ISYS client-orientation. 
>
okay, here we go :-)   Let's figure out if this additional power can 
work in the MOBY System - it sounds like we are already sitting 
somewhere between your "static" and "dynamic" models in any case...

>components may provide implementations. There is no conceptual reason
>why it couldn't be a deeper hierarchy, we just never found it to be useful
>to abstract things out to higher levels.
>
To be honest, we haven't really thought clearly about a service 
"hierarchy".  We started to define basic service types ("Retrieve", 
"Blast", "Alignment"), and they form a loose hierarchy ("Blast" might be 
a child of "Alignment"), but there is no way to sensibly name a service 
like the PubMed->Sequence that I suggest above except to call it a 
"Retrieve"... which isn't very useful.  So, I think we have found the 
same problem as you have...  as soon as you try to go beyond basic 
service types the numbers become overwhelming, in particular when we 
think of all of the orthogonal slices we could make through the data...

>Dynamic services differ from static services in several important ways.
>First, the interface for dynamic services is very generic, and totally
>encapsulates the "semantics" of the service. There is a distinction between
>"DynamicDataService" and "DynamicViewerService"- the former returns data,
>the latter provides a visualization (i.e. a "Client")
>
A MOBY client would, presumably, fit as one of your DynamicDataServices?

>, but other than that,
>dynamic services are almost totally opaque to the system in terms of
>what exactly they are doing (although they are required to provide a
>descriptive String so that the user gets a sense for what it is he/she is
>invoking).
>
so we have the same problem :-)

>Second, the inputs and outputs of a dynamic service are similarly
>opaque to the system; 
>
like MOBY.

>When this occurs, ISYS simply passes around
>references to the data set to each of the registered ServiceProviders, and
>asks them to inspect the data and return the set of dynamic services that they
>provide that could be used on the dataset. This inspection can be as simple
>as looking for data of a certain type (e.g. identifiers in a certain namespace)
>or more complicated (e.g. looking at the lengths of the sequences provided, or
>the value of a species attribute). The main point is that the "service matching"
>is totally encapsulated in the ServiceProvider, and does not depend on some
>third party "matchmaker" like UDDI.
>
This is interesting, and very different from MOBY.  It's much more "P2P" 
than we are, and it does give you some abilities that we don't have. 
 e.g. we pass around only the name of the object when looking for 
services, so service providers can't "inspect" the object until they 
have already been selected.  This can lead to hiccups such as the one 
that Lincoln raised at my BOSC presentation where a service may say it 
can use an object, receive the object as input to a service transaction, 
and then discover that it can't really use it at all...  This isn't a 
*critical* problem, but it is, as I say, a hiccup that a Client needs to 
be aware of.  In your system, presumably, this cannot happen.  
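
To make the contrast concrete, here is a minimal Python sketch of 
provider-side matching as Andrew describes it.  It is not ISYS code 
(ISYS has its own interfaces); the class, the broker function, the 
attribute names and the 1000-residue cutoff are all assumptions made 
up for illustration.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]                      # hypothetical: attribute -> value
Predicate = Callable[[List[Item]], bool]   # "can I use this data set?"


class ServiceProvider:
    """A provider that decides for itself which of its services fit the data."""

    def __init__(self, name: str, offers: Dict[str, Predicate]):
        self.name = name
        self._offers = offers  # service description -> inspection predicate

    def applicable_services(self, data: List[Item]) -> List[str]:
        return [desc for desc, accepts in self._offers.items() if accepts(data)]


def broker(providers: List[ServiceProvider],
           data: List[Item]) -> Dict[str, List[str]]:
    """Hand the data to every provider and keep the non-empty answers."""
    answers = {p.name: p.applicable_services(data) for p in providers}
    return {name: services for name, services in answers.items() if services}


blaster = ServiceProvider("blaster", {
    # Inspects values (sequence length), not just the declared type.
    "BLAST short sequences": lambda data: bool(data) and all(
        len(item.get("sequence", "")) in range(1, 1001) for item in data),
})
taxonomy = ServiceProvider("taxonomy", {
    "Show species info": lambda data: any("species" in item for item in data),
})

selection = [{"namespace": "GenBank", "sequence": "ACGTACGT"}]
offers = broker([blaster, taxonomy], selection)
print(offers)
```

Because each provider sees the data before it offers anything, a 
provider that cannot use the data simply stays silent, which is why the 
hiccup described above should not arise in this model.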

> Of course, the fact that we're only passing
>around references to objects in memory makes this much easier than doing a
>similar trick on the network, but one can imagine an analogous mechanism for
>a MOBY-like system; for example, if a MOBY client simply sent out some simple
>representation of what data types were present in its input set, that would
>probably be sufficient for most providers to do a reasonable job of presenting
>their relevant services. 
>
indeed...  we'd have to pass around the base MOBY-Triple at least 
(instance/namespace/id).  The service provider would then know what type 
of data would be contained in the object (from the instance), what 
namespace it falls into, and moreover, what ID it has.  This latter 
point is the one that we are currently missing - the fact that it would 
be ridiculous for a service provider to register each ID number that it 
knows about in the Registry to ensure that it is never passed something 
it can't deal with.  Currently, service providers register only object 
type (instance) and namespace (optional), and if they get something sent 
to them that matches those criteria, then buyer beware!
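
A small Python sketch of that gap, under stated assumptions: the 
Triple and Provider classes, the field names and the IDs are invented 
for illustration and are not the MOBY object model or registry API. 
The registry check uses only the object type and the optional 
namespace, so it can say "yes" for an ID the provider has never seen.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Triple:
    instance: str   # object type, e.g. "Sequence"
    namespace: str
    id: str


@dataclass(frozen=True)
class Provider:
    name: str
    instance: str                # registered object type
    namespace: Optional[str]     # registered namespace (optional)
    known_ids: FrozenSet[str]    # private to the provider, never registered

    def registry_match(self, t: Triple) -> bool:
        """What a registry can check: type, plus namespace if one was given."""
        return (self.instance == t.instance
                and (self.namespace is None or self.namespace == t.namespace))

    def can_serve(self, t: Triple) -> bool:
        """What only the provider can check, once it sees the ID."""
        return self.registry_match(t) and t.id in self.known_ids


provider = Provider("seq_lookup", "Sequence", "GenBank", frozenset({"ID-0001"}))
request = Triple("Sequence", "GenBank", "ID-9999")

print(provider.registry_match(request))  # True: the registry would select it
print(provider.can_serve(request))       # False: the provider cannot use it
```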

I wonder, though, if the overhead of passing larger objects, multiple 
times from P2P, and having the service inspect these each time, is worth 
the pain for the gain?  I guess it isn't so much larger a message than 
other P2P broadcasts (so long as we broadcast only the triple, and not 
the payload), but it's still more network traffic than we currently have.  

>(Note that there may be some fuzzy ground here between
>the notions of a "type" and a "value"; for example, if one uses the LSID
>structure (as I understand it), the "namespace" is a property of the "value"
>instead of the "type" of that data, but would probably be critical in matching
>retrieval services to the data; 
>
in MOBY we consider namespace a data type, rather than a value.

>another example would be a "sequence", for
>which the "alphabet" used by the sequence could be encoded into a subtype or
>simply viewed as a property of the sequence "value".)
>
this is exactly the level of complexity that I was hoping to avoid at 
the registry (discovery) level.

>Though I certainly agree that
>having services that are more self-descriptive will be valuable, I don't think
>we should rule out the possibility of exploring alternative approaches to
>the service brokering. For example, one could imagine "MOBY Central" as
>being nothing more than a registry of distributed "ServiceProviders"
>(and probably the registry of the "ontologies/vocabularies" of data types
>and service types or service descriptors). 
>
So in this way it breaks the P2P paradigm in that you *must* connect to 
MOBY Central first in order to discover service providers, rather than 
discovering service providers through broadcasting over the P2P network?

... if so... what have we gained (other than the ability of the service 
provider to inspect the data... which is in itself significant!)

>(Note that I'm deliberately trying to paint this picture without
>using SOAP/WSDL for the time being, although one could imagine using those
>as well....)
>
sure

>We can explore the pros and cons of the various approaches in subsequent
>discussions, but I hope this helps get people thinking in different
>ways about the problem.
>
let's start exploring them now :-)  

Although I see the power gained by a more P2P architecture, I think a 
couple of things (important things) are lost by going this route:

1)  Simplicity of service provision
2)  Semi/fully automated workflow discovery (finding a path from an 
input to an output data type through multiple services)

The latter can probably be accomplished using the brokering approach you 
describe, but it isn't as straightforward and (as best I can imagine in 
my current state) would have to be accomplished by possibly endless 
trial-and-error traversals of many dynamically discovered service paths.  
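
For comparison, when a central registry already knows each service's 
input and output type, path discovery reduces to an ordinary graph 
search.  The following Python sketch uses breadth-first search over a 
made-up list of services; the service names and the flat 
(name, input, output) tuples are assumptions, and real data types 
would also involve inheritance and namespaces, which are ignored here.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Service = Tuple[str, str, str]  # (service name, input type, output type)


def find_workflow(services: List[Service], start: str,
                  goal: str) -> Optional[List[str]]:
    """Return the shortest chain of service names from start to goal type."""
    if start == goal:
        return []
    by_input: Dict[str, List[Tuple[str, str]]] = {}
    for name, inp, out in services:
        by_input.setdefault(inp, []).append((name, out))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        dtype, path = queue.popleft()
        for name, out in by_input.get(dtype, []):
            if out in seen:
                continue  # already reachable by a path at least as short
            if out == goal:
                return path + [name]
            seen.add(out)
            queue.append((out, path + [name]))
    return None  # no chain of registered services connects the two types


services = [
    ("pubmed_to_seq", "PubMedID", "Sequence"),
    ("blast", "Sequence", "BlastHit"),
    ("hit_to_alignment", "BlastHit", "Alignment"),
]
print(find_workflow(services, "PubMedID", "Alignment"))
```

In the broadcast model the edges of this graph are not known up front; 
each step has to be discovered by asking the providers, which is the 
trial-and-error traversal worried about above.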

The former point however, the one of simplicity, might be more important 
at the end of the day as this will affect the acceptance/adoption of the 
system by the people whom we need to make the whole thing work...

> One thing I would like to point out, however, is
>that the different approaches to service representation/service discovery
>are not necessarily mutually exclusive. For example, I have often found it
>useful in ISYS to define a "static service", but to allow that same service
>to be provided dynamically, by simply writing a little code that does a
>reasonable translation from the "self-descriptive" representation of the
>data to the representation prescribed by my own implementation. I'm beginning
>to wonder if a more "self-descriptive" and finer-grained approach to
>service-typing than the "fat interfaces with signatures" model might be useful
>to bridge these alternative approaches (possibly similar to WSDL, but I need
>to look at that more);
>
I'd like to pursue this idea further, as I'm not sure I am understanding 
what you suggest as the "middle ground" here.  Please expand on this...

>I
>think it is worth thinking about different models of self-documentation that
>might be more flexible and useful in a decentralized environment than those
>provided by interface signature definition.
>
Okay  myGrid folks!!  Jump in here!!!  :-)

>  They may be related to one another via inheritance (e.g.
>SequenceText extends "LinearObject", which is merely an abstraction of a
>thing that has "length");
>
I have been thinking about this type of relationship as well... I think 
we could do with having a third ontology  looking after "representable 
as" definitions (I wouldn't be thrilled to put them in the same ontology 
as the data types themselves).  In this way, client programs could be 
designed in a more generic way.

e.g. there could be a viewer for a "LinearObject" datatype, where 
"Sequence" data, and "BlastHit" data are both viewable as LinearObjects.

...this is something that has been kicking around in the back of my head 
and isn't well thought out yet since it is far from being the most 
critical issue at the moment... so if that makes no sense please ignore 
it :-)
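
One way the "representable as" idea could look, sketched in Python. 
The two lookup tables and the viewer name are purely hypothetical; 
nothing like this exists in MOBY at the time of writing.  The mapping 
is kept separate from the data-type definitions, as suggested above.

```python
from typing import Dict, List, Set

# Hypothetical third ontology: data type -> types it can be presented as.
REPRESENTABLE_AS: Dict[str, Set[str]] = {
    "Sequence": {"LinearObject"},
    "BlastHit": {"LinearObject"},
}

# Hypothetical viewer catalogue: the type a viewer understands -> viewer name.
VIEWERS: Dict[str, str] = {
    "LinearObject": "linear_viewer",
}


def viewers_for(data_type: str) -> List[str]:
    """Viewers that can display data_type, directly or via a representation."""
    targets = {data_type} | REPRESENTABLE_AS.get(data_type, set())
    return sorted(VIEWERS[t] for t in targets if t in VIEWERS)


print(viewers_for("Sequence"))
print(viewers_for("BlastHit"))
print(viewers_for("PubMedID"))
```

A single generic viewer then serves both data types without knowing 
anything specific about either of them.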

>At the next level up from this is the somewhat infamous (around here, anyway)
>IsysObject, which simply wraps an arbitrary set of IsysAttributes to indicate
>a level of "objective coherency", and provides querying mechanisms for
>getting at attribute content of interest. I should point out that what we
>were basically trying to achieve here was a somewhat more flexible way of
>representing multiple inheritance of IsysAttribute interfaces than requiring
>developers to statically define a class that would implement a specific set
>of these IsysAttribute types, so in some sense, it's not really all that
>much weirder than the notion of multiple interface inheritance.
>
right... we allow for (but have never yet used) an object that is 
constructed in this way... although we still require that the object's 
schema be registered as a new datatype, even if it is a composite 
object!  So we are a bit tighter here...

>("Premature optimization is the root of all evil"- Knuth)
>
love it!

>The main questions I think we need to explore
>have to do with the nature of the "data type ontology" that will be serving
>as a common language. My own feeling is that it is much more
>important (and more feasible) to develop a fine-grained vocabulary along the
>lines of the IsysAttribute catalog than to try to get agreement on the correct
>structure of higher-level objects. 
>
You haven't yet convinced me of this, but I need to spend more time 
reading about your IsysAttributes :-)  Arbitrary collections of 
"thingys", even if we have a fine-grained list of valid "thingys", seem 
to be much less robust and more prone to misinterpretation than a properly 
defined object...  and since our objects are minimalist as it is, this 
does not seem less feasible than the approach you are suggesting...  In 
addition, we gain some assurance about the composition of an object in a 
backwards-compatible way by examining its parentage.

>This seems to be somewhat at odds with the
>picture that is presented in the moby_classes.txt file (available from the
>biomoby cvs tree);
>
I should have removed that from the CVS - please ignore it as it does 
not contain any "valid" data.

> I should do something similar
>for the IsysAttribute hierarchy as it currently stands, but I want to clean it
>up a bit, as I think it contains some needless complexity and certainly many
>artifacts). I also think we should give consideration to the
>"dynamic/compositional" style of growing more complex data types
>(which is more natural to XML than it is to Class definitions).
>
errrm....  I understood our object models to be exactly that - dynamic 
in their composition... so long as new complex data types are 
registered...  Am I misunderstanding you?

>At any rate, I've already violated my intention not to overwhelm
>your attention span. Please let me know if you have any questions, comments,
>etc. Hope it helps...
>
It is now 8:15!!   I managed to spend 5 1/4 hours writing this response 
(interrupted by cat and wife, both wondering why I was still on the 
computer at this time of night), and I am now well and truly knackered! 
 But it was great that you went into so much detail as it forced me to 
go out and do some additional reading in order to understand why you 
designed ISYS the way you did,  and why you espouse (or at least, keep 
an open mind about) this alternate approach to service description and 
discovery.  I'm certainly open minded about many of the ideas you bring 
up, though not entirely sold on them yet ;-)

It's time we had another MOBY DIC meeting!   Emma Lake will be frozen 
over soon, so we should probably think about meeting elsewhere.  I 
should soon have a limited travel budget as our MOBY funding will start 
to flow in a few weeks.  Lukas, are you still interested in hosting the 
meeting at Carnegie?  (no pressure!  I'm just asking because you brought 
it up as a possibility last time we met...)

In any case, we can continue to discuss these things online for the 
moment.  I'd particularly like to get the input of the myGrid people, as 
some of the issues raised here seem to fall right into their lap w.r.t. 
plans and architecture.

It's nice to be discussing MOBY in the absence of underlying 
implementation issues such as UDDI/SOAP - I think a discussion of the 
behaviour of the system is badly needed right now before we get 
ourselves locked into something we might regret later...

I'm going to bed!

good night all,

M