[MOBY-l] TECHNICAL DOCUMENT: MOBY SEMANTICS
Andrew D. Farmer
adf at ncgr.org
Wed Mar 12 20:28:01 UTC 2003
Hi all-
Here's the current revision of my writeup on semantic standards;
the only major difference between this and the version I sent out earlier
is the addition of a section covering DAML-S, the extension of DAML to develop
an upper ontology for describing services (complementary to WSDL).
MOBY PROJECT: TECHNICAL REPORT ON SEMANTIC STANDARDS
Date: 3/12/03
Author: Andrew Farmer
Version: 1.1
This is intended to give a high-level overview of the work that others are
doing in the area of "semantic" representation standards for the web.
To give a brief overview of the set of topics I'll be addressing here,
I will merely quote a concise but excellent summary:
"the 'stack' of Semantic Web related W3C recommendations:
-XML provides a surface syntax for structured documents, but imposes no
semantic constraints on the meaning of these documents.
-XML Schema is a language for restricting the structure of XML documents.
-RDF is a datamodel for objects ("resources") and relations between them,
provides a simple semantics for this datamodel, and these datamodels
can be represented in an XML syntax.
-RDF Schema is a vocabulary for describing properties and classes of RDF
resources, with a semantics for generalization-hierarchies of such
properties and classes.
-OWL adds more vocabulary for describing properties and classes: among others,
relations between classes (e.g. disjointness), cardinality (e.g. "exactly
one"), equality, richer typing of properties, characteristics of
properties (e.g. symmetry), and enumerated classes. "
(from "Web Ontology Language (OWL): Overview"
http://www.w3.org/TR/owl-features/)
I'll elaborate a little bit on these subjects and give some thoughts on
the significance of each of these layers to MOBY in what follows. It is
worth pointing out, however, that the characterization of these
topics as a stack is a little misleading, especially with respect to
RDF, which does not depend on XML, but uses it as one possible serialization
format. However, the notion of a stack is definitely useful in understanding
how the more advanced semantic standard represented by OWL builds on top
of the foundations laid out by RDF and RDF Schema. (OWL represents the
transitioning of the earlier DAML/OIL work to an RDF foundation under
the auspices of the W3C; OWL itself is broken into a stack of OWL Lite,
OWL DL and OWL Full.) It's a rather elegant layering that seems to facilitate
a very flexible approach to adopting the level of semantic precision needed
in a variety of contexts (which can become very abstruse), as long as one
accepts the basic foundation of the RDF model.
---------------------
XML Schema:
At its core, XML Schema provides a way for specifying constraints on XML-encoded
data in terms of simple element "datatypes" and complex element structures.
In other words, an XML document representing a set of XML Schema
constraints allows a schema processor to determine whether or not another XML
document is a valid instance of that schema; this is essentially
the same sort of thing for which people use DTDs, with some enhanced
features of the constraint language.
As far as "datatypes" are concerned, this is something that was almost
totally lacking in the facilities provided by DTDs (with the possible exception
of such "datatypes" as NMTOKEN). The specification provides a number of
"built-in" datatypes (various flavors of numeric, datetime, and string),
the sort of things one would map to the basic types in a database system
or programming language. Each datatype is characterized in terms of
a "value space" (e.g. signed 32 bit integers) and a "lexical space" (how the
values are represented as characters). A standardized representation of
a "null" value is also given ("xsi:nil").
Some of these types are "primitive",
while others are derived in terms of restrictions ("facets") placed on these
primitive types; for example, "integer" is derived from "decimal" by
constraining the fractionDigits facet to be 0. Facets may also be used by
schema authors to derive new simple types; for example, a "pattern" facet
allows one to constrain types by means of a regular expression, or an
"enumeration" facet allows one to enumerate the allowed values for a type.
In addition to the restrictions that may be applied to derive more
restrictive simple types, new simple types may be formed using language
constructs for defining lists and unions of simple types.
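To make the facet idea concrete, here is a minimal Python sketch of deriving
restricted simple types from base types. It is not an XML Schema processor;
the helper names (restrict, pattern, enumeration, is_valid) and the
cytoband and strand types are hypothetical, chosen only for illustration.

```python
import re
from decimal import Decimal


def decimal_type(lexical):
    """Base type: map a lexical form to a value, or raise ValueError."""
    if not re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)", lexical):
        raise ValueError(f"not a decimal: {lexical!r}")
    return Decimal(lexical)


def string_type(lexical):
    """Base type whose value space and lexical space coincide."""
    return lexical


def restrict(base, facet):
    """Derive a new type from `base` by adding one facet, i.e. a predicate
    over the lexical form and the parsed value."""
    def derived(lexical):
        value = base(lexical)
        if not facet(lexical, value):
            raise ValueError(f"facet violated: {lexical!r}")
        return value
    return derived


def pattern(regex):
    return lambda lexical, value: re.fullmatch(regex, lexical) is not None


def enumeration(*allowed):
    return lambda lexical, value: value in allowed


# Toy analogue of deriving "integer" from "decimal": no fractional part.
# (This checks the value only; it does not model the real lexical rules.)
integer_type = restrict(
    decimal_type, lambda lexical, value: value == value.to_integral_value())

# Hypothetical user-defined types built with pattern and enumeration facets.
cytoband_type = restrict(string_type, pattern(r"\d{1,2}[pq]\d+(\.\d+)?"))
strand_type = restrict(string_type, enumeration("+", "-"))


def is_valid(simple_type, lexical):
    try:
        simple_type(lexical)
        return True
    except ValueError:
        return False
```

Each derived type accepts a subset of what its base accepts, which is the
sense in which facets "restrict" a type.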
The other main advance that XML Schema makes over DTDs is in its support
for XML namespaces, which allows a much greater degree of modularity in
building up complex schema specifications from simpler schema elements that
may have been designed independently (and hence may have namespace collisions
when brought together). This seems to constitute the main advance of XML
Schema over Document Type Definitions with respect to characterizing complex
element structures, although there are a number of new features of the Schema
language that are aimed at increasing the modularity of defining the elements
of complex structures.
As with other XML schema specification languages, the main point of using XML
Schema is to allow a designer to specify constraints on documents that will
allow any given instance of a document to be validated against the constraints
specified in a schema. As such, they are primarily concerned with element
content (datatypes) and element markup structure, without technically
supplying any formal "semantics". This is an extremely nice distinction,
but the idea seems to be that one can adopt a certain set of conventions for
encoding logical concepts such as class/property constructs or
class/subclass relationships in your XML structures, but the XML Schema
specification itself does not supply a "semantic interpretation" (i.e. one
that dictates a set of logical inference rules) for its constructs. Thus,
someone coming across an XML Schema specification without knowing the
particular conventions wouldn't be able to "reverse engineer" a
logical interpretation simply from the structure. Of course, a well-designed
XML structure with human readable tags would probably allow an intelligent
user (who was conversant with the domain being described) to infer the
semantics correctly. Nevertheless, it is this perceived "semantic opaqueness"
of the relationships between the pieces of XML structures that forms the
core of the argument for using RDF as the foundation of the "semantic web"
and its various ontology languages, rather than arbitrary XML grammars.
Here are several reasonably good discussions on the subject of why
XML structures are viewed as inadequate as a foundation for representing
semantics for the web (some also address possible translations of ontological
relationships into schema descriptions):
"Why RDF model is different from the XML model"
http://www.w3.org/DesignIssues/RDF-XML.html
"The Semantic Web - on the respective roles of XML and RDF"
http://www.ontoknowledge.org/oil/downl/IEEE00.pdf
"A Comparison of (Semantic) Markup Languages"
http://trellis.semanticweb.org/expect/web/semanticweb/paper.pdf
"The Relation between Ontologies and Schema-languages"
http://www.cs.vu.nl/~mcaklein/papers/oil-xmls.pdf
The section entitled "The Problem" of "XML with Relational Semantics: Bridging
the Gap to RDF and the Semantic Web" (http://www.w3.org/2001/05/xmlrs/)
is a short but fairly intelligible exposition of this as well...
Despite this, XML Schema does figure into most of the proposed semantic
standards; however, its use there seems to be limited to use of its
datatyping facilities, rather than its mechanisms for specifying complex
element structures.
Another interesting point that we should perhaps consider with respect to
XML Schema and its use in MOBY is made in "Comparing XML Schema Languages"
(http://www.xml.com/lpt/a/2001/12/12/schemacompare.html)
"One of the key strengths of XML, sometimes called "late binding," is the
decoupling of the writer and the reader of an XML document:
this gives the reader the ability to have its own interpretation and
understanding of the document. By being more prescriptive about the
way to interpret a document, XML schema languages reduce the possibility of
erroneous interpretation but also create the possibility of
unexpectedly adding "value" to the document by creating interpretations not
apparent from an examination of the document itself."
So, for example, we should consider whether it would be better for
MOBY to have a "strongly-typed" notion of chromosome position in a
central data ontology that forced it to be numerical data (and whose
responsibility it would be to do this validation), or for MOBY to simply
mark the concept of chromosome position and for consumers of chromosome
position data that did not meet their expectations to ignore it or throw
errors. The answer is probably "both": some software
would probably benefit from being able to recognize that cytogenetic
positional information and base pair and centimorgan coordinates are all
conceptually related as being "kinds of" genomic positions; other
software will want to ensure that it's not trying to average "4q22",
"100.32 cM" and "2356 bp". I have the sense that we will need to be careful
to make sure that we do not tangle up orthogonal concerns in the
design of the system, i.e. that we do not impose "type-safety" unless it is
needed. This separation of "description" from "constraint" seems to be a
recurring motif in a lot of the work that is being done in this area;
the basic separation between the notions of well-formedness and validity
in XML documents is the most familiar example, but we will see the same
idea expressed in somewhat different terms as we explore the higher levels
of the ontology description stack.
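The separation of "description" from "constraint" in the chromosome position
example can be sketched as follows. This is only an illustration under
assumed names: GenomicPosition, the "kind" labels and both functions are
hypothetical and do not correspond to any existing MOBY construct.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GenomicPosition:
    kind: str     # hypothetical labels: "cytogenetic", "cM", "bp"
    lexical: str  # the value exactly as the provider sent it


positions = [
    GenomicPosition("cytogenetic", "4q22"),
    GenomicPosition("cM", "100.32"),
    GenomicPosition("bp", "2356"),
    GenomicPosition("bp", "2400"),
]


def all_positions(items):
    """Descriptive use: everything marked as a kind of genomic position,
    with no demand that the values be numeric."""
    return [p for p in items if isinstance(p, GenomicPosition)]


def mean_base_pair(items):
    """Constraining use: this consumer imposes the typing it needs,
    and only at the point where it needs it."""
    values = [int(p.lexical) for p in items if p.kind == "bp"]
    if not values:
        raise ValueError("no base pair coordinates to average")
    return mean(values)
```

The first function serves software that only cares that the data are
positions; the second serves software that must not average "4q22" with
"2356".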
---------------------
RDF (Resource Description Framework)
At a basic level, RDF seems so simple that it can be initially rather hard
to understand why it should be taken so seriously by semantic web researchers
as the foundation of the next generation web. There are a lot of subtleties
to RDF that I wouldn't claim to understand, but I think I'm beginning to
grasp the core of the idea, and think that it's well worth considering
its significance to MOBY.
The basics of the idea will be familiar to anyone coming from a data-oriented
background; in fact, RDF is described in various places as the key to
building a "data-oriented" web, as opposed to the "document-oriented" first
generation web.
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
"The foundation of RDF is a model for representing named properties and
property values. The RDF model draws on well-established principles from
various data representation communities. RDF properties may be thought of as
attributes of resources and in this sense correspond to traditional
attribute-value pairs. RDF properties also represent relationships between
resources and an RDF model can therefore resemble an entity-relationship
diagram. (More precisely, RDF Schemas - which are themselves instances of
RDF data models - are ER diagrams.) In object-oriented design terminology,
resources correspond to objects and properties correspond to instance
variables."
Before we get into a discussion of the key differences of RDF from the
relational and object-oriented models, let's make sure we get a handle on
the standard terminology. The unit of meaning in RDF is the statement, which
is conceptualized as a triple consisting of a subject, an object and a
predicate (or property) that relates the two. The subject of an RDF statement
is always a resource, that is, something with a URI. The object may be
either another resource or a "literal" of some sort (e.g. a string, an integer,
a chunk of XML). The predicate that relates the two is called a property;
properties are themselves a special subset of resources (this is important,
and we'll expand on it in what follows). So, in relational database terms,
an RDF statement can be thought of as giving the value (the object) of a
particular column (the property) for a given row (the subject). If the object
is another resource, the column would be a foreign key, otherwise for a literal
the column would be a regular datatype; similarly, in object-oriented data
modeling, statements with resources for their objects would be instance
variables storing references to other objects, while those with literals would
correspond to primitive datatypes.
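The relational analogy can be made concrete with a small sketch that turns
one table row into triples. The function name, the URI layout under
http://example.org/moby and the example row are all assumptions made for
illustration; RDF itself does not prescribe any such mapping.

```python
def row_to_triples(base_uri, table, key_column, row, foreign_keys=None):
    """Turn one relational row into (subject, property, object) triples.

    foreign_keys maps a column name to the table it references.
    Objects are tagged as ("resource", uri) or ("literal", value).
    """
    foreign_keys = foreign_keys or {}
    subject = f"{base_uri}/{table}/{row[key_column]}"
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue  # the key has become part of the subject URI
        prop = f"{base_uri}/{table}#{column}"
        if column in foreign_keys:
            # A foreign key column yields a resource-valued object.
            obj = ("resource", f"{base_uri}/{foreign_keys[column]}/{value}")
        else:
            # An ordinary column yields a literal-valued object.
            obj = ("literal", value)
        triples.append((subject, prop, obj))
    return triples


gene_row = {"id": 7, "symbol": "BRCA2", "chromosome_id": 13}
triples = row_to_triples(
    "http://example.org/moby", "gene", "id", gene_row,
    foreign_keys={"chromosome_id": "chromosome"})
```

Each non-key column of the row becomes one statement about the same
subject.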
There are two subtle, but critical differences between RDF and these familiar
approaches to data modeling, which make it suitable to be the data modeling
technique for the web in its sense of a universal information space.
The first is the use made in RDF of the web's URI.
RDF basically provides a framework for making "meaningful" assertions or
"statements" about "resources"- that is to say, things that have been given
URIs. Having a URI is the sine qua non of being "on the web"; it means having
a unique identity in the universal information space of the web (which is
independent of whether or not the resource is network-retrievable, like a
web page). Having a URI is like having a primary key in a database or a
reference to an object, but instead of the scope being this database or this
computer's memory, the URI is universal. (Note that the LSID of the I3C is a
species of URI.) So, by insisting that its identifiers are universal,
RDF provides for decentralization of data without fear of "id-space collisions".
The second key difference is that the properties themselves have URIs.
This is important for several reasons. First, it allows properties to
be "first-class citizens" of the data model, independent of constructs
such as tables or classes; the same property (as identified by URI)
need not be constrained to apply only to instances of a given class of
object (although higher levels in the "semantic stack" allow the domain and
range of properties to be constrained); alternatively, one can imagine any
property as defining a class of objects (those objects which have been
given a value for the property). Second, the property's significance is
universal, i.e. anyone who uses the same property to make an assertion
must (or should) "mean" the same thing as anyone else. Third, since properties
are themselves resources, they may be used as the subject of statements. This
forms the basis of RDF's ability to define arbitrary levels of metadata
using the same basic model, and leads to the vocabularies of properties defined
in the higher levels of the "semantic stack" for defining schemas and ontologies
in the RDF model.
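A small sketch of properties as "first-class citizens": the same property is
applied to unrelated kinds of things, it implicitly defines a class (the
things that have a value for it), and it is itself the subject of a
statement. The http://example.org/moby# namespace and the resource names are
hypothetical; only the rdfs:comment URI is taken from the RDF Schema
vocabulary.

```python
EX = "http://example.org/moby#"  # hypothetical namespace
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

triples = {
    (EX + "gene1", EX + "name", "BRCA2"),
    (EX + "person1", EX + "name", "Ada"),
    (EX + "gene1", EX + "chromosome", "13"),
    # The property itself used as the subject of a statement (metadata).
    (EX + "name", RDFS_COMMENT, "A human-readable label"),
}


def subjects_with(prop, graph=triples):
    """The class implicitly defined by a property: everything that has
    been given a value for it."""
    return {s for s, p, o in graph if p == prop}


def statements_about(resource, graph=triples):
    """All statements whose subject is the given resource, which may
    itself be a property."""
    return {(p, o) for s, p, o in graph if s == resource}
```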
Looking back at the analogy to relational databases, some of these differences
are well expressed by the following description:
"Is the RDF model an entity-relationship model? Yes and no. It is great as a
basis for ER-modelling, but because RDF is used for other things as well,
RDF is more general. RDF is a model of entities (nodes) and relationships. If
you are used to the "ER" modelling system for data, then the RDF model is
basically an opening of the ER model to work on the Web. In typical ER model
involved entity types, and for each entity type there are a set of
relationships (slots in the typical ER diagram). The RDF model is the same,
except that relationships are first class objects: they are identified by a
URI, and so anyone can make one. Furthermore, the set of slots of an object
is not defined when the class of an object is defined. The Web works though
anyone being (technically) allowed to say anything about anything. This means
that a relationship between two objects may be stored apart from any other
information about the two objects. This is different from object-oriented
systems often used to implement ER models, which generally assume that
information about an object is stored in an object: the definition of the
class of an object defines the storage implied for its properties.
For example, one person may define a vehicle as having a number of wheels and
a weight and a length, but not foresee a color. This will not stop another
person making the assertion that a given car is red, using the color vocabulary
from elsewhere. Apart from this simple but significant change, many concepts
involved in the ER modelling take across directly onto the Semantic Web model."
(from "What the Semantic Web can represent"
http://www.w3.org/DesignIssues/RDFnot.html)
I'd like to call special attention to the idea in this paragraph that
"The Web works though anyone being (technically) allowed to say anything
about anything." This is stated as one of the principal design goals for
RDF (see http://www.w3.org/TR/2002/WD-rdf-concepts-20021108/#section-Overview),
and seems to be really fundamental to understanding the view of the semantic
web research world (whether or not you accept it as a desirable goal).
A set of RDF statements is easily represented as a directed, labeled graph in
which the nodes represent subjects and objects of statements and the edges
represent the predicates. (Technically, it's a multigraph, since two nodes
representing resources may be connected by many edges.) Each resource used
as a subject or object in a statement labels a single node in the graph,
and each edge is labeled by the property. Thus, unlike XML whose ancestry as
a document-oriented language led to its "natural representation" as a tree
(reflected in the DOM API, for example), RDF is really built upon a general
graph-oriented (read: many-to-many relationships) foundation.
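The graph view can be sketched by indexing a set of triples as a labeled
multigraph and walking it. The "ex:" names below are hypothetical
abbreviations standing in for full URIs.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index triples as a labeled multigraph:
    node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def edge_labels(graph, source, target):
    """All properties linking source to target; there may be several."""
    return sorted(p for p, o in graph.get(source, []) if o == target)


def reachable(graph, start):
    """Every node that can be reached from start by following edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for _, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


example = [
    ("ex:geneA", "ex:interactsWith", "ex:geneB"),
    ("ex:geneA", "ex:coexpressedWith", "ex:geneB"),
    ("ex:geneB", "ex:locatedOn", "ex:chr13"),
    ("ex:chr13", "ex:partOf", "ex:genomeX"),
]
graph = build_graph(example)
```

Note the two distinct edges between ex:geneA and ex:geneB, which a tree
model would not represent naturally.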
As I have pointed out elsewhere, the distinction between the XML approach
and RDF is confused by the fact that the RDF spec suggests a standard
serialization model for RDF into an XML syntax. The canonical XML format for
RDF has been the source of some controversy and, some say, is one of the major
reasons for RDF's failure to be as widely adopted as XML by the regular
web development community. However, there are other simpler
serialization formats for RDF, the most prominent being the N3 format which
basically represents each statement directly as a triple. At any rate, it is
important to bear in mind that RDF is a theoretical model and is not
dependent on any particular syntactical implementation, in the same way that
the relational model is not dependent on a physical implementation.
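To illustrate that the model is independent of its syntax, the sketch below
writes the same set of triples in two textual layouts: one statement per
line, and statements grouped by subject. The layouts are only loosely
modeled on triple-oriented notations such as N3 and are not claimed to be
conformant output; the function names and example URIs are hypothetical.

```python
def term(obj):
    """Render an object tagged as ("uri", value) or ("literal", value)."""
    kind, value = obj
    if kind == "uri":
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_per_triple(triples):
    """One statement per line: subject, property, object, full stop."""
    return "\n".join(
        f"<{s}> <{p}> {term(o)} ." for s, p, o in sorted(triples))


def to_grouped(triples):
    """Same statements, grouped by subject with ';' between properties."""
    by_subject = {}
    for s, p, o in sorted(triples):
        by_subject.setdefault(s, []).append(f"<{p}> {term(o)}")
    blocks = [f"<{s}> " + " ;\n    ".join(pairs) + " ."
              for s, pairs in by_subject.items()]
    return "\n".join(blocks)


triples = [
    ("http://example.org/g1", "http://example.org/partOf",
     ("uri", "http://example.org/chr13")),
    ("http://example.org/g1", "http://example.org/name",
     ("literal", "BRCA2")),
]
```

Both functions consume the same in-memory triples; only the surface form
differs.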
The theoretical model behind RDF is not merely a proposal for an open-ended
scheme for information encoding, but is concerned with the problem of inference;
i.e., given a set of explicit RDF assertions, what implicit facts may be
inferred from them. Thus RDF should be explicitly understood as providing
a substrate for AI-like inference engines. (For example, see
http://www.xml.com/pub/a/2001/04/25/prologrdf/ for a discussion of the
possible translation of RDF into Prolog knowledge bases.) One of the major
features of the logical model behind RDF is the notion of "monotonicity",
which basically means that no "conclusion" is drawn from a set of RDF
statements which additional "premises" could invalidate; there are a lot
of subtleties to this, but the basic import seems to be that the inference
engines are restricted from making any "closed world assumptions" about the set
of assertions from which they derive conclusions.
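The difference between closed world and open world reasoning can be shown
with a toy query. Both functions and the "ex:" statements are hypothetical;
the point is only that the closed world answer is a conclusion that a later
assertion can overturn, while the open world answer is not.

```python
facts = {("ex:geneA", "ex:locatedOn", "ex:chr13")}
query = ("ex:geneA", "ex:interactsWith", "ex:geneB")


def closed_world_denies(statement, known):
    """Closed world: whatever is not asserted is concluded to be false."""
    return statement not in known


def open_world_status(statement, known):
    """Open world: whatever is not asserted is merely unknown."""
    return "asserted" if statement in known else "unknown"


before_closed = closed_world_denies(query, facts)
before_open = open_world_status(query, facts)

# Someone elsewhere on the web now makes the additional assertion.
more_facts = facts | {query}

after_closed = closed_world_denies(query, more_facts)
after_open = open_world_status(query, more_facts)
```

The closed world conclusion ("the statement is false") is invalidated by
the new premise, which is exactly what monotonicity rules out; "unknown"
becoming "asserted" retracts nothing.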
RDF has some more complicated constructs which I'll only mention in passing,
such as:
-blank nodes for representing complex assertions as a set of simple
statements
-support for collections (ordered and unordered)
-"reification" which allows statements to be referenced as
components of other statements (e.g. representation of the evidence
for or belief in an assertion could be captured in this way).
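As a sketch of reification, the function below describes a statement with
four further statements about a blank node, using the rdf:Statement,
rdf:subject, rdf:predicate and rdf:object terms of the RDF vocabulary. The
blank node labels, the "ex:" names and the evidence property are
hypothetical.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_ids = itertools.count(1)


def reify(statement):
    """Return (node, triples): a fresh blank node standing for the
    statement, plus the four triples that describe it."""
    s, p, o = statement
    node = f"_:stmt{next(_ids)}"
    return node, [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    ]


claim = ("ex:geneA", "ex:interactsWith", "ex:geneB")
node, description = reify(claim)

# The statement can now be the subject of another statement, for example
# one recording the (hypothetical) evidence for it.
evidence = (node, "ex:evidence", "inferred from sequence similarity")
```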
There seems to be good tool support for basic RDF manipulation in a variety of
languages, although it's certainly not as mainstream yet as XML.
See, for example, the lists at:
http://www.ilrt.bris.ac.uk/discovery/rdf/resources/
Finally, some further reading (none of it is too heavy) that may be helpful to
get a perspective on the significance of RDF:
This is short and non-technical, but worth reading to get a feel for the
intention behind RDF (especially the last two sections):
"Business Model for the Semantic Web"
http://www.w3.org/DesignIssues/Business
I found these to be helpful introductions to the vision of the semantic web
and the importance of RDF in that vision:
"The Semantic Web (for Web Developers)"
http://logicerror.com/semanticWeb-webdev
"The Semantic Web In Breadth"
http://logicerror.com/semanticWeb-long
This little discussion of the "utility" of the RDF approach for simple data
representation resonated with me, especially the bit about the situations in
which it is advantageous to not constrain oneself to a definitive schema ...
from "RDF, What's It Good For?"
(http://www.xml.com/pub/a/2002/11/13/deviant.html)
"Responding to both St.Laurent's claim about straitjackets and to Champion's
plea for a demonstration of RDF's utility, Eric van der Vlist said that lots
of things -- like RDBMS and XML -- are straitjackets, that every storage or
representation technology has advantages and disadvantages, including RDF.
"RDF and its triples," van der Vlist claimed, are "really lightweight when you
have the right tools to manipulate them. I like to think of them as a RDBMS
with a variable geometry: each 'row'...can have a variable number of columns..."
Van der Vlist makes nicely the point I made earlier about Python's rdflib.
Being able to use RDF as a loose storage system, without having to worry
about outgrowing (or even fully specifying, in advance) an RDBMS schema can be
very helpful, in at least two situations: because, first, you don't know what
the data schema really is yet, owing either to problem domain constraints
or to an extended prototype stage; and, second, because in some applications
the storage schema needs to stay very flexible and extensible for the
lifetime of the project. Or, as van der Vlist said, RDF is "like a RDBMS which
you could populate before having written any schema, that's really very
flexible..."
---------------------
RDF Schema
Once you have grasped the concepts behind RDF, RDF Schema is best understood as
introducing a basic set of resources (i.e. terms identified by URIs using a
common namespace for RDFS) which can be used by people who want to build
their own domain-specific schemas by referring to a standard vocabulary of
terms. For example, one can make an RDF statement such as "moby:Gene rdf:type
rdfs:Class", meaning that the conceptual resource identified as "moby:Gene" is
an element of (the meaning of rdf:type) the set of resources described by the
concept represented by rdfs:Class; this, in turn, means that it can be used as
the "object" in statements that use rdf:type to describe a resource. In other
words, whereas RDF basically describes the way in which meaningful statements
can be made about things, RDF Schema begins to define a vocabulary for making
statements (in RDF) that describe those things known as "schemas", and
establishing the semantics of the terms in these vocabularies (e.g.
rdf:type is an instance of rdf:Property whose rdfs:range is rdfs:Class).
To get a better sense for the sorts of semantics that may be expressed using
the terms in the RDF Schema vocabulary, you may wish to take a quick glance
at http://www.w3.org/TR/rdf-schema/#ch_summary. The most important are
rdfs:subClassOf and rdfs:subPropertyOf, which allow for the development of
inheritance hierarchies among the terms used in a schema described using the
rdfs vocabulary.
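The kind of inference that rdfs:subClassOf licenses can be sketched in a few
lines: class membership propagates up the hierarchy. The prefixed names are
written as plain strings for brevity, and the moby: classes are
hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    ("moby:Gene", SUBCLASS, "moby:SequenceFeature"),
    ("moby:SequenceFeature", SUBCLASS, "moby:BiologicalObject"),
    ("ex:brca2", RDF_TYPE, "moby:Gene"),
}


def superclasses(cls, graph=triples):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, graph=triples):
    """Asserted types plus those implied by the class hierarchy."""
    direct = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return inferred
```

Only one rdf:type statement was asserted for ex:brca2; the other two
memberships follow from the hierarchy.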
Now, this may seem like a somewhat cumbersome system for essentially
expressing the same basic information that object-oriented languages or
relational modeling tools allow one to express. However, it is important to
realize that the intention is to translate these sorts of concepts outside of
the context of any particular programming language and into the universal
information space of the web. To use a familiar example, GO is
distributed as RDF that uses a property defined in the GO namespace
"go:isA" to relate the resources it is describing; assuming that it's more
likely that a semantic search engine would be written to the language of
the RDF Schema than to GO, it would perhaps be better for GO to use the
same terms (i.e. rdfs:subClassOf); on the other hand, one of the nice
properties of RDF is that the discrepancy is easily amended (at least
conceptually) by adding an assertion that the two properties are logically
equivalent.
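One way such a bridging assertion could work is sketched below using
rdfs:subPropertyOf: once go:isA is declared a subproperty of
rdfs:subClassOf, every go:isA statement also yields an rdfs:subClassOf
statement. This is an assumption about how the bridge might be expressed,
and it captures only one direction (full equivalence would need the
converse assertion as well). The go: term names are placeholders, not real
GO identifiers.

```python
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"


def with_subproperty_inference(triples):
    """Apply the rule 'if p is a subproperty of q and (s p o) holds,
    then (s q o) holds' until nothing new is produced."""
    inferred = set(triples)
    while True:
        bridges = {(p, q) for p, r, q in inferred if r == SUBPROP}
        new = {(s, q, o)
               for s, p, o in inferred
               for p2, q in bridges if p2 == p}
        if new <= inferred:
            return inferred
        inferred |= new


go_data = {("go:termA", "go:isA", "go:termB")}
bridge = {("go:isA", SUBPROP, SUBCLASS)}

without_bridge = with_subproperty_inference(go_data)
with_bridge = with_subproperty_inference(go_data | bridge)
```

A tool that only understands rdfs:subClassOf can use the second result
without GO having to change its own vocabulary.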
Furthermore, the grounding in RDF has some
interesting implications for the behavior of "schemas" defined in this way.
I highly recommend reading the excellent discussion in
http://www.w3.org/TR/rdf-primer/#interpretingschema , but I'll paraphrase
it briefly.
First, the fact that the focus of RDF is on relating things
via properties (rather than defining classes of things in terms of properties)
has the natural corollary that properties may be described schematically
without reference to their context in describing things belonging to
classes defined by means of these properties. So, for example, one could
define a property "Name" and it would have the same significance with respect
to any object that had a string predicated of it via this property; contrast
this with the case where many different classes have a property with this
name, but it is not prima facie evident whether a gene.name property is in any
sense similar to a person.name property. It could be argued that the ambiguity
in the latter case could be addressed by an inheritance scheme that placed
the name property in a superclass from which both gene and person were
derived; however, this strategy will require multiple inheritance when
considering multiple properties that may be independently combined, and in
the limit essentially reduces to defining each property independently as a
class of things that have that property.
Second, type systems as used in closed world environments like programming
languages or databases are fairly tightly bound to the application of the
constraints that are implied by the type descriptions; an RDF Schema
description, however, merely describes how a processor might test conformance
of certain instances of data to the properties described by an RDF Schema, but
it obviously can't do anything to enforce the constraints. So for example, if
I create a certain Class definition in Java, it will constrain the set of
properties that may be associated with an instance of that Class, and will
not allow me to dynamically associate new properties with an object instance
that would potentially alter its typing at run-time. This makes good
sense in non-distributed, or centrally coordinated environments, and is
certainly key to implementing applications like compilers and database systems
that need to organize data efficiently according to the design-time constraints
on the runtime behavior. On the web, on the other hand, any schematic
description is essentially only another set of assertions that someone has
made about something, and may not even be known to someone else making another
set of assertions with respect to the same thing.
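The idea that a schema describes rather than enforces can be sketched as a
checker that reports non-conforming statements but leaves the data alone.
The function and the "ex:" names are hypothetical. Note the assumption being
made: this takes the "test conformance" reading used in the text above,
whereas the RDF Schema semantics treats rdfs:domain as licensing the
inference that the subject belongs to the class.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

schema = {("ex:numberOfWheels", RDFS_DOMAIN, "ex:Vehicle")}

data = [
    ("ex:car1", RDF_TYPE, "ex:Vehicle"),
    ("ex:car1", "ex:numberOfWheels", "4"),
    ("ex:car1", "ex:color", "red"),           # never foreseen by the schema
    ("ex:table1", "ex:numberOfWheels", "0"),  # subject not typed as Vehicle
]


def conformance_report(statements, schema_triples):
    """List (but do not reject) statements whose subject has not been
    asserted to belong to the property's declared domain."""
    domain = {s: o for s, p, o in schema_triples if p == RDFS_DOMAIN}
    types = {}
    for s, p, o in statements:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return [(s, p, o) for s, p, o in statements
            if p in domain and domain[p] not in types.get(s, set())]


report = conformance_report(data, schema)
```

The ex:color statement passes untouched because the schema says nothing
about it, and the flagged statement is still present in the data.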
----------------------
OWL: Web Ontology Language
OWL can basically be understood as a further extension of the work begun in
RDF Schema to define a set of resources for use in RDF-based descriptions of
schemas/ontologies, with more expressive/complicated semantics. Its history
begins in the independent development of DAML and OIL by US and European
researchers, their fusion into DAML+OIL, and a final stage that reformulated
the terms and semantics developed in those efforts as a consistent
extension of the RDF and RDF Schema framework.
For example, one can describe cardinality constraints on properties with
respect to their use with a given class; one can characterize domain-specific
properties in terms of classes of logical properties such as transitivity,
or relate two properties as being inverses of each other (hasParent/hasChild);
one can make assertions about the logical equivalence of two classes or
properties or assert the identity of two individual instances; one can
characterize classes as class expressions that are logically composed of
other classes (via unionOf, intersectionOf, complementOf), or define a class
as an enumeration of a set of individuals.
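Two of these constructs, inverse and transitive properties, lend themselves
to a short sketch of the inferences they permit. The naive fixed-point loop
below is only an illustration and is not how a real OWL reasoner works; the
owl:inverseOf and owl:TransitiveProperty terms come from the OWL vocabulary,
while the "ex:" names are hypothetical.

```python
RDF_TYPE = "rdf:type"
INVERSE_OF = "owl:inverseOf"
TRANSITIVE = "owl:TransitiveProperty"


def owl_closure(triples):
    """Add statements implied by inverse and transitive properties,
    repeating until no new statement appears."""
    inferred = set(triples)
    while True:
        inverse = {(p, q) for p, r, q in inferred if r == INVERSE_OF}
        inverse |= {(q, p) for p, q in inverse}  # inverseOf is symmetric
        transitive = {s for s, p, o in inferred
                      if p == RDF_TYPE and o == TRANSITIVE}
        new = set()
        for s, p, o in inferred:
            for a, b in inverse:
                if p == a:
                    new.add((o, b, s))
            if p in transitive:
                for s2, p2, o2 in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("ex:hasParent", INVERSE_OF, "ex:hasChild"),
    ("ex:hasAncestor", RDF_TYPE, TRANSITIVE),
    ("ex:alice", "ex:hasParent", "ex:bob"),
    ("ex:alice", "ex:hasAncestor", "ex:bob"),
    ("ex:bob", "ex:hasAncestor", "ex:carol"),
}
closure = owl_closure(facts)
```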
It is important to note that the logical expressions
that are introduced in the language have their roots in a tradition of
research on the subject of "description logics", for which efficient
algorithms have been developed to perform inferences on sets of assertions
about classes and properties that allow a reasoning engine to answer
questions such as those regarding the logical subsumption of concepts
(i.e. hierarchical relationships between concepts that have not been explicitly
encoded, but which follow from their definition in terms of properties),
or inconsistencies between the constraints asserted for a given concept and
an instance asserted to belong to that class of things, but violating the
declared constraints on membership of that class.
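A toy version of these two reasoning tasks is sketched below. Classes are
defined simply as sets of required properties, so that subsumption reduces
to set inclusion, and inconsistency is detected through a declared
disjointness. Real description logic reasoners use far more sophisticated
algorithms over much richer class expressions; the class and property names
here are hypothetical.

```python
# Each class is "defined" as the things having all the listed properties.
definitions = {
    "Protein": frozenset({"hasSequence"}),
    "Enzyme": frozenset({"hasSequence", "catalyzes"}),
    "Kinase": frozenset({"hasSequence", "catalyzes", "transfersPhosphate"}),
    "Chromosome": frozenset({"hasCentromere"}),
}
disjoint = {frozenset({"Protein", "Chromosome"})}


def subsumes(general, specific, defs=definitions):
    """general subsumes specific if everything it requires is also
    required by specific."""
    return defs[general] <= defs[specific]


def inferred_subclass_pairs(defs=definitions):
    """Hierarchy that was never stated but follows from the definitions."""
    return sorted((specific, general)
                  for general in defs for specific in defs
                  if general != specific and subsumes(general, specific, defs))


def is_inconsistent(asserted_types, defs=definitions):
    """True if the asserted types, together with the classes that subsume
    them, include both members of a disjoint pair."""
    expanded = {g for g in defs for t in asserted_types
                if subsumes(g, t, defs)}
    return any(pair <= expanded for pair in disjoint)
```

For example, an individual asserted to be both a Kinase and a Chromosome is
flagged because Kinase is inferred to be a kind of Protein.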
In connection with this notion of inference power, it is perhaps
worth noting that the various flavors of OWL (Lite, DL and Full) are aimed at
different levels of inferential power and computational difficulty. OWL
Lite is intended to support classification hierarchies and simple constraint
features (e.g. cardinality values of 0 or 1). OWL DL includes all OWL language
constructs, but restricts their uses in ways that will guarantee that all
reasoning based on these constructs will be decidable and complete; this seems
mostly to relate to the notion of "type separation", which requires that
classes, properties and instances be treated as disjoint sets (e.g. a
class cannot be considered an "instance" of some other class, only a "subclass").
OWL Full removes these restrictions, but with the result that "it is unlikely
that any reasoning software will be able to support every feature of OWL Full."
It may be instructive to look at the use cases and requirements that were
developed by the group responsible for the OWL proposal
(http://www.w3.org/TR/webont-req/) and to consider whether we could imagine
similar usages in the context of a system like MOBY.
A decent (and fast) practical overview of the application space of ontologies
at the level of DAML+OIL (i.e. OWL DL) is given in
"DAML+OIL for Application Developers"
(http://www.daml.org/2002/03/tutorial/all.htm); this covers the most important
constructs found from RDF up to DAML+OIL and provides a good set of pointers
(a little bit dated, but fairly representative) to applications that
have been developed on top of the semantic web framework (from RDF to
DAML+OIL).
A good introduction to the history behind this level of the stack and the
sorts of problems that are being addressed at this level is given in
"OIL in a Nutshell" (http://www.cs.vu.nl/~ontoknow/oil/downl/oilnutshell.pdf)
-----------
DAML-S: An extension of DAML+OIL for characterization of "services"
DAML-S is an extension of the DAML+OIL ontology that provides a specialized
set of ontologically defined terms for use in describing service capabilities.
It is intended to be complementary to the capabilities provided by standards
such as WSDL; whereas the latter provides a specific syntax for describing
how to interact with a service (message formats, protocol bindings, etc.) it
does not provide any formal semantics for describing what the service does.
I believe that the myGrid work has built on top of the DAML-S foundation
as a domain-neutral ontology for services that they have augmented with
bioinformatics-specific service concepts (BLAST et al.).
The objectives of DAML-S are to develop a language for service description
that specifically provides for:
-semantically rich discovery of services based on specified constraints
-automated invocation of services, including facilities for
interoperability, e.g. message parameter translations
-composition of new services from existing services
-monitoring execution of services
It is worthwhile to look at some of the motivating examples of these
features in section 2 of [1]. These include reasonably complicated examples
of services, with a particular emphasis on dynamically facilitating
interoperation via a "computer-interpretable API"; the phrase is meant in a
stronger sense than one that would be restricted to compiler-like
type-checking.
The DAML-S "upper" ontology is broken into three parts: ServiceProfiles,
ServiceModels and ServiceGroundings.
ServiceProfiles are intended to describe services at a level that will
support discovery. They are concerned with high-level characteristics
of services such as inputs, outputs, preconditions and effects,
information about the service provider and functional characteristics such
as quality of service or geographic radius. It should be noted that
description of these characteristics is semantic as opposed to the syntactic
characterization of XML Schema types given in WSDL documents to describe the
contents of messages. The emphasis here is on a declarative representation of
service capabilities that is not bound to any one form of registry or style of
lookup. For example, the case where demand for a service outweighs supply
is discussed in terms of a registry of requests that would presumably be
characterized using similar semantics and queried by the providers of
services. The characterization of services at this level is intentionally
somewhat less precise than what would be necessary in order for a consumer
of the service to interact with it; the focus of the profile is to enable
discovery.
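To make the idea of semantic (as opposed to syntactic) matching a little
more concrete, here is a minimal sketch in Python. It is purely illustrative:
the concept names, the hierarchy and the two profiles are made up for this
example and are not taken from DAML-S or myGrid, and a real profile would
carry much more than inputs and outputs.

```python
# Hypothetical concept hierarchy: child -> parent (not taken from DAML-S).
SUBCLASS_OF = {
    "ProteinSequence": "Sequence",
    "DNASequence": "Sequence",
    "Sequence": "Thing",
    "BlastReport": "AlignmentReport",
    "AlignmentReport": "Thing",
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or is a transitive subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


# Hypothetical service profiles; only inputs and outputs are modelled here.
PROFILES = [
    {"name": "blastp", "inputs": ["ProteinSequence"], "outputs": ["BlastReport"]},
    {"name": "revcomp", "inputs": ["DNASequence"], "outputs": ["DNASequence"]},
]


def discover(have, want):
    """Names of services that accept `have` and produce something that is a `want`."""
    return [
        p["name"]
        for p in PROFILES
        if any(is_a(have, i) for i in p["inputs"])
        and any(is_a(o, want) for o in p["outputs"])
    ]
```

With these assumptions, a request for "something that turns a ProteinSequence
into an AlignmentReport" finds blastp even though blastp advertises the more
specific BlastReport; that is the sort of match a comparison of XML Schema
type names alone would not give you.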
The next aspect to service description in the DAML-S ontology is the
ServiceModel. This is primarily concerned with describing the process
model (control flow/data flow) involved in using the service and is aimed
at enabling composition and execution of services. There is a degree of
overlap between what may be specified in the Profile vs. what is specified
in the Model; for example, both support description of inputs,
outputs, preconditions and effects; however, there is no constraint that the
information specified in these two places be the same. The basic idea is that
while the description offered by the Profile is aimed at rough matching of
the needs of a consumer with the capabilities of the provider, the Model
is used to support an actual interaction with the discovered service, and
thus may wish to specify this information more precisely or provide fuller
details. For example, services may wish to expose some details about their
internal process model, such as whether or not they are "atomic" or
"composite"; in the case of a composite process, it may describe how it
is composed of other services and how the information/control flow takes
place between the components. This latter area (information/control flow)
is also the subject of several other web services oriented standards
having to do with choreography or orchestration of conversational state
such as WSFL (Web Services Flow Language) and BPEL4WS (Business Process
Execution Language for Web Services). In some ways, this seems like
a strange abandonment of the basic notion of encapsulation, but as I
understand it, the idea is to support such use cases as:
-representation of "workflows" that support high-level goals, but whose
components (service instances) may be composed given a particular
client's preferences or runtime circumstances
-process monitoring
(The REST community has some interesting ideas about how the REST approach
can be used for the purposes of coordinating distributed processes based on
REST's resource-centric approach: see, for example:
http://www.prescod.net/rest/state_transition.html)
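The atomic/composite distinction in the ServiceModel discussion above can be
sketched roughly as follows (Python; the class names Atomic and
SequenceProcess and the toy "revcomp" pipeline are assumptions made for this
illustration, not DAML-S vocabulary). The trace list stands in for the
process-monitoring use case.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Atomic:
    """A process with no visible internal structure: one call, one result."""
    name: str
    run: Callable[[str], str]

    def execute(self, data, trace):
        trace.append(self.name)  # record the step so a monitor can see it
        return self.run(data)


@dataclass
class SequenceProcess:
    """A composite process: components run in order, output feeding input."""
    name: str
    steps: List[object] = field(default_factory=list)

    def execute(self, data, trace):
        trace.append(self.name)
        for step in self.steps:
            data = step.execute(data, trace)
        return data


# Toy composite: reverse-complement built from two atomic steps.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
pipeline = SequenceProcess("revcomp", [
    Atomic("reverse", lambda s: s[::-1]),
    Atomic("complement", lambda s: s.translate(COMPLEMENT)),
])

trace = []
result = pipeline.execute("AACG", trace)
```

Because the composite exposes its components, a client could in principle
swap one step for an equivalent service at runtime, which is the first use
case listed above.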
Finally, DAML-S provides the notion of a Service Grounding, which describes
how the abstract specification of the Model is translated into concrete
messages to be passed between the service consumer and service
provider. It is very similar to the concept of a binding presented by WSDL,
and the authors of the DAML-S ontology show how a DAML-S grounding can be
specified in terms of a mapping onto WSDL. I'm not clear on all the subtle
details here, but the most significant point seems to be that the WSDL
specification of types for its messages is done in terms of an XML Schema
specification, while the DAML-S Grounding specifies message parts in terms
of DAML+OIL classes. Thus, the latter is "semantically accessible" to
inference engines, whereas the former is "syntactically concrete" enough to
be used by toolkits that can automatically generate the messages.
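As a rough illustration of what a grounding does, the following Python
sketch maps ontology-level parameter classes onto concrete message parts
typed with XML Schema names. The table contents, the operation name
"runBlast" and the flat message layout are all assumptions for the sake of
the example; they do not reproduce the actual DAML-S-to-WSDL mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical grounding table: abstract (ontology-level) parameter class ->
# concrete element name and XML Schema type used on the wire.
GROUNDING = {
    "ProteinSequence": ("sequence", "xsd:string"),
    "EValueCutoff": ("evalue", "xsd:double"),
}


def ground(operation, params):
    """Build a concrete XML message from (semantic class, value) pairs."""
    msg = ET.Element(operation)
    for concept, value in params:
        element_name, xsd_type = GROUNDING[concept]
        part = ET.SubElement(msg, element_name, {"type": xsd_type})
        part.text = str(value)
    return ET.tostring(msg, encoding="unicode")


message = ground("runBlast", [("ProteinSequence", "MKV"), ("EValueCutoff", 0.001)])
```

An inference engine would reason over the keys of the table (the classes),
while a message-generating toolkit only needs the values.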
As far as significance to MOBY is concerned, I think that DAML-S is at least
worth a certain amount of consideration in terms of its separation of the
notion of description for the purpose of service discovery and description
for the purposes of invocation or other interactions with a service. It seems
reasonably clear that the lack of semantics in WSDL is problematic with respect
to service discovery, and the UDDI solution to this (as far as I understand it)
seems to rely on predetermined taxonomies of service types which are far
from being well-defined in our domain. On the other hand, it's not clear to me
how useful many of the specific classes/properties for service description that
are given by DAML-S might be for MOBY. I believe some of the myGrid folks have
expressed a certain amount of dissatisfaction with the level of complexity
introduced by some of these upper-level ontologies.
[1] "DAML-S: Semantic Markup for Web Services", available at
http://www.daml.org/services/daml-s/0.7/daml-s.html
-----------
Some final thoughts on the importance of "semantic web" to MOBY
As far as the significance to MOBY of RDF and the associated "semantic stack"
is concerned, it seems to me that there are several major issues to be
considered. (These are pretty rough at this point, but may help to ground
all of the abstract discussion into our "problem space" a little bit...)
The first is simply the question of to what extent MOBY might benefit from the
sorts of applications that are already being developed around this framework,
from simple APIs for manipulating sets of RDF statements to inference engines
and semantic search tools built on top of the RDF foundation and higher
levels in the stack. It's pretty clear that the myGrid project has gone
far along this path (and is up at the OWL level of the stack as far as semantic
expressiveness is concerned); see, for example, the recent announcement of
myGrid's choice of the Cerebra "inference engine" to drive the project:
http://lists.w3.org/Archives/Public/www-rdf-logic/2003Jan/0001.html. Whether
or not we feel we need to embrace this level of semantic complexity for MOBY,
it seems clear that we are going to at least be
Next, to what extent do we see the goals of the "semantic web" in terms of
its extreme embracing of the "open world" principle as being consistent with
the MOBY vision; or conversely, are we perfectly happy to accept that in
certain respects we can assume a certain level of internal convention? For
example, I could easily imagine wanting to construct an XML element structure
representing position on a genome that was "semantically opaque" in the sense
that it had subelements (e.g. start and end) that were not intended to be
understood or referenced independently of their context in the genomic position
structure. On the other hand, it seems to me that there are fundamental
semantic constructs we will need that are supplied by RDF, such as the
basic construct of asserting "semantically-typed" data about a uniquely
identified thing and at least the very basic sorts of concept hierarchy and
perhaps concept/property relationship semantics; it certainly seems worth
considering as a foundation for data/metadata representation for our system.
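For contrast with the "semantically opaque" XML structure, here is a small
Python sketch of the RDF-style alternative, in which every statement is a
(subject, property, value) triple about an identified thing. All of the URIs
and the ex: property names are invented for this example, and the set of
tuples is only a stand-in for a real RDF store.

```python
# Triples are (subject, property, value); every identifier below is made up.
TRIPLES = {
    ("urn:example:gene/abc1", "rdf:type", "ex:Gene"),
    ("urn:example:gene/abc1", "ex:locatedAt", "urn:example:pos/1"),
    ("urn:example:pos/1", "rdf:type", "ex:GenomicPosition"),
    ("urn:example:pos/1", "ex:start", 1200),
    ("urn:example:pos/1", "ex:end", 3400),
    ("ex:Gene", "rdfs:subClassOf", "ex:SequenceFeature"),
}


def values(subject, prop):
    """All values asserted for the given property of the given subject."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def subclasses(cls):
    """cls together with everything declared (transitively) as its subclass."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls):
    """Subjects typed as cls or as any of its subclasses."""
    wanted = subclasses(cls)
    return {s for s, p, o in TRIPLES if p == "rdf:type" and o in wanted}
```

Here start and end are independently addressable properties of the position,
which is exactly what the opaque element structure chooses not to offer; the
subclass lookup is the "very basic sort of concept hierarchy" mentioned
above.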
Finally, I should note that in some ways, the core of the vision for the
semantic web seems to have some interesting parallels (at least superficially)
to some of the problems that were explored independently in ISYS and DAS.
For example, I see the work we did in terms of loose data modeling with
the IsysAttribute and IsysObject constructs as being quite similar in some respects
to the property-centric view of RDF and the support for dynamic aggregation of
data with respect to an object changing its interpretation in the system. On
the DAS front, there seems to be a loose parallel between the notion of
the reference server establishing a common coordinate system for "any annotation
server to provide annotations for any reference sequence" and the notion of
URI space providing the common identity space for "anyone to make any RDF
assertion about anything (represented in URI space)".
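The parallel drawn above (independent servers annotating a shared reference,
independent parties making assertions about a shared URI) comes down to
merging statements keyed on a common identifier. A minimal Python sketch,
with made-up identifiers and property names and no claim to reflect the
actual DAS or ISYS data models:

```python
from collections import defaultdict


def aggregate(*sources):
    """Merge (identifier, property, value) statements from independent sources."""
    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for subject, prop, value in source:
            merged[subject][prop].add(value)
    return merged


# Two hypothetical annotation sources that know nothing about each other
# but agree on the identifier of the thing being described.
server_a = [("urn:example:chr1", "ex:hasFeature", "geneA")]
server_b = [
    ("urn:example:chr1", "ex:hasFeature", "repeatB"),
    ("urn:example:chr1", "ex:length", 5000),
]

view = aggregate(server_a, server_b)
```

The only thing the sources have to agree on in advance is the identity
space; the set of properties can grow without coordination.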