[DAS] discussion document for das/2
Lincoln Stein
lstein@cshl.org
Thu, 6 Dec 2001 15:42:42 -0500
On Wednesday 05 December 2001 09:17, Matthew Pocock wrote:
> Hi Lincoln,
>
> Thanks for this discussion document. It's certainly wide-ranging and
> mentions many of the capabilities that will be required for DAS/2 to
> become a killer app. Can I kick off the 'discussion' bit of the process
> with a handful of questions?
>
> > The DAS/2 protocol must allow annotators to make ad hoc insertions into
> > the ontology should the feature they wish to describe not match exactly
> > with any of the preexisting ones. These should be indicated at query
> > time by providing the identities of the closest parent(s) in the
> > prebuilt ontology.
>
> This search by closest parent(s) is potentially /very/ expensive, and
> negates the use of ontology terms. For example, we could define an
> ad-hoc parent to Telomere & Clone, but the most derived node in your
> hierarchy that this can be cast to is PositionalFeature. The server would
> end up pushing all features over when you just want Telomeres and Clones.
H'mm. Let's take a use case. An annotation source wants to publish a new
feature called "Type A telomeric repeat", but the ontology only has feature
types called "Telomere" and "Repeat". I would like the client to be able to
request types of either Telomere or Repeat and get the "Type A telomeric
repeat" (among other things). What I am thinking is that when the annotation
returns the table of contents for the types it serves (the "type list"
service), it explains to the client that "Type A telomeric repeat" belongs
under the "Telomere" and "Repeat" parents. We could support the query in
either of two ways:
  1) the client lists the annotation types it wants to receive. Since
     the client knows where the new types fit in under the
     hierarchy, it can ask for all the nodes under Telomere
     explicitly
  2) the client asks for the more general node, such as Telomere, and
     the server, knowing its local ontology, expands that to
     the list of specific terms and fetches them
I don't see that either of these operations is incredibly expensive. Perhaps
there was something confusing in the way that I wrote this section?
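As a rough illustration of option 2, here is a minimal Python sketch of a server expanding a general term into the specific terms beneath it. The class name `LocalOntology`, its methods, and the placement of Telomere, Repeat and Clone directly under PositionalFeature are assumptions made for the example, not part of any DAS/2 specification.

```python
from collections import defaultdict


class LocalOntology:
    """Toy type hierarchy in which a term may have more than one parent."""

    def __init__(self):
        # Maps each term to the set of its direct children.
        self.children = defaultdict(set)

    def add_term(self, term, parents=()):
        self.children[term]  # register the term even if it has no children
        for parent in parents:
            self.children[parent].add(term)

    def expand(self, term):
        """Return the term plus every term below it in the hierarchy."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.children.get(current, ()))
        return seen


ontology = LocalOntology()
ontology.add_term("PositionalFeature")
ontology.add_term("Telomere", parents=["PositionalFeature"])
ontology.add_term("Repeat", parents=["PositionalFeature"])
ontology.add_term("Clone", parents=["PositionalFeature"])
# Ad hoc insertion, declared with its two closest parents.
ontology.add_term("Type A telomeric repeat", parents=["Telomere", "Repeat"])

telomere_types = ontology.expand("Telomere")
repeat_types = ontology.expand("Repeat")
```

Under these assumptions a request for Telomere expands to Telomere and "Type A telomeric repeat" only. The walk visits only the nodes below the requested term, so it never has to reach PositionalFeature or pull in Clone.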
> Do we need to distinguish between ontology terms created by users and
> those in 'the prebuilt ontology'? The whole idea of a single central
> ontology of feature types (or anything else for that matter) makes me
> worry. As has already been said on the list, languages like DAML-OIL
> allow the equivalence classes between types to be maintained externally,
> further eroding the importance or utility of a 'prebuilt ontology'.
Doesn't there need to be a shared skeleton ontology somewhere? After all,
you have to map each local ontology to something, and I don't know how you
handle multiway mappings among three or more local ontologies.
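One way to picture the shared-skeleton argument is the sketch below. It assumes each site publishes a single mapping from its local terms to skeleton terms, so translating between any two sites goes through the skeleton without needing a pairwise table. The site names and local term names are invented for illustration.

```python
# Each local ontology maps its own terms onto the shared skeleton.
# All site and term names here are hypothetical.
to_skeleton = {
    "site_a": {"telomeric_rpt": "Repeat", "bac": "Clone"},
    "site_b": {"TelRepeat": "Repeat", "BAC_clone": "Clone"},
    "site_c": {"rep": "Repeat", "cl": "Clone"},
}


def translate(term, source, target):
    """Translate a local term from one site to another via the skeleton."""
    shared = to_skeleton[source][term]
    return sorted(
        local for local, skeleton_term in to_skeleton[target].items()
        if skeleton_term == shared
    )


a_to_c = translate("bac", "site_a", "site_c")
b_to_a = translate("TelRepeat", "site_b", "site_a")
```

With three sites this needs three mappings, the same number as mapping every pair directly, but the pairwise count grows quadratically as more sites join while the skeleton approach needs only one mapping per site.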
> > Open question: I already know that DAS/2 needs to represent ranges in
> > which one endpoint is unknown, even though this makes range queries more
> > complicated. Do we need the entire ASN.1 repertoire of fuzzy intervals?
>
> I presume you mean the embl/genbank location grammar as defined by the
> ASN.1 genbank locations definition. Under what use-cases are fuzzies
> absolutely required? Can these be represented using some alternative
> syntax or data-structures?
Yes, I mean the EMBL/GenBank location grammar. The first place I encountered
it was in the ASN.1 spec so I think of it that way. The only use case that
I've personally encountered is mapping clones to the genome, where one end of
the clone is known, and the other isn't. For this purpose, I use positive or
negative infinity for the unmapped end. What have your experiences been?
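The infinity convention can be sketched in a few lines of Python. The `Range` class and the coordinates are hypothetical, and the sketch treats both endpoints as inclusive, which is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed interval; an unmapped end defaults to +/- infinity."""
    start: float = -math.inf
    end: float = math.inf

    def overlaps(self, other):
        # Standard closed-interval overlap test; it needs no special
        # case for open ends because comparisons with inf just work.
        return self.start <= other.end and other.start <= self.end


# A clone whose left end is mapped but whose right end is unknown.
clone_left_known = Range(start=150_000)
# A clone whose right end is mapped but whose left end is unknown.
clone_right_known = Range(end=90_000)

query = Range(100_000, 200_000)
hit = clone_left_known.overlaps(query)
miss = clone_right_known.overlaps(query)
```

In this sketch the range query itself is unchanged. The cost of the convention shows up elsewhere, for instance in any code that computes a feature's length or serializes its coordinates.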
> > a. Capabilities service. The server provides the client with a list of
> > the services it provides. Since it is likely that the services will be
> > enhanced over time, the service level or version number is also
> > provided. Open question: should a client be allowed to negotiate a lower
> > level of service for backward compatibility?
>
> Does this belong as a DAS service, or should it be meta-data in the
> services directory (e.g. UDDI)?
It's cleaner if the capabilities are provided as meta-data in the services
directory, as you suggest. The downside of this is that the client will have
to consult the directory service each time it connects to an annotation
server, in case the server has been upgraded recently. I'm not sure this is
how UDDI directory services are intended to be used. Comments, Brian?
> You seem to have embraced the concept of meta-data and IDs within this
> discussion document. Can the services listed (b onwards) be distilled
> down into a single service interface that takes an ontological
> expression (e.g. a DAML fragment, monograph schema, SQL) and returns all
> items that it knows about matching that fragment? The service could
> publish the DAML fragment that is guaranteed to retrieve all items it
> knows about in the services directory (e.g. UDDI), and then clients can
> work out which servers serve information they want without directly
> contacting the server.
>
> The RDBMS world uses a single SQL language to query all relations,
> regardless of whether the data is bank records, people owning cars or
> rail time-tables. Do we need separate services for querying each type of
> biological data? If you feel that we do, then could you say what this
> gains us (as implementors of servers and of clients)? If you think the
> general query interface will be un-workable, could you say why?
Well, we risk going around in semantic circles here. You or I could put up a
relational database, make it available to the public, publish the schema, and
tell people to go ahead and query the server. In fact, we do do that, and it
is very useful for querying single data sources. But we want to have a
common data model, so that the same query templates will work with all
servers and will produce the same format results. This requires that we
enumerate and name the queries, and enumerate and name the possible query
results. If the services listed in the framework document can all be
expressed as DAML queries, then there is a direct mapping between the
services and the queries and we could indeed publish the service by
publishing the list of queries the server supports.
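To make "enumerate and name the queries" concrete, here is a small hypothetical sketch in which a server exposes a fixed set of named services, each with a fixed result shape, and publishes only the list of names. The service names, the feature records, and the `call` helper are all invented for the example and do not come from the framework document.

```python
# Hypothetical in-memory feature store.
FEATURES = [
    {"id": "f1", "type": "Telomere", "start": 1, "end": 5_000},
    {"id": "f2", "type": "Clone", "start": 4_000, "end": 150_000},
]


def type_list():
    """Named service: the sorted list of feature types this server holds."""
    return sorted({feature["type"] for feature in FEATURES})


def features_by_type(feature_type):
    """Named service: every feature record of the requested type."""
    return [f for f in FEATURES if f["type"] == feature_type]


# The registry is the whole public query surface of the server.
SERVICES = {
    "type_list": type_list,
    "features_by_type": features_by_type,
}


def published_services():
    """What the server would advertise, e.g. in a services directory."""
    return sorted(SERVICES)


def call(name, *args):
    """Dispatch a client request to the named service."""
    return SERVICES[name](*args)


advertised = published_services()
clone_features = call("features_by_type", "Clone")
```

Because every server answers the same named requests with the same record shape, a client written against one server should work against another. An arbitrary query language would only give that guarantee if all servers also shared a schema.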
For my part, I'm happiest giving the services names. This makes it easier
for us to think about the requirements for the project (e.g. efficient data
structures for providing the services), and how to divide up the work.
DAML-OIL is pretty cool (I just started reading about it today). I'm open to
using it to describe the relationships among annotations. I'm not clear on
how exactly the DAML world interoperates with the WSDL, XML Schema, and
SOAP worlds. Could someone explain?
Lincoln
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS.
PLEASE WRITE FOR DETAILS.
========================================================================