[DAS] discussion document for das/2

Matthew Pocock mrp@sanger.ac.uk
Wed, 05 Dec 2001 14:17:51 +0000


Hi Lincoln,

Thanks for this discussion document. It's certainly wide-ranging and 
mentions many of the capabilities that will be required for DAS/2 to 
become a killer app. Can I kick off the 'discussion' bit of the process 
with a handful of questions?

> The DAS/2 protocol must allow annotators to make ad hoc insertions into 
> the ontology should the feature they wish to describe not match exactly 
> with any of the preexisting ones. These should be indicated at query 
> time by providing the identities of the closest parent(s) in the 
> prebuilt ontology.


This search by closest parent(s) is potentially /very/ expensive, and 
negates the benefit of using ontology terms. For example, we could define 
an ad-hoc parent to Telomere & Clone, but the most derived node in your 
hierarchy that this can be cast to is PositionalFeature. The server would 
end up pushing all features over when you just want Telomeres and Clones.
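
To make the worry concrete, here is a minimal sketch in Python. It assumes 
a toy hierarchy in which Telomere and Clone both sit directly under 
PositionalFeature; the Exon and Repeat types, the feature IDs and all the 
function names are invented for illustration and are not part of any 
DAS proposal.

```python
# Hypothetical toy hierarchy, child -> parent. Only Telomere, Clone and
# PositionalFeature come from the discussion; Exon and Repeat are invented.
PARENT = {
    "PositionalFeature": None,
    "Telomere": "PositionalFeature",
    "Clone": "PositionalFeature",
    "Exon": "PositionalFeature",
    "Repeat": "PositionalFeature",
}

# Invented (feature id, feature type) pairs held by an imaginary server.
FEATURES = [("f1", "Telomere"), ("f2", "Clone"), ("f3", "Exon"), ("f4", "Repeat")]


def ancestors(term):
    """Return the term followed by all of its ancestors, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def closest_common_parent(a, b):
    """Most derived prebuilt term that both a and b can be cast to."""
    seen = set(ancestors(a))
    for term in ancestors(b):
        if term in seen:
            return term
    return None


def query(term):
    """IDs of every feature whose type is the term or a descendant of it."""
    return [fid for fid, ftype in FEATURES if term in ancestors(ftype)]


fallback = closest_common_parent("Telomere", "Clone")
wanted = [fid for fid, ftype in FEATURES if ftype in ("Telomere", "Clone")]
returned = query(fallback)
print(fallback, wanted, returned)
```

In this toy case the fallback term is PositionalFeature, so the query 
returns all four features although only two were wanted.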

Do we need to distinguish between ontology terms created by users and 
those in 'the prebuilt ontology'? The whole idea of a single central 
ontology of feature types (or anything else for that matter) makes me 
worry. As has already been said on the list, languages like DAML-OIL 
allow the equivalence classes between types to be maintained externally, 
further eroding the importance or utility of a 'prebuilt ontology'.


> Open question: I already know that DAS/2 needs to represent ranges in 
> which one endpoint is unknown, even though this makes range queries more 
> complicated. Do we need the entire ASN.1 repertoire of fuzzy intervals?


I presume you mean the EMBL/GenBank location grammar as defined by the 
ASN.1 GenBank locations definition. Under what use-cases are fuzzies 
absolutely required? Can these be represented using some alternative 
syntax or data-structures?
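
As one possible alternative data-structure, here is a minimal sketch in 
Python. It assumes that the only fuzziness needed is an unknown endpoint, 
and it treats a range as a possible match unless a known endpoint rules 
it out. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuzzyRange:
    """A range in which either endpoint may be unknown (None)."""

    start: Optional[int]  # None: the start is unknown
    end: Optional[int]    # None: the end is unknown

    def may_overlap(self, q_start: int, q_end: int) -> bool:
        """True unless a known endpoint rules out overlap with the query."""
        if self.start is not None and self.start > q_end:
            return False
        if self.end is not None and self.end < q_start:
            return False
        return True


open_ended = FuzzyRange(start=100, end=None)
print(open_ended.may_overlap(500, 600))  # end unknown, so cannot be excluded
print(open_ended.may_overlap(1, 50))     # known start lies beyond the query
```

The cost shows up in range queries: a feature with an unknown end has to 
be returned for every query region at or beyond its start.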


> a. Capabilities service. The server provides the client with a list of 
> the services it provides. Since it is likely that the services will be 
> enhanced over time, the service level or version number is also 
> provided. Open question: should a client be allowed to negotiate a lower 
> level of service for backward compatibility?


Does this belong as a DAS service, or should it be meta-data in the 
services directory (e.g. UDDI)?


You seem to have embraced the concept of meta-data and IDs within this 
discussion document. Can the services listed (b onwards) be distilled 
down into a single service interface that takes an ontological 
expression (e.g. a DAML fragment, monograph schema, SQL) and returns all 
items that it knows about matching that fragment? The service could 
publish the DAML fragment that is guaranteed to retrieve all items it 
knows about in the services directory (e.g. UDDI), and then clients can 
work out which servers serve information they want without directly 
contacting the server.

The RDBMS world uses a single SQL language to query all relations, 
regardless of whether the data is bank records, people owning cars or 
rail time-tables. Do we need separate services for querying each type of 
biological data? If you feel that we do, then could you say what this 
gains us (as implementors of servers and of clients)? If you think the 
general query interface will be unworkable, could you say why?
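
To illustrate the SQL analogy, here is a minimal sketch in Python using 
the standard sqlite3 module. The two tables, their columns and the data 
are invented for illustration; the point is only that one query() entry 
point serves both kinds of data, not that DAS/2 should expose SQL itself.

```python
import sqlite3


def make_server():
    """Build an in-memory store holding two unrelated kinds of data."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE feature (id TEXT, type TEXT, start INTEGER, stop INTEGER)"
    )
    db.execute("CREATE TABLE alignment (id TEXT, target TEXT, score REAL)")
    db.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?)",
        [("f1", "Telomere", 1, 500), ("f2", "Clone", 400, 90000)],
    )
    db.executemany(
        "INSERT INTO alignment VALUES (?, ?, ?)",
        [("a1", "f2", 98.5)],
    )
    return db


def query(db, expression, params=()):
    """The single service interface: evaluate an expression, return rows."""
    return db.execute(expression, params).fetchall()


db = make_server()
# The same entry point answers questions about either kind of data.
telomeres = query(db, "SELECT id FROM feature WHERE type = ?", ("Telomere",))
good_hits = query(db, "SELECT id, score FROM alignment WHERE score > ?", (90,))
print(telomeres, good_hits)
```

A server built this way needs no new service when a new kind of data is 
added, only a new relation that the same expressions can reach.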

Here's to discussion.

Matthew

ps thanks again for putting this all together