[DAS2] properties and queries

Andrew Dalke dalke at dalkescientific.com
Tue Feb 7 15:45:00 UTC 2006


To summarize, the current thought here for properties and queries
is as follows  (it's a long summary.  More like an essay.  :)

Add support for zero or more <NOTE> elements in the feature, of
the form
   <NOTE>This is some arbitrary (but non-markup-ed) text</NOTE>


Add a features search keyword "note=" which takes a search string
to be found in the note elements.  (substring? soundex? regex?
the search engine calls up Lincoln and asks?)


Add support for zero or more <ALIAS> elements in the feature,
of the form
   <ALIAS name="Zorro" />

(I missed this in the redraft.  It should have been there.
Feature filter "name" already says it searches the "name" and
"alias" fields for a feature.)



Ignore the "phase" property (contentious, perhaps?) or add it
as an attribute of something else in the feature element.



Ignore the "score" property.  As written in the current spec
   "score" A floating point number indicating a context-dependent
   score. This is to be used only when a more specific ontology-driven
   score cannot be used.  (Umm, where do the other scores go?)
Unless someone wants to define that score ontology and what it means
to search that field, this is a can of worms I don't want to open.



Ignore the "editable" property.  As written (and kibitzed)
   "editable" indicates that features may be updateable (this is at the
   discretion of the server).  (But this is potentially per-user data.)

This should either be in the feature type or it should be in
some write-back specific data structure the client can fetch.
(To be discussed) It isn't a feature property.

This gets rid of all stated needs for arbitrary key/value data.


That doesn't mean there won't be future needs.

In that case, here's how to add new pieces of data.

1) use a non-DAS extension element.  Clients must ignore elements
they don't understand.
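
A rough sketch of what "ignore elements they don't understand" could
look like in a client, in Python.  Only <NOTE> and <ALIAS> come from
this proposal; the enclosing <FEATURE> wrapper and the
<LAB-CONFIDENCE> extension element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature document.  LAB-CONFIDENCE stands in for some
# site-specific, non-DAS extension element.
FEATURE_XML = """
<FEATURE>
  <NOTE>This is some arbitrary (but non-markup-ed) text</NOTE>
  <ALIAS name="Zorro" />
  <LAB-CONFIDENCE level="high" />
</FEATURE>
"""

def read_feature(xml_text):
    """Collect notes and aliases; skip any element we don't know about."""
    notes, aliases, skipped = [], [], []
    for child in ET.fromstring(xml_text):
        if child.tag == "NOTE":
            notes.append(child.text or "")
        elif child.tag == "ALIAS":
            aliases.append(child.get("name"))
        else:
            # Unknown element: remember the tag for debugging, but do not fail.
            skipped.append(child.tag)
    return notes, aliases, skipped

notes, aliases, skipped = read_feature(FEATURE_XML)
```

The client keeps working when a server adds elements; it just can't
do anything with them.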

This is good for storing data, but not for searching.  The
thing is, the search mechanism (or multiple search mechanisms
perhaps) is data field specific.  Hence,

2) servers may provide extensions to the basic DAS query mechanism.
Currently the mechanism is:
   and-ed set of zero or more  keyword = (set, of, or, terms, for, keyword)
where "keyword" is well-defined by DAS except for the "att"
property keywords.

Query extensions add new keywords in the same syntax, and define
somewhere how that syntax works.  It must be backwards compatible
to the existing syntax and semantics.
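
One way to picture this, as a sketch only: each keyword gets its own
matching function, the terms for one keyword are or-ed, and the
keywords are and-ed.  The "type" and "name" keywords, the feature
dictionary layout and the exact-match rule are assumptions made for
the example, not something the spec defines this way.

```python
def exact(field):
    """Build a matcher that compares one feature field to a term exactly."""
    return lambda feature, term: feature.get(field) == term

# Hypothetical base keywords; the search rule is per data field.
MATCHERS = {
    "type": exact("type"),
    "name": lambda feature, term: (term == feature.get("name")
                                   or term in feature.get("aliases", ())),
}

def matches(feature, query, matchers=MATCHERS):
    """query maps keyword -> list of or-ed terms; keywords are and-ed."""
    for keyword, terms in query.items():
        if keyword not in matchers:
            raise KeyError("unsupported query keyword: %r" % (keyword,))
        match_term = matchers[keyword]
        if not any(match_term(feature, term) for term in terms):
            return False
    return True

# A query extension adds a keyword and leaves the existing ones alone,
# so old queries keep their meaning.
EXTENDED = dict(MATCHERS, annotator=exact("annotator"))
```

With zero keywords every feature matches, and a server that only has
the base matchers rejects an extension keyword instead of silently
ignoring it.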

The problem then is clients don't know that a server supports a
given query extension, so

3) add a <SUPPORTS> element to the <CAPABILITY> element.
(Also proposed, renaming "CATEGORY" back to "CAPABILITY".)
The CAPABILITY may have zero or more of

   <SUPPORTS name="some-unique-string" />

Here are the two defined unique strings,

   <SUPPORTS name="all" />
   <SUPPORTS name="das2" />

The "all" query says that a client may reasonably fetch all
the features in one go.  This would occur with a small DAS
server containing only a few hundred features.  In that case
there's no need to even have a CGI script running on the
back end - just a set of flat files.  The query is done by
fetching the URL with no parameters.

A rich server with millions of features might decide to
not support an "all" query.

The "das2" query is the one we've been talking about.

If a site develops a query extension it adds

   <SUPPORTS name="sanger-curation-search" />

so clients know what the server can do.  (In this case supporting
searches for "annotator", "annotation_before" and "annotation_after"
fields.)
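
On the client side the check could be as small as this sketch.  The
<CAPABILITY> element here is stripped down to just its <SUPPORTS>
children (whatever else it carries is left out), and the table saying
which extension provides which keywords is an assumption about how a
client might record what it knows.

```python
import xml.etree.ElementTree as ET

# Hypothetical capability element with two SUPPORTS entries.
CAPABILITY_XML = """
<CAPABILITY>
  <SUPPORTS name="das2" />
  <SUPPORTS name="sanger-curation-search" />
</CAPABILITY>
"""

# What this client knows about query extensions: name -> keywords it adds.
EXTENSION_KEYWORDS = {
    "sanger-curation-search": {"annotator", "annotation_before",
                               "annotation_after"},
}

def supported_names(xml_text):
    """Return the set of SUPPORTS names advertised by the server."""
    capability = ET.fromstring(xml_text)
    return {s.get("name") for s in capability.findall("SUPPORTS")}

def can_search(supports, keyword):
    """True if some advertised extension known to us provides the keyword."""
    return any(keyword in EXTENSION_KEYWORDS.get(name, ())
               for name in supports)

supports = supported_names(CAPABILITY_XML)
```

Here `"all" in supports` is false, so the client should not try to
fetch everything in one go, while `can_search(supports, "annotator")`
is true.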

That all said, this doesn't mean that the server shouldn't
have a property table.  It's a question of what it means
to search the property table.

People here want the following:
   multiple properties may have the same key and different value
   the order of the properties is not important
   the "att:" search is renamed a "prop:" search, like "prop:author"
   the search is a substring search.
   a feature matches a search if any of the properties with that name
      match the substring search

For example,
   source = BLAST 2.3.4
   author = Andrew Dalke
   author = Thomas Down

lets me search for

   features?prop:author=Andrew
all features with "Andrew" as a substring in the "author" property

   features?prop:author=Andrew;source=BLAST
all features with "Andrew" as a substring in the "author"
and with "BLAST" in the source name

   features?prop:author=Andrew,Thomas
all features with "Andrew" or "Thomas" as an author
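
The rules above fit in a few lines of Python.  This is a sketch, not
a reference implementation: it handles only "prop:" keywords, skips
URL percent-decoding, and the function names are mine.  The last
example in the usage note assumes the source entry can also be
searched as "prop:source", which is a guess.

```python
def parse_prop_query(query_string):
    """'prop:author=Andrew,Thomas' -> {'author': ['Andrew', 'Thomas']}"""
    query = {}
    for part in query_string.split(";"):
        if not part:
            continue
        keyword, _, terms = part.partition("=")
        if not keyword.startswith("prop:"):
            raise ValueError("only prop: keywords are handled here: %r"
                             % (keyword,))
        key = keyword[len("prop:"):]
        query.setdefault(key, []).extend(terms.split(","))
    return query

def feature_matches(properties, query):
    """properties is a list of (key, value) pairs; keys may repeat and
    order does not matter.  Terms for one key are or-ed, keys are and-ed,
    and a term matches if it is a substring of any value with that key."""
    for key, terms in query.items():
        values = [value for k, value in properties if k == key]
        if not any(term in value for value in values for term in terms):
            return False
    return True

# The property table from the example above.
PROPERTIES = [
    ("source", "BLAST 2.3.4"),
    ("author", "Andrew Dalke"),
    ("author", "Thomas Down"),
]
```

So "prop:author=Andrew" and "prop:author=Andrew,Thomas" both match
this feature, and "prop:author=Andrew;prop:source=FASTA" does not.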



Really what I think this essay is doing is saying that
storing data and searching data are different.  Servers can
develop new ways to extend DAS searches and flag that they
support new searches.  (E.g., the new search may support
a different way to search a field in the property table.)

But there needs to be a really basic substring search, given
that there will be simple string key / string value data
for the property table.

Oh, and should the key/value table also include my proposed
"href" and embedded binary data fields like images?  Hmmmmm....

Lots of talk about this here.  Time for a tea break.

					Andrew
					dalke at dalkescientific.com



