[DAS2] query language description

Andrew Dalke dalke at dalkescientific.com
Fri Mar 17 02:05:06 UTC 2006


Chris:
> ignorant question.. (I have only been tangentially aware of the outer 
> edges of the whole das2 process)..
>
> how are you determining the functionality required? surely someone 
> somewhere will want to write a das2 client that implements boolean 
> queries

It was informal, based on feedback from client developers and
maintainers.  Lincoln, Thomas, Andreas, Gregg and others provided
that feedback.  It did not come from talking with users.

I know there's a wide range of users and use cases.  The point
of this query language is to have basic functionality that all
servers can implement.

> right now they are forced to bypass the constraint language and go direct
> to SQL.

We also define how a server can advertise that it supports
additional ways to query it.

> None of these really fit into the DAS paradigm. I'm guessing you want
> something simple that can be used as easily as an API with get-by-X 
> methods but will seamlessly blend into something more powerful. I 
> think what you have is on the right lines. I'm just arguing to make 
> this language composable from the outset, so that it can be extended 
> to whatever expressivity is required in the future, without bolting on 
> a new query system that's incompatible with the existing one.

We have two ways to extend the system.  First, if the simple query
language is extended, for example, to support word searches of the
text field instead of substring searches, then a server can say

<CAPABILITY type="features" 
query_uri="http://somewhere.over.rainbow/server.cgi">
   <SUPPORTS name="word-search"/>
</CAPABILITY>

This is backwards compatible, so the normal DAS queries work.  But
a client can recognize the new feature and support whatever new
filters 'word-search' indicates, e.g.

   http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*

(finds features with notes containing words starting with 'Andre')
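
As a rough sketch of the client side, here is how a client might
read the CAPABILITY element above and check for the extension.  The
function name is made up for illustration, and I'm assuming the
element is parsed on its own, outside whatever document contains it.

```python
import xml.etree.ElementTree as ET

CAPABILITY_XML = """
<CAPABILITY type="features"
            query_uri="http://somewhere.over.rainbow/server.cgi">
   <SUPPORTS name="word-search"/>
</CAPABILITY>
"""

def supported_extensions(capability_xml):
    """Return (query_uri, set of SUPPORTS names) for one CAPABILITY element.

    Hypothetical helper, not part of any DAS library.
    """
    cap = ET.fromstring(capability_xml.strip())
    names = {s.get("name") for s in cap.findall("SUPPORTS")}
    return cap.get("query_uri"), names

query_uri, extensions = supported_extensions(CAPABILITY_XML)

# A client that knows about 'word-search' can offer the new filter;
# one that doesn't simply never looks for it and keeps working.
use_word_search = "word-search" in extensions
```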

These are composable.  For example, suppose Sanger allows modification
date searches of curation events.  Then it might say

<CAPABILITY type="features" 
query_uri="http://somewhere.over.rainbow/server.cgi">
   <SUPPORTS name="word-search"/>
   <SUPPORTS name="sanger-curation"/>
</CAPABILITY>

and I can search for notes containing words starting with "Andre"
which were modified by "dalke" between 2002 and 2005 by doing

   http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*&
        modified-by=dalke&modified-before=2005&modified-after=2002
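
A minimal sketch of how a client could assemble that URL from a
list of filters.  build_query is a made-up name, and leaving the
'*' unescaped is my assumption about how the wildcard is sent; a
real client might percent-encode it instead.

```python
from urllib.parse import urlencode

def build_query(query_uri, filters):
    """Join filter (name, value) pairs onto the server's query_uri.

    Every filter must match, so adding one only narrows the result.
    safe="*" keeps the wildcard readable (an assumption, see above).
    """
    return query_uri + "?" + urlencode(filters, safe="*")

url = build_query(
    "http://somewhere.over.rainbow/server.cgi",
    [
        ("note-wordsearch", "Andre*"),   # from the 'word-search' extension
        ("modified-by", "dalke"),        # from 'sanger-curation'
        ("modified-before", "2005"),
        ("modified-after", "2002"),
    ],
)
```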


An advantage to the simple boolean logic of the current system
is that the GUI is easy to build, and in line with existing
simple search systems.


If someone wants to implement a new search system which is
not backwards compatible then the server can indicate that
alternative with a new CAPABILITY.  Suppose Thomas at Sanger
comes up with a new search mechanism based on an object query
language he invented,

<CAPABILITY type="down-oql"
     query_uri="http://sanger.ac.uk/oql-search" />

The Sanger and EBI clients might understand that and support
a more complex GUI, eg, with a text box interface.  Everyone
else must ignore unknown capability types.

The query would then be POSTed (or whatever the protocol defines)
to the given URL, which returns whatever results are
desired.

Or the server can point to a public MySQL port, like

<CAPABILITY type="mysql-connection"
     query_uri="mysql://username:password@hostname:port/databasename" />

That's what you are doing to bypass the syntax, except that
here it isn't a bypass; you can define the new interface in
the DAS sources document.
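
The "ignore unknown capability types" rule is easy to sketch.  The
SOURCE wrapper element and the function name here are made up for
the example; the real sources document layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the CAPABILITY elements shown above.
SOURCES_XML = """
<SOURCE>
  <CAPABILITY type="features"
       query_uri="http://somewhere.over.rainbow/server.cgi"/>
  <CAPABILITY type="down-oql"
       query_uri="http://sanger.ac.uk/oql-search"/>
  <CAPABILITY type="mysql-connection"
       query_uri="mysql://username:password@hostname:port/databasename"/>
</SOURCE>
"""

def usable_capabilities(sources_xml, known_types):
    """Map each capability type this client understands to its query_uri.

    Capability types not in known_types are skipped silently.
    """
    root = ET.fromstring(sources_xml.strip())
    usable = {}
    for cap in root.iter("CAPABILITY"):
        cap_type = cap.get("type")
        if cap_type in known_types:
            usable[cap_type] = cap.get("query_uri")
    return usable

# A generic client only knows the standard feature query.
generic = usable_capabilities(SOURCES_XML, {"features"})

# A Sanger-aware client also understands the OQL search.
sanger = usable_capabilities(SOURCES_XML, {"features", "down-oql"})
```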

> The generic language could just be some kind of simple
> extensible function syntax for search terms, boolean operators,
> and some kind of (optional) nesting syntax.

Which syntax?  Is it supposed to be easy for people to write?
Text oriented?  Or tree structured, like XML, or SQL-like?
And which clients and servers will implement that search
language?

If there were a generic language it would allow
   OR("on segment Chr1 between 1000 and 2000",
      "on segment ChrX between 99 and 777")
which is something we are expressly not allowing in DAS2
queries.  It doesn't make sense for the target applications,
and excluding it simplifies server development,
which means less chance for bugs.

Also, I personally haven't figured out a decent way to
do a GUI composition of a complex boolean query which is
as easy as learning the query language in the first place.

A more generic language implementation is a lot of overhead
if most users (80%? 90%?) need only basic searches, and many of
the rest can fake it by breaking a request into parts and
doing the boolean logic on the client side.
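
Here's a sketch of what "doing the boolean logic on the client
side" could look like for an OR across two segments.  The run_query
callable and the (segment, start, end) tuples are stand-ins I made
up; a real client would issue one simple DAS2 query per part.

```python
def client_side_or(run_query, subqueries):
    """Run each simple query separately and merge the results.

    run_query(subquery) must return an iterable of feature ids.
    Duplicates are dropped; first-seen order is kept.
    """
    seen = set()
    merged = []
    for subquery in subqueries:
        for feature_id in run_query(subquery):
            if feature_id not in seen:
                seen.add(feature_id)
                merged.append(feature_id)
    return merged

# Stand-in for the server: canned answers for two range queries.
FAKE_RESULTS = {
    ("Chr1", 1000, 2000): ["f1", "f2"],
    ("ChrX", 99, 777): ["f2", "f3"],
}

result = client_side_or(
    lambda q: FAKE_RESULTS[q],
    [("Chr1", 1000, 2000), ("ChrX", 99, 777)],
)
```

The server only ever sees the two simple range queries.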

Feedback I've heard so far is that DAS1 queries were
acceptable, with only a few new search fields needed.

> hmm, not sure how useful this would be - surely you'd want something
> more dasmodel-aware?

The example I gave was a bad one.  What I meant was to show
how there's an extension point so someone can develop a new
search interface and clients can know that the new functionality
exists, without having to change the DAS spec.

					Andrew
					dalke at dalkescientific.com



