[DAS2] query language description

chris mungall cjm at fruitfly.org
Sat Mar 18 00:20:14 UTC 2006


On Mar 16, 2006, at 6:05 PM, Andrew Dalke wrote:

>> right now they are forced to bypass the constraint language and go direct
>> to SQL.
>
> In addition, we provide defined ways for a server to indicate
> that there are additional ways to query the server.

I was positing this as a bad feature, not a good one. or at least a 
symptom of an incorrectly designed system (at least in the case of the 
GO DB API - it may not carry forward to DAS - though if you're going to 
allow querying by terms...)

>
>> None of these really fit into the DAS paradigm. I'm guessing you want
>> something simple that can be used as easily as an API with get-by-X
>> methods but will seamlessly blend into something more powerful. I
>> think what you have is on the right lines. I'm just arguing to make
>> this language composable from the outset, so that it can be extended
>> to whatever expressivity is required in the future, without bolting on
>> a new query system that's incompatible with the existing one.
>
> We have two ways to compose the system.  If the simple query language
> is extended, for example, to support word searches of the text field
> instead of substring searches, then a server can say
>
> <CAPABILITY type="features"
> query_uri="http://somewhere.over.rainbow/server.cgi">
>    <SUPPORTS name="word-search"/>
> </CAPABILITY>
>
> This is backwards compatible, so the normal DAS queries work.  But
> a client can recognize the new feature and support whatever new filters
> that 'word-search' indicates, eg
>
>    http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*
>
> (finds features with notes containing words starting with 'Andre' )
>
> These are composable.  For example, suppose Sanger allows modification
> date searches of curation events.  Then it might say
>
> <CAPABILITY type="features"
> query_uri="http://somewhere.over.rainbow/server.cgi">
>    <SUPPORTS name="word-search"/>
>    <SUPPORTS name="sanger-curation"/>
> </CAPABILITY>

so this is limited to single-argument search functions?
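
For illustration, a client could discover the advertised extensions with something like the minimal Python sketch below. It assumes the CAPABILITY element looks exactly like the quoted example; the function name `supported_extensions` is made up here and is not part of any DAS2 library.

```python
import xml.etree.ElementTree as ET

# The CAPABILITY element quoted above, copied as an example document.
CAPABILITY_XML = """
<CAPABILITY type="features"
            query_uri="http://somewhere.over.rainbow/server.cgi">
   <SUPPORTS name="word-search"/>
   <SUPPORTS name="sanger-curation"/>
</CAPABILITY>
"""

def supported_extensions(capability_xml):
    """Return (query_uri, set of SUPPORTS names) for one CAPABILITY element.

    Hypothetical helper: it only assumes the attribute names shown in the
    quoted example (query_uri on CAPABILITY, name on SUPPORTS).
    """
    cap = ET.fromstring(capability_xml)
    names = {s.get("name") for s in cap.findall("SUPPORTS")}
    return cap.get("query_uri"), names

query_uri, extensions = supported_extensions(CAPABILITY_XML)

# A client would only offer the extra filters it recognizes.
can_word_search = "word-search" in extensions
```

Each SUPPORTS name is just a flag: it says that some extra filters exist, not what arguments they take or how they combine.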

>
> and I can search for notes containing words starting with "Andre"
> which were modified by "dalke" between 2002 and 2005 by doing
>
>    http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*&
>         modified-by=dalke&modified-before=2005&modified-after=2002

but the compositionality is always associative since the CGI parameter 
constraint forbids nesting
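
To make the nesting point concrete, here is a small Python sketch that builds the query string from the example and parses it back. It uses only the standard library; the parameter names come from the quoted URL, and the implicit "AND everything together" reading is an assumption about how a server would interpret them.

```python
from urllib.parse import urlencode, parse_qsl

base = "http://somewhere.over.rainbow/server.cgi"

# The filters from the quoted example, as a flat list of (name, value) pairs.
filters = [
    ("note-wordsearch", "Andre*"),
    ("modified-by", "dalke"),
    ("modified-before", "2005"),
    ("modified-after", "2002"),
]

# safe="*" keeps the wildcard readable instead of encoding it as %2A.
url = base + "?" + urlencode(filters, safe="*")

# Parsing the query string gives back the same flat list of pairs.
# There is no place in this structure to group terms, so something like
# "(A and B) or C" cannot be expressed without inventing a syntax
# inside a parameter value.
parsed = parse_qsl(url.split("?", 1)[1])
```

Whatever boolean structure the server applies is implicit in the server, not visible in the request.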

> An advantage to the simple boolean logic of the current system
> is that the GUI interface is easy, and in line with existing
> simple search systems.

there's nothing preventing you from implementing a simple GUI on top of 
an expressive system - there is nothing forcing you to use the 
expressivity

> If someone wants to implement a new search system which is
> not backwards compatible then the server can indicate that
> alternative with a new CAPABILITY.  Suppose Thomas at Sanger
> comes up with a new search mechanism based on an object query
> language he invented,
>
> <CAPABILITY type="down-oql"
>      query_uri="http://sanger.ac.uk/oql-search" />
>
> The Sanger and EBI clients might understand that and support
> a more complex GUI, eg, with a text box interface.  Everyone
> else must ignore unknown capability types.

but this doesn't integrate with the existing query system
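
A rough Python sketch of the "ignore unknown capability types" rule follows. The `<CAPABILITIES>` wrapper element and the names `KNOWN_TYPES` and `usable_capabilities` are assumptions made only so the example parses; the real sources document may nest CAPABILITY elements differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; only the CAPABILITY elements themselves
# are taken from the quoted examples.
DOC = """
<CAPABILITIES>
  <CAPABILITY type="features"
              query_uri="http://somewhere.over.rainbow/server.cgi"/>
  <CAPABILITY type="down-oql"
              query_uri="http://sanger.ac.uk/oql-search"/>
</CAPABILITIES>
"""

# Capability types this particular client understands.
KNOWN_TYPES = {"features"}

def usable_capabilities(doc, known_types):
    """Map each understood capability type to its query_uri, skipping the rest."""
    root = ET.fromstring(doc)
    return {
        cap.get("type"): cap.get("query_uri")
        for cap in root.iter("CAPABILITY")
        if cap.get("type") in known_types
    }

usable = usable_capabilities(DOC, KNOWN_TYPES)
```

A client that does not know "down-oql" simply never sees that endpoint, which is why the two query systems stay separate.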

>
> Then that would be POSTED (or whatever the protocol defines)
> to the given URL, which returns back whatever results are
> desired.
>
> Or the server can point to a public MySQL port, like
>
> <CAPABILITY type="mysql-connection"
>      query_uri="mysql://username:password@hostname:port/databasename" />
>
> That's what you are doing to bypass the syntax, except that
> here it isn't a bypass; you can define the new interface in
> the DAS sources document.
>
>> The generic language could just be some kind of simple
>> extensible function syntax for search terms, boolean operators,
>> and some kind of (optional) nesting syntax.
>
> Which syntax?  Is it supposed to be easy for people to write?
> Text oriented?  Or tree structured, like XML, or SQL-like?

I'd favour some concrete abstract syntax that looks much like the 
existing DAS QL
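
One way to picture such a syntax is the Python sketch below: function-style search terms plus nestable boolean operators, rendered to text. Everything in it is hypothetical, including the `Term`/`BoolOp` names, the `AND(...)`/`OR(...)` rendering, and the two-argument `modified-between` term; it is not a proposal for the actual DAS2 grammar.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Term:
    """A search function applied to one or more arguments."""
    name: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class BoolOp:
    """A boolean operator over nested queries; op is "AND" or "OR"."""
    op: str
    operands: Tuple["Query", ...]

Query = Union[Term, BoolOp]

def render(q):
    """Render a query tree as text in a function-call style."""
    if isinstance(q, Term):
        return "%s(%s)" % (q.name, ",".join(q.args))
    return "%s(%s)" % (q.op, ",".join(render(x) for x in q.operands))

# Terms can take several arguments, and operators can nest.
query = BoolOp("AND", (
    Term("note-wordsearch", ("Andre*",)),
    BoolOp("OR", (
        Term("modified-by", ("dalke",)),
        Term("modified-between", ("2002", "2005")),
    )),
))

text = render(query)
```

A flat list of filters is just the special case of a single AND with no nested operators, so existing simple queries would remain expressible.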

> And which clients and servers will implement that search
> language?

all servers. clients optional

>
> If there was a generic language it would allow
>    OR("on segment Chr1 between 1000 and 2000",
>       "on segment ChrX between 99 and 777")
> which is something we are expressly not allowing in DAS2
> queries.  It doesn't make sense for the target applications
> and by excluding it it simplifies the server development,
> which means less chance for bugs.

this example is pointless but it's easy to imagine plenty of ontology 
term queries or other queries in which this would be useful

I guess I depart from the normal DAS philosophy - I don't see this 
being a high barrier for entry for servers, if they're not up to this 
it'll probably be a buggy hacky server anyway

> Also, I personally haven't figured out a decent way to
> do a GUI composition of a complex boolean query which is
> as easy as learning the query language in the first place.

doesn't mean it doesn't exist.

i'm not sure what's hard about having say, a clipboard of favourite 
queries, then allowing some kind of drag-and-drop composition

> A more generic language implementation is a lot of overhead
> if most (80%? 90%?) need basic searches, and many of the
> rest can fake it by breaking a request into parts and
> doing the boolean logic on the client side.

this is always an option - if the user doesn't mind the additional 
possibly very high overhead. it's just a little bit of a depressing 
approach, as if Codd's seminal paper from 1970 or whenever it was never 
happened.
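
As a sketch of what "doing the boolean logic on the client side" amounts to, the Python below issues one simple request per OR operand and unions the results. The in-memory `FEATURES` table and the `simple_query` function are stand-ins invented for this example; a real client would make one network round trip per operand.

```python
# Hypothetical stand-in for a server: feature id -> (segment, start, end).
FEATURES = {
    "f1": ("Chr1", 1200, 1500),
    "f2": ("Chr1", 5000, 5200),
    "f3": ("ChrX", 100, 300),
    "f4": ("ChrX", 900, 950),
}

def simple_query(segment, start, end):
    """Emulate one simple request: ids of features overlapping the range."""
    return {
        fid
        for fid, (seg, s, e) in FEATURES.items()
        if seg == segment and s <= end and e >= start
    }

def client_side_or(*requests):
    """Run one simple query per operand and union the results on the client."""
    result = set()
    for req in requests:
        result |= simple_query(*req)
    return result

# The OR from the quoted example, split into two separate requests.
hits = client_side_or(("Chr1", 1000, 2000), ("ChrX", 99, 777))
```

The number of requests grows with the number of operands, and every intermediate result set has to be transferred to the client before it can be combined, which is the overhead mentioned above.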

> Feedback I've heard so far is that DAS1 queries were
> acceptable, with only a few new search fields needed.
>
>> hmm, not sure how useful this would be - surely you'd want something
>> more dasmodel-aware?
>
> The example I gave was a bad one.  What I meant was to show
> how there's an extension point so someone can develop a new
> search interface and clients can know that the new functionality
> exists, without having to change the DAS spec.

ok

that's probably all I've got to say on the matter, sorry for being 
irksome. I guess I'm fundamentally missing something, that is, why wrap 
simple and expressive declarative query languages with limited ad-hoc 
constraint systems with consciously limited expressivity and limited 
means of extensibility..

cheers
chris

>
> 					Andrew
> 					dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2



