[DAS2] query language description

chris mungall cjm at fruitfly.org
Mon Mar 20 23:45:46 UTC 2006

I guess things need to be left open for a DAS/3...

On Mar 20, 2006, at 9:32 AM, Lincoln Stein wrote:

> The current filter query language, which provides one level of ANDs and a
> nested level of ORs, satisfies our use cases. It is not clear to me what
> additional benefit we'll get from a composable query language. Note that none
> of the popular and functional genome information sources -- NCBI, UCSC,
> Ensembl or BioMart -- offer a composable query language, and there does not
> seem to be rioting on the streets!
> Lincoln
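
A minimal sketch of the "one level of ANDs and a nested level of ORs" semantics mentioned above. It assumes (the thread does not spell this out) that alternative values for the same filter key are ORed and that distinct keys are ANDed. The feature dictionaries and the field names `type` and `segment` are hypothetical and chosen only for illustration.

```python
def matches(feature, filters):
    """Return True if `feature` satisfies every filter key (AND),
    where a key is satisfied by any one of its allowed values (OR)."""
    return all(feature.get(key) in allowed for key, allowed in filters.items())


def select(features, filters):
    """Keep only the features that pass the AND-of-ORs filter."""
    return [f for f in features if matches(f, filters)]


# Hypothetical data, for illustration only.
features = [
    {"id": "f1", "type": "exon", "segment": "Chr1"},
    {"id": "f2", "type": "intron", "segment": "Chr1"},
    {"id": "f3", "type": "exon", "segment": "ChrX"},
]

# (type is exon OR intron) AND (segment is Chr1)
filters = {"type": {"exon", "intron"}, "segment": {"Chr1"}}
selected_ids = [f["id"] for f in select(features, filters)]
```

With this shape there is exactly one AND level and one OR level, so a filter can never express an OR between two different AND groups.
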
> On Friday 17 March 2006 19:20, chris mungall wrote:
>> On Mar 16, 2006, at 6:05 PM, Andrew Dalke wrote:
>>>> right now they are forced to bypass the constraint language and go direct
>>>> to SQL.
>>> In addition, we provide defined ways for a server to indicate
>>> that there are additional ways to query the server.
>> I was positing this as a bad feature, not a good one. or at least a
>> symptom of an incorrectly designed system (at least in the case of the
>> GO DB API - it may not carry forward to DAS - though if you're going to
>> allow querying by terms...)
>>>> None of these really fit into the DAS paradigm. I'm guessing you want
>>>> something simple that can be used as easily as an API with get-by-X
>>>> methods but will seamlessly blend into something more powerful. I
>>>> think what you have is on the right lines. I'm just arguing to make
>>>> this language composable from the outset, so that it can be extended
>>>> to whatever expressivity is required in the future, without bolting on
>>>> a new query system that's incompatible with the existing one.
>>> We have two ways to compose the system.  If the simple query language
>>> is extended, for example, to support word searches of the text field
>>> instead of substring searches, then a server can say
>>> <CAPABILITY type="features"
>>> query_uri="http://somewhere.over.rainbow/server.cgi">
>>>    <SUPPORTS name="word-search"/>
>>> This is backwards compatible, so the normal DAS queries work.  But
>>> a client can recognize the new feature and support whatever new filters
>>> that 'word-search' indicates, e.g.
>>>    http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*
>>> (finds features with notes containing words starting with 'Andre')
>>> These are composable.  For example, suppose Sanger allows modification
>>> date searches of curation events.  Then it might say
>>> <CAPABILITY type="features"
>>> query_uri="http://somewhere.over.rainbow/server.cgi">
>>>    <SUPPORTS name="word-search"/>
>>>    <SUPPORTS name="sanger-curation"/>
>> so this is limited to single-argument search functions?
>>> and I can search for notes containing words starting with "Andre"
>>> which were modified by "dalke" between 2002 and 2005 by doing
>>>    http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*&
>>>         modified-by=dalke&modified-before=2005&modified-after=2002
>> but the compositionality is always associative since the CGI parameter
>> constraint forbids nesting
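
A rough sketch of how a client might read the SUPPORTS entries and assemble the flat query URL discussed above. The XML is the fragment from the thread with a closing tag added so that it parses (an assumption, since the quoted fragment is unclosed); the helper names `supported_extensions` and `build_query` are hypothetical. It also illustrates the point just made: the parameters form a flat list of key/value pairs with nowhere to put grouping.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Fragment from the thread, closed here so that it is well-formed (assumption).
CAPABILITY_XML = """
<CAPABILITY type="features"
            query_uri="http://somewhere.over.rainbow/server.cgi">
  <SUPPORTS name="word-search"/>
  <SUPPORTS name="sanger-curation"/>
</CAPABILITY>
"""


def supported_extensions(capability_xml):
    """Return (query_uri, set of SUPPORTS names) for one CAPABILITY element."""
    root = ET.fromstring(capability_xml)
    names = {s.get("name") for s in root.findall("SUPPORTS")}
    return root.get("query_uri"), names


def build_query(query_uri, params):
    """Join a flat list of (key, value) pairs; '*' is left unescaped
    so the result looks like the URLs in the thread."""
    return query_uri + "?" + urlencode(params, safe="*")


query_uri, extensions = supported_extensions(CAPABILITY_XML)

# Only send filters that the server has announced support for.
params = []
if "word-search" in extensions:
    params.append(("note-wordsearch", "Andre*"))
if "sanger-curation" in extensions:
    params += [("modified-by", "dalke"),
               ("modified-before", "2005"),
               ("modified-after", "2002")]

url = build_query(query_uri, params)
```

A client that does not recognise an extension name simply omits the corresponding parameters and falls back to the standard filters.
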
>>> An advantage to the simple boolean logic of the current system
>>> is that the GUI interface is easy, and in line with existing
>>> simple search systems.
>> there's nothing preventing you from implementing a simple GUI on top of
>> an expressive system - there is nothing forcing you to use the
>> expressivity
>>> If someone wants to implement a new search system which is
>>> not backwards compatible then the server can indicate that
>>> alternative with a new CAPABILITY.  Suppose Thomas at Sanger
>>> comes up with a new search mechanism based on an object query
>>> language he invented,
>>> <CAPABILITY type="down-oql"
>>>      query_uri="http://sanger.ac.uk/oql-search" />
>>> The Sanger and EBI clients might understand that and support
>>> a more complex GUI, eg, with a text box interface.  Everyone
>>> else must ignore unknown capability types.
>> but this doesn't integrate with the existing query system
>>> Then that would be POSTed (or whatever the protocol defines)
>>> to the given URL, which returns whatever results are
>>> desired.
>>> Or the server can point to a public MySQL port, like
>>> <CAPABILITY type="mysql-connection"
>>>      query_uri="mysql://username:password@hostname:port/databasename"
>>> />
>>> That's what you are doing to bypass the syntax, except that
>>> here it isn't a bypass; you can define the new interface in
>>> the DAS sources document.
>>>> The generic language could just be some kind of simple
>>>> extensible function syntax for search terms, boolean operators,
>>>> and some kind of (optional) nesting syntax.
>>> Which syntax?  Is it supposed to be easy for people to write?
>>> Text oriented?  Or tree structured, like XML, or SQL-like?
>> I'd favour some concrete abstract syntax that looks much like the
>> existing DAS QL
>>> And which clients and servers will implement that search
>>> language?
>> all servers. clients optional
>>> If there was a generic language it would allow
>>>    OR("on segment Chr1 between 1000 and 2000",
>>>       "on segment ChrX between 99 and 777")
>>> which is something we are expressly not allowing in DAS2
>>> queries.  It doesn't make sense for the target applications
>>> and by excluding it it simplifies the server development,
>>> which means less chance for bugs.
>> this example is pointless but it's easy to imagine plenty of ontology
>> term queries or other queries in which this would be useful
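
One way to picture the "simple extensible function syntax ... boolean operators ... optional nesting" idea is as a small expression tree. This is only a sketch under stated assumptions: the node classes `Term`, `And` and `Or`, the `evaluate` function, and the field names are all hypothetical and are not part of DAS2 or of any proposal in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Leaf constraint: the feature's `field` must equal `value`."""
    field: str
    value: str


@dataclass(frozen=True)
class And:
    args: tuple


@dataclass(frozen=True)
class Or:
    args: tuple


def evaluate(expr, feature):
    """Recursively evaluate a nested boolean expression against one feature."""
    if isinstance(expr, Term):
        return feature.get(expr.field) == expr.value
    if isinstance(expr, And):
        return all(evaluate(arg, feature) for arg in expr.args)
    if isinstance(expr, Or):
        return any(evaluate(arg, feature) for arg in expr.args)
    raise TypeError(f"unknown expression node: {expr!r}")


# (type=exon AND segment=Chr1) OR (type=gene AND segment=ChrX)
query = Or((
    And((Term("type", "exon"), Term("segment", "Chr1"))),
    And((Term("type", "gene"), Term("segment", "ChrX"))),
))

# Hypothetical data, for illustration only.
features = [
    {"id": "f1", "type": "exon", "segment": "Chr1"},
    {"id": "f2", "type": "exon", "segment": "ChrX"},
    {"id": "f3", "type": "gene", "segment": "ChrX"},
]
hits = [f["id"] for f in features if evaluate(query, f)]
```

A flat filter of the form (type is exon OR gene) AND (segment is Chr1 OR ChrX) would also return f2, which is the kind of difference nesting makes.
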
>> I guess I depart from the normal DAS philosophy - I don't see this
>> being a high barrier for entry for servers, if they're not up to this
>> it'll probably be a buggy hacky server anyway
>>> Also, I personally haven't figured out a decent way to
>>> do a GUI composition of a complex boolean query which is
>>> as easy as learning the query language in the first place.
>> doesn't mean it doesn't exist.
>> i'm not sure what's hard about having say, a clipboard of favourite
>> queries, then allowing some kind of drag-and-drop composition
>>> A more generic language implementation is a lot of overhead
>>> if most (80%? 90%?) need basic searches, and many of the
>>> rest can fake it by breaking a request into parts and
>>> doing the boolean logic on the client side.
>> this is always an option - if the user doesn't mind the additional
>> possibly very high overhead. it's just a little bit of a depressing
>> approach, as if Codd's seminal paper from 1970 or whenever it was never
>> happened.
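
A sketch of the "break a request into parts and do the boolean logic on the client side" workaround, assuming each part is a simple AND-only request. The function `run_query` is a hypothetical in-memory stand-in for a server round trip, and the data is invented for illustration.

```python
def run_query(features, **filters):
    """Stand-in for one simple request: every keyword filter must match (AND).
    Returns the ids of the matching features."""
    return {f["id"] for f in features
            if all(f.get(key) == value for key, value in filters.items())}


# Hypothetical server-side data, for illustration only.
SERVER_FEATURES = [
    {"id": "f1", "type": "exon", "segment": "Chr1"},
    {"id": "f2", "type": "exon", "segment": "ChrX"},
    {"id": "f3", "type": "gene", "segment": "ChrX"},
]

# Two separate simple requests ...
part_a = run_query(SERVER_FEATURES, type="exon", segment="Chr1")
part_b = run_query(SERVER_FEATURES, type="gene", segment="ChrX")

# ... combined on the client: OR is a set union, AND would be an intersection.
combined = part_a | part_b
```

Each partial result has to be transferred in full before it can be combined, which is where the additional overhead mentioned above would come from.
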
>>> Feedback I've heard so far is that DAS1 queries were
>>> acceptable, with only a few new search fields needed.
>>>> hmm, not sure how useful this would be - surely you'd want something
>>>> more dasmodel-aware?
>>> The example I gave was a bad one.  What I meant was to show
>>> how there's an extension point so someone can develop a new
>>> search interface and clients can know that the new functionality
>>> exists, without having to change the DAS spec.
>> ok
>> that's probably all I've got to say on the matter, sorry for being
>> irksome. I guess I'm fundamentally missing something, that is, why wrap
>> simple and expressive declarative query languages with limited ad-hoc
>> constraint systems with consciously limited expressivity and limited
>> means of extensibility.
>> cheers
>> chris
>>> 					Andrew
>>> 					dalke at dalkescientific.com
>>> _______________________________________________
>>> DAS2 mailing list
>>> DAS2 at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/das2
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)
