[DAS2] query language description

Lincoln Stein lstein at cshl.edu
Mon Mar 20 17:32:40 UTC 2006


The current filter query language, which provides one level of ANDs and a 
nested level of ORs, satisfies our use cases. It is not clear to me what 
additional benefit we'll get from a composable query language. Note that none 
of the popular and functional genome information sources -- NCBI, UCSC, 
Ensembl or BioMart -- offer a composable query language, and there does not 
seem to be rioting on the streets!
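
As a rough illustration of "one level of ANDs and a nested level of ORs", here is a minimal Python sketch. It assumes a filter is a mapping from a field name to a list of acceptable values; the field names (`type`, `segment`) and the `matches` helper are hypothetical and are not taken from the DAS2 spec.

```python
from typing import Dict, List

def matches(feature: Dict[str, str], filters: Dict[str, List[str]]) -> bool:
    """AND across filter fields, OR across the values listed for one field."""
    return all(
        feature.get(field) in allowed          # OR: any listed value will do
        for field, allowed in filters.items()  # AND: every field must match
    )

# Hypothetical feature records, for illustration only.
features = [
    {"type": "exon", "segment": "Chr1"},
    {"type": "intron", "segment": "Chr1"},
    {"type": "exon", "segment": "ChrX"},
]

# (type is exon OR intron) AND (segment is Chr1)
filters = {"type": ["exon", "intron"], "segment": ["Chr1"]}
selected = [f for f in features if matches(f, filters)]
```

There is no way to express an OR between two different fields in this shape, which is the limitation the rest of the thread argues about.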

Lincoln


On Friday 17 March 2006 19:20, chris mungall wrote:
> On Mar 16, 2006, at 6:05 PM, Andrew Dalke wrote:
> >> right now they are forced to bypass the constraint language and go
> >> direct to SQL.
> >
> > In addition, we provide defined ways for a server to indicate
> > that there are additional ways to query the server.
>
> I was positing this as a bad feature, not a good one. or at least a
> symptom of an incorrectly designed system (at least in the case of the
> GO DB API - it may not carry forward to DAS - though if you're going to
> allow querying by terms...)
>
> >> None of these really fit into the DAS paradigm. I'm guessing you want
> >> something simple that can be used as easily as an API with get-by-X
> >> methods but will seamlessly blend into something more powerful. I
> >> think what you have is on the right lines. I'm just arguing to make
> >> this language composable from the outset, so that it can be extended
> >> to whatever expressivity is required in the future, without bolting on
> >> a new query system that's incompatible with the existing one.
> >
> > We have two ways to compose the system.  If the simple query language
> > is extended, for example, to support word searches of the text field
> > instead of substring searches, then a server can say
> >
> > <CAPABILITY type="features"
> > query_uri="http://somewhere.over.rainbow/server.cgi">
> >    <SUPPORTS name="word-search"/>
> > </CAPABILITY>
> >
> > This is backwards compatible, so the normal DAS queries work.  But
> > a client can recognize the new feature and support whatever new filters
> > that 'word-search' indicates, eg
> >
> >    http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*
> >
> > (finds features with notes containing words starting with 'Andre' )
> >
> > These are composable.  For example, suppose Sanger allows modification
> > date searches of curation events.  Then it might say
> >
> > <CAPABILITY type="features"
> > query_uri="http://somewhere.over.rainbow/server.cgi">
> >    <SUPPORTS name="word-search"/>
> >    <SUPPORTS name="sanger-curation"/>
> > </CAPABILITY>
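
A client could discover these extensions roughly as follows. This is only a sketch: it parses the CAPABILITY element quoted above on its own, outside any enclosing sources document, and `KNOWN_EXTENSIONS` and `read_capability` are hypothetical names for what a particular client understands.

```python
import xml.etree.ElementTree as ET

# The CAPABILITY element from the message above, as a standalone fragment.
CAPABILITY_XML = """
<CAPABILITY type="features"
    query_uri="http://somewhere.over.rainbow/server.cgi">
   <SUPPORTS name="word-search"/>
   <SUPPORTS name="sanger-curation"/>
</CAPABILITY>
"""

# Hypothetical: the extensions this particular client knows how to use.
KNOWN_EXTENSIONS = {"word-search"}

def read_capability(xml_text):
    """Return (query_uri, set of SUPPORTS names) for one CAPABILITY element."""
    root = ET.fromstring(xml_text.strip())
    supported = {el.get("name") for el in root.findall("SUPPORTS")}
    return root.get("query_uri"), supported

query_uri, supported = read_capability(CAPABILITY_XML)

# Extensions the client does not recognize are simply skipped, so the
# ordinary DAS queries keep working against the same query_uri.
usable = supported & KNOWN_EXTENSIONS
```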
>
> so this is limited to single-argument search functions?
>
> > and I can search for notes containing words starting with "Andre"
> > which were modified by "dalke" between 2002 and 2005 by doing
> >
> >    http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*&
> >         modified-by=dalke&modified-before=2005&modified-after=2002
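
For illustration, a query like this could be assembled with Python's standard library. The parameter names come from the example above; the host follows the query_uri in the CAPABILITY element, and treating each filter as a flat key/value pair is an assumption of this sketch.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "http://somewhere.over.rainbow/server.cgi"

# Each filter is one flat key/value pair; the server is assumed to AND them.
params = [
    ("note-wordsearch", "Andre*"),
    ("modified-by", "dalke"),
    ("modified-before", "2005"),
    ("modified-after", "2002"),
]

# safe="*" keeps the wildcard readable instead of encoding it as %2A.
url = base + "?" + urlencode(params, safe="*")

# Round trip: what a server would see after decoding the query string.
decoded = parse_qs(urlsplit(url).query)
```

Because the parameters form a flat list, there is nowhere to put a nested sub-expression, which is the point raised in the reply that follows.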
>
> but the compositionality is always associative since the CGI parameter
> constraint forbids nesting
>
> > An advantage to the simple boolean logic of the current system
> > is that the GUI interface is easy, and in line with existing
> > simple search systems.
>
> there's nothing preventing you from implementing a simple GUI on top of
> an expressive system - there is nothing forcing you to use the
> expressivity
>
> > If someone wants to implement a new search system which is
> > not backwards compatible then the server can indicate that
> > alternative with a new CAPABILITY.  Suppose Thomas at Sanger
> > comes up with a new search mechanism based on an object query
> > language he invented,
> >
> > <CAPABILITY type="down-oql"
> >      query_uri="http://sanger.ac.uk/oql-search" />
> >
> > The Sanger and EBI clients might understand that and support
> > a more complex GUI, eg, with a text box interface.  Everyone
> > else must ignore unknown capability types.
>
> but this doesn't integrate with the existing query system
>
> > Then that would be POSTED (or whatever the protocol defines)
> > to the given URL, which returns back whatever results are
> > desired.
> >
> > Or the server can point to a public MySQL port, like
> >
> > <CAPABILITY type="mysql-connection"
> >      query_uri="mysql://username:password@hostname:port/databasename"
> > />
> >
> > That's what you are doing to bypass the syntax, except that
> > here it isn't a bypass; you can define the new interface in
> > the DAS sources document.
> >
> >> The generic language could just be some kind of simple
> >> extensible function syntax for search terms, boolean operators,
> >> and some kind of (optional) nesting syntax.
> >
> > Which syntax?  Is it supposed to be easy for people to write?
> > Text oriented?  Or tree structured, like XML, or SQL-like?
>
> I'd favour some concrete abstract syntax that looks much like the
> existing DAS QL
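
Neither poster pins down a syntax in this thread, so the following is only a guess at what "search terms, boolean operators, and optional nesting" could look like. The tuple representation, the operator names (`and`, `or`, `not`, `eq`) and the `evaluate` function are all hypothetical.

```python
def evaluate(expr, feature):
    """Evaluate a nested boolean expression against one feature record."""
    op = expr[0]
    if op == "and":
        return all(evaluate(sub, feature) for sub in expr[1:])
    if op == "or":
        return any(evaluate(sub, feature) for sub in expr[1:])
    if op == "not":
        return not evaluate(expr[1], feature)
    if op == "eq":
        _, field, value = expr
        return feature.get(field) == value
    raise ValueError(f"unknown operator: {op!r}")

# (segment is Chr1 OR segment is ChrX) AND NOT (type is intron)
query = (
    "and",
    ("or", ("eq", "segment", "Chr1"), ("eq", "segment", "ChrX")),
    ("not", ("eq", "type", "intron")),
)

is_match = evaluate(query, {"segment": "ChrX", "type": "exon"})
```

A flat AND-of-ORs filter is the special case where the top level is `and` and every child is an `or` over a single field, so a simple GUI could still emit only that subset.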
>
> > And which clients and servers will implement that search
> > language?
>
> all servers. clients optional
>
> > If there was a generic language it would allow
> >    OR("on segment Chr1 between 1000 and 2000",
> >       "on segment ChrX between 99 and 777")
> > which is something we are expressly not allowing in DAS2
> > queries.  It doesn't make sense for the target applications
> > and by excluding it it simplifies the server development,
> > which means less chance for bugs.
>
> this example is pointless but it's easy to imagine plenty of ontology
> term queries or other queries in which this would be useful
>
> I guess I depart from the normal DAS philosophy - I don't see this
> being a high barrier for entry for servers, if they're not up to this
> it'll probably be a buggy hacky server anyway
>
> > Also, I personally haven't figured out a decent way to
> > do a GUI composition of a complex boolean query which is
> > as easy as learning the query language in the first place.
>
> doesn't mean it doesn't exist.
>
> i'm not sure what's hard about having say, a clipboard of favourite
> queries, then allowing some kind of drag-and-drop composition
>
> > A more generic language implementation is a lot of overhead
> > if most (80%? 90%?) need basic searches, and many of the
> > rest can fake it by breaking a request into parts and
> > doing the boolean logic on the client side.
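
A minimal sketch of that client-side approach, assuming each sub-request returns a list of feature ids. `run_query` stands in for whatever call fetches results from a server; here it is backed by a small in-memory table so the example runs without a network.

```python
from typing import Callable, Dict, List, Set

Filters = Dict[str, str]

def client_side_or(run_query: Callable[[Filters], List[str]],
                   sub_queries: List[Filters]) -> Set[str]:
    """Issue one simple request per sub-query and union the results."""
    results: Set[str] = set()
    for filters in sub_queries:
        results |= set(run_query(filters))
    return results

# Hypothetical stand-in for a server: feature id -> attributes.
FEATURES = {
    "f1": {"segment": "Chr1"},
    "f2": {"segment": "ChrX"},
    "f3": {"segment": "Chr2"},
}

def fake_run_query(filters: Filters) -> List[str]:
    """Return ids of features whose attributes equal every given filter."""
    return [fid for fid, feat in FEATURES.items()
            if all(feat.get(k) == v for k, v in filters.items())]

hits = client_side_or(fake_run_query,
                      [{"segment": "Chr1"}, {"segment": "ChrX"}])
```

The cost is one round trip per sub-query plus transfer of every partial result, which is the overhead the reply below objects to.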
>
> this is always an option - if the user doesn't mind the additional
> possibly very high overhead. it's just a little bit of a depressing
> approach, as if Codd's seminal paper from 1970 or whenever it was never
> happened.
>
> > Feedback I've heard so far is that DAS1 queries were
> > acceptable, with only a few new search fields needed.
> >
> >> hmm, not sure how useful this would be - surely you'd want something
> >> more dasmodel-aware?
> >
> > The example I gave was a bad one.  What I meant was to show
> > how there's an extension point so someone can develop a new
> > search interface and clients can know that the new functionality
> > exists, without having to change the DAS spec.
>
> ok
>
> that's probably all I've got to say on the matter, sorry for being
> irksome. I guess I'm fundamentally missing something, that is, why wrap
> simple and expressive declarative query languages with limited ad-hoc
> constraint systems with consciously limited expressivity and limited
> means of extensibility..
>
> cheers
> chris
>
> > 					Andrew
> > 					dalke at dalkescientific.com
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/das2
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)


