[DAS2] query language description

Andrew Dalke dalke at dalkescientific.com
Fri Mar 17 04:47:58 UTC 2006


Updated:
   - added 'note' as a query field
   - changed string searches to substring (not word) searches
        and made them case-insensitive

       "AB" matches only the strings "AB", "Ab", "aB" and "ab"
       "*AB" matches only fields which end with
               "AB", "ab", "aB", or "Ab"
       "AB*" matches only fields which start with "AB", up to case
       "*AB*" matches only fields which contain the substring,
             up to case

   - added 'coordinates' search

   - removed 'type' and renamed 'exacttype' to 'type'

   - removed 'contains' search, which no one said they wanted.  Instead,
      supporting (EXPERIMENTAL) an 'excludes' search.



==================================

The query fields are

   name        | takes  | matches features ...
  ======================================================================
   xid         | URI    | which have the given xid
   type        | URI    | with exactly the given type
   segment     | URI    | on the given segment
   coordinates | URI    | which are part of the given coordinate system
   overlaps    | region | which overlap the given region
   excludes    | region | which have no overlap to the given region
   inside      | region | which are contained inside the given region
   name        | string | with a title or alias which matches the given string
   note        | string | with a note which matches the given string
   prop-*      | string | with the property "*" matching the given string

Queries are form-urlencoded requests.  For example, if the features
query URL is 'http://biodas.org/features' and there is a segment named
'http://ncbi.org/human/Chr1' then the following is a request for all the
features on the first 10,000 bases of that segment.

The query is for
     segment = 'http://ncbi.org/human/Chr1'
     overlaps = 0:10000

which is form-urlencoded as

    
    http://biodas.org/features?segment=http%3A%2F%2Fncbi.org%2Fhuman%2FChr1;overlaps=0%3A10000
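
As a rough illustration, the encoding step can be reproduced with
Python's standard library.  The helper name build_query is hypothetical,
and the ';' separator is taken from the examples in this message rather
than from any library default.

```python
from urllib.parse import quote

def build_query(base_url, terms):
    """Join (key, value) pairs into a query URL (hypothetical helper).

    Each key and value is percent-encoded, and terms are separated
    with ';' as in the examples in this message.
    """
    encoded = ";".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in terms
    )
    return f"{base_url}?{encoded}"

url = build_query(
    "http://biodas.org/features",
    [("segment", "http://ncbi.org/human/Chr1"), ("overlaps", "0:10000")],
)
print(url)
```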

Multiple search terms with the same key are OR'ed together.  The
following searches for features with a name or alias of either
BC048328 or BC015400

   http://biodas.org/features?name=BC048328;name=BC015400

The 'excludes' search is an exception.  See below.

Multiple search terms with different keys are AND'ed together,
but only after doing the OR search for each set of search terms with
identical keys.  The following searches for features which have
a name or alias of BC048328 or BC015400 and which are on the segment
http://ncbi.org/human/Chr1

    
    http://biodas.org/features?name=BC048328;segment=http%3A%2F%2Fncbi.org%2Fhuman%2FChr1;name=BC015400

The order of the search terms in the query string does not affect
the results.
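
The OR-within-a-key, AND-across-keys rule can be sketched as below.
This is only an illustration under simplifying assumptions: values are
compared exactly (the real 'name' search is a string search), the
'excludes' exception is ignored, and the function names and the
dict-of-sets feature representation are made up for the example.

```python
from collections import defaultdict
from urllib.parse import unquote

def group_terms(query_string):
    """Group the ';'-separated terms of a query string by key."""
    groups = defaultdict(set)
    for term in query_string.split(";"):
        key, _, value = term.partition("=")
        groups[unquote(key)].add(unquote(value))
    return dict(groups)

def matches(feature, groups):
    """feature maps each key to the set of values the feature has.

    Values sharing a key are OR'ed (any one may match); the keys
    themselves are AND'ed (every key must have a match).
    """
    return all(
        bool(feature.get(key, set()) & wanted)
        for key, wanted in groups.items()
    )

a = group_terms(
    "name=BC048328;segment=http%3A%2F%2Fncbi.org%2Fhuman%2FChr1;name=BC015400"
)
b = group_terms(
    "name=BC015400;name=BC048328;segment=http%3A%2F%2Fncbi.org%2Fhuman%2FChr1"
)
# Grouping into sets makes the order of the terms irrelevant.
print(a == b)

feature = {"name": {"BC015400"}, "segment": {"http://ncbi.org/human/Chr1"}}
print(matches(feature, a))
```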

If any part of a complex feature (that is, one with parents
or parts) matches a search term then all of the parents and
parts are returned.  (XXX Gregg -- is this correct? XXX)



The fields which take URIs require exact matches, that is, a
character by character match.  (For details on the nuances of
comparing URIs see http://www.textuality.com/tag/uri-comp-3.html )

(We don't have an ontology URI yet, and when we do we can add
an 'ontology' query.)

The segment query filter takes a URI.  This must accept
the segment URI and, if known to the server, the equivalent
reference identifier for the segment.

If range searches are given then one and only one segment
must be given.  If there are multiple segment queries then
ranges are not allowed.
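
A server could enforce this rule before running the search.  The
following is a minimal sketch; the function name and the list-of-pairs
representation of the query are assumptions made for the example.

```python
RANGE_KEYS = {"overlaps", "inside", "excludes"}

def check_segment_rule(terms):
    """Raise ValueError if range searches are used without exactly
    one segment term.  terms is a list of (key, value) pairs."""
    segments = [value for key, value in terms if key == "segment"]
    has_range = any(key in RANGE_KEYS for key, _ in terms)
    if has_range and len(segments) != 1:
        raise ValueError(
            f"range searches need exactly one segment, got {len(segments)}"
        )

# One segment with a range search is accepted.
check_segment_rule([("segment", "http://ncbi.org/human/Chr1"),
                    ("overlaps", "0:10000")])

# Several segments without any range search are also accepted.
check_segment_rule([("segment", "http://ncbi.org/human/Chr1"),
                    ("segment", "http://ncbi.org/human/Chr2")])
```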

The string searches may be exact matches, substring, prefix
or suffix searches.  The query type depends on whether the search
value starts and/or ends with a '*'.

     ABC  -- field exactly matches "ABC"
    *ABC  -- field ends with "ABC"
     ABC* -- field starts with "ABC"
    *ABC* -- field contains the substring "ABC"

The "*" has no special meaning except at the start or end
of the query value.  The search term "***" will match a
field which contains the character "*" anywhere.  There
is no way to match fields which exactly match '*' or
which only start or end with that character.

Text searches are case-insensitive.  The string "ABC"
matches "abc", "aBc", "ABC", etc.
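
The matching rules above can be sketched in a few lines.  The function
name das_match is hypothetical.  The text does not say what a query
value consisting of a single "*" means, so this sketch rejects it
rather than guess.

```python
def das_match(pattern, field):
    """Case-insensitive match where '*' is special only at the
    start and/or end of the pattern."""
    if pattern == "*":
        # Not defined by the description above; an assumption of this sketch.
        raise ValueError("a lone '*' is not defined")
    p, f = pattern.lower(), field.lower()
    leading = p.startswith("*")
    trailing = p.endswith("*")
    if leading and trailing:
        return p[1:-1] in f          # substring search
    if leading:
        return f.endswith(p[1:])     # suffix search
    if trailing:
        return f.startswith(p[:-1])  # prefix search
    return f == p                    # exact match

print(das_match("ABC", "aBc"))        # True
print(das_match("*ABC", "xyzabc"))    # True
print(das_match("ABC*", "xyzabc"))    # False
print(das_match("***", "a*b"))        # True
print(das_match("A*C", "ABC"))        # False, '*' in the middle is literal
```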

A server may choose to collapse multiple whitespace
characters into a single space character for search purposes.
For example, the query "*a newline*" should match

   "This is a line of text which contains a
    newline"
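
One possible way to do the collapsing, as a sketch only (the helper name
is made up, and a server is free to normalize differently):

```python
import re

def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text)

field = "This is a line of text which contains a\n    newline"
needle = "a newline"  # the query "*a newline*" with the '*'s removed

# Without collapsing, the line break prevents the substring match.
print(needle in field.lower())
# After collapsing, the substring is found.
print(needle in collapse_whitespace(field).lower())
```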


The 'name' search does a text search of the 'title' and 'alias'
fields.


The "prop-*" is shorthand for a class of text searches of
<PROP> elements.  Features may have properties, like

    <PROP key="cellular_component" value="membrane" />

To do a string search for all 'membrane' cellular components,
construct the query key by taking  the string "prop-" and
appending the property key text ("cellular_component").  The
query value is the text to search for, in this case:

     prop-cellular_component=membrane

To search for any cellular_component containing the substring
"membrane"

     prop-cellular_component=*membrane*

The rules for multiple searches with the same key also apply to the
prop-* searches.  To search for all 'membrane' or 'nuclear'
cellular components, use two 'prop-cellular_component' terms, as

      
    http://biodas.org/features?prop-cellular_component=membrane;prop-cellular_component=nuclear
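
Building such a query programmatically might look like this.  The helper
name prop_terms is hypothetical; it only assembles the query string and
does not contact a server.

```python
from urllib.parse import quote

def prop_terms(prop_key, values):
    """Build one 'prop-<key>=<value>' term per value, joined with ';'.

    Repeating the same key means the values are OR'ed by the server.
    """
    key = quote("prop-" + prop_key, safe="")
    return ";".join(f"{key}={quote(value, safe='')}" for value in values)

query = prop_terms("cellular_component", ["membrane", "nuclear"])
print("http://biodas.org/features?" + query)
```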



The range searches are defined with explicit start and end
coordinates.  The range syntax is in the form "start:end", for
example, "1:9".  There is no way to restrict the search to
a specific strand.
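
Parsing the "start:end" form is straightforward.  A minimal sketch,
with a made-up function name; it does not assume anything about
whether coordinates are 0- or 1-based, since that is not stated here.

```python
def parse_range(text):
    """Split a 'start:end' string into a (start, end) tuple of ints."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {text!r}")
    # int() raises ValueError for empty or non-numeric parts.
    return int(start_text), int(end_text)

print(parse_range("1:9"))      # (1, 9)
print(parse_range("0:10000"))  # (0, 10000)
```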

A feature may have several locations.  An annotation may
have several features in a parent/part relationship.  The
relationship may have several levels.  If a range search
matches any feature in the annotation then the search
returns all of the features in the annotation.

An 'overlaps' search matches if and only if any location of
any feature in the annotation (parent or part) overlaps the
query range on the query segment.

An 'inside' search matches if and only if at least one
feature in the annotation has a location on the query segment
and all features which have a location on the query segment
have at least one location which starts and ends in the
query range.

EXPERIMENTAL: An 'excludes' search matches if and only if at
least one feature of the annotation is on the query segment
and no features are in the query range.  This is the
complement of the 'overlaps' search, for annotations on
the same query segment.

Unlike the other search keys, if there are multiple 'excludes'
searches then the results are AND'ed together.  That is,
if the query has two excludes ranges
    segment=ChrX excludes=RANGE1 excludes=RANGE2
then the results are those features on ChrX which
are not in RANGE1 and are not in RANGE2.
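
The three range searches can be sketched as follows.  Everything about
the data model is an assumption made for the example: an annotation is
a list of features, a feature is a list of (segment, start, end)
locations, and ranges are treated as half-open intervals.  The function
names are hypothetical.

```python
def _overlap(loc, segment, start, end):
    """True if a location overlaps the range (half-open, an assumption)."""
    seg, s, e = loc
    return seg == segment and s < end and start < e

def _within(loc, segment, start, end):
    """True if a location starts and ends in the range."""
    seg, s, e = loc
    return seg == segment and start <= s and e <= end

def _on_segment(feature, segment):
    return any(seg == segment for seg, _, _ in feature)

def overlaps(annotation, segment, start, end):
    # Any location of any feature overlaps the query range.
    return any(_overlap(loc, segment, start, end)
               for feature in annotation for loc in feature)

def inside(annotation, segment, start, end):
    # At least one feature is on the segment, and every such feature
    # has at least one location contained in the query range.
    placed = [f for f in annotation if _on_segment(f, segment)]
    return bool(placed) and all(
        any(_within(loc, segment, start, end) for loc in f) for f in placed
    )

def excludes(annotation, segment, ranges):
    # Multiple 'excludes' ranges are AND'ed: none of them may be overlapped.
    placed = any(_on_segment(f, segment) for f in annotation)
    return placed and not any(
        overlaps(annotation, segment, s, e) for s, e in ranges
    )

annotation = [
    [("ChrX", 100, 200)],
    [("ChrX", 300, 400), ("ChrY", 0, 50)],
]
print(overlaps(annotation, "ChrX", 150, 160))                # True
print(inside(annotation, "ChrX", 0, 250))                    # False
print(excludes(annotation, "ChrX", [(0, 50), (500, 600)]))   # True
print(excludes(annotation, "ChrX", [(0, 50), (350, 360)]))   # False
```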


					Andrew
					dalke at dalkescientific.com



