[DAS2] properties

Andrew Dalke dalke at dalkescientific.com
Mon Jan 23 13:34:03 UTC 2006


Thomas wrote:

> This seems to be inventing a new namespace-management mechanism.  I'm  
> always a bit nervous about schemes which require someone to maintain a  
> registry of namespaces.  Was there really a problem with the old  
> system of using the XML document's namespace map to scope property  
> names?

I don't have a good feeling about any solution to this.  It comes
down to what people will be using these property tables for.

Are they only meant for machines?  For people?  For both?

Is it okay to restrict the key names?  For example, will
there be keys like "5'UTR"?  If so, we'll need to work around
the XML syntax restrictions on an element name.  Oh, and
can non-ASCII/non-Latin1 unicode characters be included
in key names?

Will people be able to add their own properties?  Eg, an
editable table interface to add "curator" "Andrew Dalke",
assuming no curator element is already defined.

Searching is one problem.  The syntax for doing the search
is only part of the problem.  Different fields may need
different search types.  Eg, "starts with" vs. "contains"
vs. "range" searches.
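To make the "different search types" point concrete, here is a minimal Python sketch. The search-type names ("startswith", "contains", "range") and the "low:high" range syntax are made up for illustration and are not part of any DAS spec.

```python
from typing import Callable, Dict

# Hypothetical search types; names and argument syntax are illustrative only.
def starts_with(value: str, arg: str) -> bool:
    return value.startswith(arg)

def contains(value: str, arg: str) -> bool:
    return arg in value

def in_range(value: str, arg: str) -> bool:
    # arg is assumed to look like "low:high"; both ends are inclusive.
    low, high = (float(x) for x in arg.split(":"))
    return low <= float(value) <= high

SEARCH_TYPES: Dict[str, Callable[[str, str], bool]] = {
    "startswith": starts_with,
    "contains": contains,
    "range": in_range,
}

def matches(properties: Dict[str, str], key: str, search_type: str, arg: str) -> bool:
    """Return True if the property exists and satisfies the given search."""
    if key not in properties:
        return False
    return SEARCH_TYPES[search_type](properties[key], arg)

feature = {"curator": "Andrew Dalke", "score": "5e-5"}
print(matches(feature, "curator", "startswith", "Andrew"))
print(matches(feature, "score", "range", "0:1e-4"))
```

The point of the sketch is only that the server (or the spec) has to know which comparison applies to which field; a single "att=key:value" syntax does not say.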

Most of the properties are stylesheet related.  This made me
think about HTML stylesheets, where old HTML had elements
like "font" and "color" but modern ones put the style data
elsewhere.  The style data is very well defined, and not
done through URLs.

What if we did the same?  Then fields like 'bgcolor', 'glyph',
etc. would indeed be in a completely different namespace.

Of course, HTML still has the "style" attribute to override
that, though it works as an embedded stylesheet.

I honestly don't know.  This is something I hope can be
worked on during the sprint in a couple weeks.

> The one thing I saw missing from the old spec was a mechanism for  
> doing namespace-scoped property queries.  I guess shoehorning this  
> into an application/x-www-form-urlencoded query isn't ideal, but it  
> could be done.
> How about
>
>   feature?xmlns:bg=http:%2f%2fwww.bioperl.org%2fbiographics%2fproperties;att=bg:fgcolor:red

Yeah, I'm shaky on that as well.  The solutions I came up
with are:

  - your approach

  - Clark names, like
     att={http:%2f%2fwww.bioperl.org%2fbiographics%2fproperties}fgcolor

  - "aliases"
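For comparison, the first two options can be sketched in Python as below. The function names are hypothetical, and appending ":value" to the Clark name is my assumption, by analogy with the prefixed form. Note that urllib.parse.quote emits uppercase hex (%2F), which is equivalent to the lowercase %2f used in this thread.

```python
from urllib.parse import quote

NS = "http://www.bioperl.org/biographics/properties"

def encode_ns(uri: str) -> str:
    # Keep ":" literal and percent-encode "/" so the URI fits in a query value.
    return quote(uri, safe=":")

def prefixed_query(prefix: str, uri: str, name: str, value: str) -> str:
    """Prefix form: declare a namespace prefix, then use it in the filter."""
    return f"feature?xmlns:{prefix}={encode_ns(uri)};att={prefix}:{name}:{value}"

def clark_name(uri: str, name: str) -> str:
    """Clark-name form: the namespace URI is spelled out in braces."""
    return f"{{{encode_ns(uri)}}}{name}"

def clark_query(uri: str, name: str, value: str) -> str:
    # Assumption: the value follows the Clark name after a colon.
    return f"feature?att={clark_name(uri, name)}:{value}"

print(prefixed_query("bg", NS, "fgcolor", "red"))
print(clark_query(NS, "fgcolor", "red"))
```

The prefix form needs two cooperating parameters; the Clark form is self-contained but puts braces in the URL, which a strict encoder would also want to escape.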

Let me explain the last as I'm going to propose it for the
sequence data.

In sequence data there are URLs
   http://example.org/foo/bar/residues/Chr1
   http://example.org/foo/bar/residues/Chr2
       ...

These are fixed and finite.  I propose that a request of

   http://example.org/foo/bar/residues

returns a document like this

<RESIDUES_LIST>
   <RESIDUES id="residues/Chr1" name="Chr1" length="12345" />
   <RESIDUES id="residues/Chr2" name="Chr2" length="89898" />
</RESIDUES_LIST>

To query for features on a given sequence use its 'name'
in the query,

     overlaps=Chr3/1000:2000

This works for sequences because there's no chance of
conflicting names.  All annotation servers refer to the
same sequence data set.  I'm less sure about its feasibility
for generic properties.  If we do that we might as well just
use the short names and not have full URLs.
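As a rough sketch of the alias idea for sequences, a client could read the residues document once and then build queries from the short names. The helper names are hypothetical, and the bounds check (stop no larger than the sequence length) is my assumption about the coordinate convention.

```python
import xml.etree.ElementTree as ET

RESIDUES_XML = """\
<RESIDUES_LIST>
   <RESIDUES id="residues/Chr1" name="Chr1" length="12345" />
   <RESIDUES id="residues/Chr2" name="Chr2" length="89898" />
</RESIDUES_LIST>
"""

def load_aliases(xml_text):
    """Map each short name to its (id, length) pair."""
    root = ET.fromstring(xml_text)
    return {
        el.get("name"): (el.get("id"), int(el.get("length")))
        for el in root.findall("RESIDUES")
    }

def overlaps_param(aliases, name, start, stop):
    """Build an 'overlaps' query parameter from a short sequence name."""
    if name not in aliases:
        raise KeyError(f"unknown sequence name: {name}")
    _, length = aliases[name]
    # Assumed convention: start <= stop, and stop may not exceed the length.
    if start > stop or stop > length:
        raise ValueError(f"range {start}:{stop} does not fit {name}")
    return f"overlaps={name}/{start}:{stop}"

aliases = load_aliases(RESIDUES_XML)
print(overlaps_param(aliases, "Chr1", 1000, 2000))
```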

Here's my view.

1)

We have a large number of stylesheet properties.  We must
support these.  The keys come from a restricted vocabulary,
which must be machine readable and well-defined.  They do
not need to be searchable (who will search for all "red"
elements, especially if the client can apply its own
stylesheet definitions?)

These can be DAS elements,
   <das:fgcolor>red</das:fgcolor>
or, given the general aversion to CDATA from many of
the DAS people,
   <das:fgcolor value="red" />

More specifically, these are elements which serve only to
override some other, as-yet-undefined stylesheet document.
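Here is a sketch of how a client might apply such override elements on top of its own defaults. The namespace URI, the FEATURE wrapper element, the default values and the "box" glyph are all placeholders I made up; only fgcolor, bgcolor and glyph come from this thread.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://example.org/das2"  # placeholder, not a real DAS namespace
STYLE_KEYS = {"fgcolor", "bgcolor", "glyph"}  # the restricted vocabulary

FEATURE_XML = f"""\
<das:FEATURE xmlns:das="{DAS_NS}">
  <das:fgcolor value="red" />
  <das:glyph value="box" />
</das:FEATURE>
"""

def style_overrides(feature_xml):
    """Collect known style elements from a feature as a key/value dict."""
    root = ET.fromstring(feature_xml)
    das_prefix = "{" + DAS_NS + "}"
    overrides = {}
    for child in root:
        if not child.tag.startswith(das_prefix):
            continue  # not a DAS element, so not a style override
        local_name = child.tag[len(das_prefix):]
        if local_name in STYLE_KEYS:
            overrides[local_name] = child.get("value")
    return overrides

# The client's own stylesheet, overridden per feature.
defaults = {"fgcolor": "black", "bgcolor": "white", "glyph": "line"}
style = {**defaults, **style_overrides(FEATURE_XML)}
print(style)
```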

2)

Some properties have (approximately) arbitrary contents
for the key and value.  Searches will be done as .. some
server-specific method? (Probably substring, exact or
word match.  Should we require a specific type of search?)

These will tend to be software or group specific.  Eg,
"last edited by" "Andrew".

I think these are best done as a key/value table, with no
restrictions on the key or value.
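A small sketch of why a table works where element names do not: a key such as "5'UTR" is not a legal XML element name, but it is fine as an attribute value. The PROPERTIES and PROP element names and the "yes" value are hypothetical.

```python
import xml.etree.ElementTree as ET

def properties_table(pairs):
    """Build a key/value table; keys live in attributes, so any text is allowed."""
    table = ET.Element("PROPERTIES")
    for key, value in pairs:
        ET.SubElement(table, "PROP", {"key": key, "value": value})
    return table

def parses_as_element_name(name):
    """Check whether an XML parser accepts `name` as an element name."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

pairs = [("5'UTR", "yes"), ("last edited by", "Andrew"), ("curator", "Andrew Dalke")]
xml_text = ET.tostring(properties_table(pairs), encoding="unicode")

# The table survives a serialize/parse round trip unchanged.
round_trip = [(p.get("key"), p.get("value")) for p in ET.fromstring(xml_text)]
print(round_trip == pairs)
print(parses_as_element_name("5'UTR"), parses_as_element_name("curator"))
```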

3)

Some properties are in the middle.  For example, scores.
They need to be well defined.  But score searches should be
range searches (scores better than 10E-4, more than 80%
identity).

I don't know how to handle these.

4)

Opaque extensions.  These are outside the realm of key/value
data, and used mostly as extension elements.  These can be
done through allowing non-DAS elements in part of the returned
document.  Eg,

   <dalke:link type="text/html" href="http://example.com/"
      mouseover="More information" glyph="http://example.com/glyph/"
      xmlns:dalke="http://www.dalkescientific.com/" />
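A client that does not understand an extension can separate it out by namespace and pass it along untouched. A minimal sketch, again with a placeholder DAS namespace URI and a hypothetical FEATURE wrapper:

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://example.org/das2"  # placeholder, not a real DAS namespace

DOC = f"""\
<das:FEATURE xmlns:das="{DAS_NS}">
  <das:fgcolor value="red" />
  <dalke:link type="text/html" href="http://example.com/"
      mouseover="More information" glyph="http://example.com/glyph/"
      xmlns:dalke="http://www.dalkescientific.com/" />
</das:FEATURE>
"""

def split_children(xml_text):
    """Separate DAS-namespace children from opaque extension elements."""
    root = ET.fromstring(xml_text)
    das_prefix = "{" + DAS_NS + "}"
    known, opaque = [], []
    for child in root:
        if child.tag.startswith(das_prefix):
            known.append(child)
        else:
            opaque.append(child)
    return known, opaque

known, opaque = split_children(DOC)
for el in opaque:
    # ElementTree reports tags in Clark notation: {namespace}localname
    print(el.tag, el.get("href"))
```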

					Andrew
					dalke at dalkescientific.com



