properties and key/value data (was Re: [DAS2] Spec issues)

Steve Chervitz Steve_Chervitz at affymetrix.com
Mon Nov 28 22:07:29 UTC 2005


To give some context to the message that Andrew recently forwarded to the
list, below is the message I sent to Andrew that prompted his reply (I had
meant to send it to the list, not just to Andrew).

It contains my fix to the 'namespace in attribute values' problem regarding
properties which I mentioned in today's conf call, and is, I believe, the
only viable alternative to Andrew's Relax-NG-based solution.

Basically, the trick is to enclose PROP elements that are relative to the
same xml:base within a parent PROPERTIES element and then permit multiple
PROPERTIES elements within a feature. This way you can allow property
attribute URIs that are relative to different xml:bases.
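
A minimal Python sketch of how a client might expand the relative das:ptype
values under this scheme, using only the standard library. The helper name
resolved_ptypes is made up for illustration and is not part of any DAS
toolkit. Note one assumption: the xml:base on the second PROPERTIES element
ends with a slash, because standard relative-URL resolution (RFC 3986)
otherwise replaces the last path segment ('properties') instead of appending
to it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
DAS_NS = "http://www.biodas.org/ns/das/genome/2.00/"
PTYPE = "{%s}ptype" % DAS_NS

DOC = """\
<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
          xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
          xml:base="http://www.wormbase.org/das/genome/volvox/1/">
  <FEATURE das:id="feature/cTel54X.1.2" das:type="type/curated_exon">
    <PROPERTIES>
      <PROP das:ptype="property/genefinder-score">29</PROP>
    </PROPERTIES>
    <PROPERTIES xml:base="http://www.biodas.org/ns/das/genome/2.00/properties/">
      <PROP das:ptype="phase">2</PROP>
    </PROPERTIES>
  </FEATURE>
</FEATURES>
"""


def resolved_ptypes(elem, base=""):
    """Yield (absolute property-type URL, value) for every PROP under elem."""
    # An element without xml:base inherits the base of its nearest ancestor.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == "{%s}PROP" % DAS_NS:
        yield urljoin(base, elem.get(PTYPE)), elem.text
    for child in elem:
        yield from resolved_ptypes(child, base)


pairs = list(resolved_ptypes(ET.fromstring(DOC)))
for url, value in pairs:
    print(url, "=", value)
# http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score = 29
# http://www.biodas.org/ns/das/genome/2.00/properties/phase = 2
```

The first PROP picks up the base declared on FEATURES, the second the base
declared on its own PROPERTIES parent.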

To clarify a point of possible confusion, there are really two sets of
key-value pairs to keep in mind:

1. The key-value pair for the property type.
2. The key-value pair for the property itself.

So in this example:

  <PROP das:ptype="property/genefinder-score">29</PROP>

The key for the type is 'das:ptype' and its value is
'property/genefinder-score'. This value is a relative URL based on the
xml:base in the enclosing PROPERTIES element (or in the nearest ancestor
element that sets one). The value of the property itself is 29 and
its key is the whole key-value pair for the type
(das:ptype="property/genefinder-score").
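
To make the two layers concrete, here is a rough Python sketch that pulls
both pairs out of that PROP element. The xmlns:das declaration is added to
the fragment only so that it parses on its own, and the 'properties' dict is
just one possible in-memory shape, not anything the spec prescribes.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://www.biodas.org/ns/das/genome/2.00/"

# Assumption: the namespace is declared inline so the fragment is well-formed.
prop = ET.fromstring(
    '<PROP xmlns:das="%s" das:ptype="property/genefinder-score">29</PROP>' % DAS_NS
)

# Layer 1: the key-value pair for the property type (the only attribute).
type_key, type_value = next(iter(prop.attrib.items()))

# Layer 2: the property itself, keyed by the whole type pair.
properties = {(type_key, type_value): prop.text}

print(type_key)    # {http://www.biodas.org/ns/das/genome/2.00/}ptype
print(type_value)  # property/genefinder-score
print(properties[(type_key, type_value)])  # 29
```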

In Andrew's Relax-NG equivalent:

  <prop:genefinder-score>29</prop:genefinder-score>

the element name contains both the key ('prop:') and the value of the
property type ('genefinder-score'), while the element name as a whole serves
as the key for the property itself (value=29). The 'prop:genefinder-score'
string is not a relative URL, but is just a namespace-scoped element name,
with 'prop:' serving merely to make 'genefinder-score' globally unique,
relative to the URI defined by:

  xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties"
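
As a small illustration of what "namespace-scoped" means in practice, this
Python sketch shows how a namespace-aware parser (here the standard library's
ElementTree) reports the element name. The second fragment uses a different,
arbitrarily chosen prefix to show that only the URI matters.

```python
import xml.etree.ElementTree as ET

PROP_NS = "http://www.biodas.org/ns/das/genome/2.00/properties"

elem = ET.fromstring(
    '<prop:genefinder-score xmlns:prop="%s">29</prop:genefinder-score>' % PROP_NS
)

# ElementTree exposes the expanded name as "{namespace-uri}local-name".
namespace, local_name = elem.tag[1:].split("}", 1)
print(namespace)   # http://www.biodas.org/ns/das/genome/2.00/properties
print(local_name)  # genefinder-score

# A different prefix bound to the same URI yields the same expanded name.
same_name = ET.fromstring(
    '<p:genefinder-score xmlns:p="%s">29</p:genefinder-score>' % PROP_NS
)
print(same_name.tag == elem.tag)  # True
```

The parser never treats the expanded name as a URL to resolve; it is an
opaque (URI, local name) pair.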

A potential drawback of the Relax-NG approach, as discussed in today's conf
call, is that the value of the property type is not resolvable as in the
other approach using the PROPERTIES parent element.

Andrew doesn't see a need for resolvability, e.g., for a dynamically
discoverable schema fragment. But I thought of another use case besides the
one mentioned in today's call (determining data type such as int or float,
which isn't of much use in practice). The URL for the type could point to a
human readable definition of the term. A user may not need clarification of
'genefinder-score' but might for something like 'softberry-ztuple'.

One could still satisfy such a use case under the Relax-NG approach by
providing a resolvable URL based on the element name + namespace such as:

http://www.biodas.org/ns/das/genome/2.00/properties#genefinder-score

True, there's no XML spec that says this is legal, but we could declare that
such a convention will hold for all biodas.org-based properties. One problem
with the above convention is that it's not obvious what the URL resolves to.
So we could have something like:

http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&define=true

http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&schema=true
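
A sketch of how a client could build these URLs from the namespace URI and
the element's local name. Both function names are hypothetical, and the
conventions themselves (the '#' form and the 'prop'/'define'/'schema' query
parameters) are only the proposals above, not anything a server currently
implements.

```python
from urllib.parse import urlencode

PROP_NS = "http://www.biodas.org/ns/das/genome/2.00/properties"


def fragment_url(namespace, local_name):
    """Proposed convention 1: namespace URI + '#' + local element name."""
    return "%s#%s" % (namespace, local_name)


def query_url(namespace, local_name, want):
    """Proposed convention 2: 'want' is 'define' or 'schema'."""
    return "%s?%s" % (namespace, urlencode({"prop": local_name, want: "true"}))


print(fragment_url(PROP_NS, "genefinder-score"))
print(query_url(PROP_NS, "genefinder-score", "define"))
print(query_url(PROP_NS, "genefinder-score", "schema"))
```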

Just a thought.

Steve 

> From: Steve Chervitz <Steve_Chervitz at affymetrix.com>
> Date: Mon, 14 Nov 2005 17:40:28 -0800
> To: Andrew Dalke <dalke at dalkescientific.com>
> Conversation: [DAS2] Spec issues
> Subject: Re: [DAS2] Spec issues
> 
> 
> Andrew Dalke <dalke at dalkescientific.com> wrote on 14 Nov 2005:
>> 
>> To: DAS/2 <das2 at portal.open-bio.org>
>> Subject: Re: [DAS2] Spec issues
>> 
>> On Nov 4 Steve wrote:
>>>     <FEATURE das:id="feature/cTel54X.1.2"
>>>              das:type="type/curated_exon">
>>>       <PROP das:ptype="property/genefinder-score">29</PROP>
>>>       <PROP das:ptype="das:prop#phase">2</PROP>
>>>       <PROP das:ptype="das:prop#protein_translation"
>>>             xlink:type="simple"
>>>             xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"
>>>       />
>>>     </FEATURE>
>> 
>> I think we're missing something.  This is XML.  We can do
>> 
>> <TYPES xml:base="http://www.wormbase.org/dase/genome/volvox/1/type">
>>    <TYPE id="curated_gene"
>>            ontology="http://song.sf.net/ontologies/sofa#gene"
>>            source="curated"
>>            xml:base="gene/">
>>      <das:ptype name="property/genefinder-score">29</das:ptype>
>>      <das:phase>2</das:phase>
>>      <das:protein_translation xlink:type="simple"
>>                               xlink:href="http://www.wormbase.org/..." />
>>      <xyz:ack type="html">This message brought to you by AT&amp;T</xyz:ack>
>>    </TYPE>
>> </TYPES>
>> 
>> The whole point of having namespaces in XML is to keep from needing
>> to define new namespaces like <PROP>.
>> 
>> In doing that, there's no problem in supporting things like "bg:glyph",
>> etc. because the values are expanded as expected by the XML processor.
> 
> Interesting, but a problem with this is that it effectively creates a
> new version of the TYPES schema every time a new property is added to
> the DAS properties controlled vocabulary. I would hope for a solution
> that decouples the content of the controlled vocab from the data
> exchange format.
> 
> Here's my next attempt, which more fully exploits xml:base to achieve
> this decoupling:
> 
>   <FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
>             xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
>             xml:base="http://www.wormbase.org/das/genome/volvox/1/"
>             xmlns:xlink="http://www.w3.org/1999/xlink">
>     <FEATURE das:id="feature/cTel54X.1.2"
>              das:type="type/curated_exon">
>       <PROPERTIES>
>         <PROP das:ptype="property/genefinder-score">29</PROP>
>       </PROPERTIES>
>       <PROPERTIES xml:base="http://www.biodas.org/ns/das/genome/2.00/properties/">
>         <PROP das:ptype="phase">2</PROP>
>         <PROP das:ptype="protein_translation"
>               xlink:type="simple"
>               xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" />
>       </PROPERTIES>
>     </FEATURE>
>   </FEATURES>
> 
> So now we have the following arrangement:
> 
>  * the attribute keys 'das:id', 'das:type', and 'das:ptype' are defined
>    within the xmlns:das namespace (i.e., the full id of 'das:type' is
>    derived by appending 'type' to the xmlns:das URL).
> 
>  * the attribute values of 'das:id', 'das:type', and 'das:ptype' are
>    URLs relative to xml:base.
> 
>  * The FEATURE element may contain zero or more PROPERTIES
>    sub-elements, each with its own xml:base attribute, effectively
>    changing what xml:base is used within the contained PROP
>    sub-elements.
> 
> So in this example, the property 'das:ptype="property/genefinder-score"'
> inherits its xml:base from its ancestor FEATURES element and so
> expands to:
> 
> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score
> 
> while the 'das:ptype="phase"' and 'das:ptype="protein_translation"'
> properties inherit xml:base from their PROPERTIES parent element and
> so expand to:
> 
> http://www.biodas.org/ns/das/genome/2.00/properties/phase
> http://www.biodas.org/ns/das/genome/2.00/properties/protein_translation
> 
> 
>>> Also, we might want to allow some controlled vocabulary terms to be
>>> used for the value of type.source (e.g., "das:curated"), to ensure that
>>> different users use the same term to specify that a feature type is
>>> produced by curation.
>> 
>> I talked with Andreas Prlic about what other metadata is needed for the
>> registry system.  He mentioned
>> 
>>      Together with the BioSapiens DAS people we recently decided that
>>      there should be the possibility to assign gene-ontology evidence
>>      codes to each das source, so in the next update of the registry,
>>      this will be changed.
>> 
>> That's at the source level, but perhaps it's also needed at the
>> annotation level.
> 
> I like this idea. Good re-use of GO technology.
>  
>> <snip>
>> 
>> My thoughts on these are:
>>    - come up with a more consistent way to store key/value data
>>    - the Atom spec has a nice way to say "the data is in this CDATA
>> as text/html/xml" vs. "this text is over there".  I want to copy its
>> way of doing things.
>> 
>>    - I'm still not clear about xlink.  Another is the HTML-style
>> <link href="http://..." rel="...">
>> 
>> Atom uses the "rel=" to encode information about the link.  For
>> example, the URL to edit a given document is
>> 
>>    <link ... rel="service.edit">
>> 
>> See http://atomenabled.org/developers/api/atom-api-spec.php
> 
> Not sure about this one yet. In the Atom API, the value of the rel
> attribute is restricted to a controlled vocabulary of link
> relationships and available services pertaining to editing and
> publishing syndicated content on the web:
> http://atomenabled.org/developers/api/atom-api-spec.php#rfc.section.5.4.1
> 
> What would a controlled vocab for DAS resources be?
> 
> Skimming through the DAS/2 retrieval spec, our use of hrefs is
> simply for pointing at the location of resources on the web
> containing some specified content (e.g., documentation, database
> entry, image data, etc.).
> 
> The next/prev/start idea for Atom might have good applicability in the
> DAS world for iterating through versions of annotations or assemblies
> (e.g., rel='link-to-gene-on-next-version-of-genome'). One relationship
> that would be useful for DAS would be 'latest', to get the latest
> version of an annotation.
> 
> DAS get URLs themselves seem fairly self-documenting (it's clear a
> given link is for feature, type, or sequence for example), so having a
> separate rel attribute may not provide much additional value for these
> links. But it might be handy for versioning and for DAS/2 writebacks.
> 
> Here's another link about Atom:
> http://en.wikipedia.org/wiki/Atom_%28standard%29
> 
> Steve



