properties and key/value data (was Re: [DAS2] Spec issues)

Andrew Dalke dalke at dalkescientific.com
Mon Nov 28 18:09:17 UTC 2005


Here's the email I sent to Steve that I meant to send to everyone.

On Nov 17, 2005, at 2:09 AM, Andrew Dalke wrote:

> I think I understand the Atom spec better now.  In brief, the
> Atom document contains sections which are extensible and sections
> which are not.
>
> In an extensible section there are two/three categories of elements:
>   - those in the "atom:" namespace
>   - "simple extension elements" not in the "atom:" namespace
>   - "structured extension elements" not in the "atom:" namespace.
>
> Most of the "atom:" elements share a common structure.  For example:
>   - the type= attribute indicates whether the contents are text, escaped
>       HTML or XHTML, or an explicit content type like "chemical/x-pdb".
>
>   - the src= attribute indicates that the element itself is empty
>       and the content should be fetched from the given URL instead
>       (apparently the hip term for URL these days is IRI, for
>       Internationalized Resource Identifier; I think we only need
>       to use URLs)
>
>
> Not every element uses these attributes; each is used only where
> it is appropriate for that field.
>
>
>  Simple extension elements are always of the form
>     <element>Content goes here</element>
> where 'element' is not part of the 'atom:' namespace.  Consumers of
> this data may treat it as simple key/value data.
>
>  Structured extension elements always have at least an attribute
> or a sub-element, so must look like
>   <element attr="xyz"> .. </element>
> -or-
>   <element> .. <subelement /> .. </element>
>
> If a consumer doesn't recognize the element, it may ignore it.
>
> These three things provide for:
>   - a set of well-defined elements, understandable by everyone
>   - a simple extension for things which can be key/value data
>   - a way to store or refer to more complex data types
>
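The three categories can be told apart mechanically. Below is a minimal Python sketch, using only the standard library's ElementTree, that sorts the children of an extensible section into them. The `ex:` namespace, the sample entry, and the function name `classify` are made up for illustration; the Atom namespace URI is the one RFC 4287 assigns, and the rule "no attributes and no child elements means simple" is a simplification of the definitions above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def classify(elem: ET.Element) -> str:
    """Sort a child element into one of the three categories (hypothetical helper)."""
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    if elem.tag.startswith("{" + ATOM_NS + "}"):
        return "atom"
    # A simple extension has no attributes and no sub-elements, only text.
    # (xmlns declarations are not reported in elem.attrib.)
    if not elem.attrib and len(elem) == 0:
        return "simple"
    # At least one attribute or sub-element makes it structured.
    return "structured"

# Hypothetical sample entry with one element of each kind.
doc = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ex="http://example.org/ext">'
    '<title>Example</title>'
    '<ex:score>29</ex:score>'
    '<ex:link href="http://example.org/x"/>'
    '</entry>'
)

for child in doc:
    print(child.tag, classify(child))
# {http://www.w3.org/2005/Atom}title atom
# {http://example.org/ext}score simple
# {http://example.org/ext}link structured
```

A consumer that only wants key/value data can keep the "simple" elements and skip the "structured" ones it does not understand.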
>
> Steve, responding to an earlier posting of mine:
>> Interesting, but a problem with this is that it effectively creates a
>> new version of the TYPES schema every time a new property is added to
>> the DAS properties controlled vocabulary. I would hope for a solution
>> that decouples the content of the controlled vocab from the data
>> exchange format.
>
> I looked into that.  RELAX NG lets you define a pattern meaning "can
> be anything except ...".  The Atom spec defines its extensions this way:
>
> # Simple Extension
>
> simpleExtensionElement =
>    element * - atom:* {
>       text
>    }
>
> # Structured Extension
>
> structuredExtensionElement =
>    element * - atom:* {
>       (attribute * { text }+,
>          (text|anyElement)*)
>     | (attribute * { text }*,
>        (text?, anyElement+, (text|anyElement)*))
>    }
>
> The "element * - atom:*" means "Any element except those in
> the atom namespace."
>
> Thus we can validate anything with DAS/2 tags and skip validation of
> anything that is not part of DAS/2.  And we can say that
> extensions are only allowed in certain parts of the spec and
> not in others.
>
> We would need to update the schema when we add new "das:" elements,
> but we already need to do that.
>
> We wouldn't need to change the schema to allow others to develop
> their own extensions.  Indeed, the schema would still let us
> verify that extensions are well-formed.
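The "any element except ..." idea can be mimicked in Python: check elements in the DAS/2 namespace strictly and pass over everything else. This is a hypothetical illustration, not the DAS/2 schema; the set `KNOWN_DAS` and the function `check` are assumptions, and a real deployment would run the RELAX NG schema through a validator instead.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://www.biodas.org/ns/das/genome/2.00/"
# Hypothetical element list; the real schema would grow as das: elements are added.
KNOWN_DAS = {"FEATURES", "FEATURE", "PROPERTIES", "PROP"}

def check(elem: ET.Element, errors: list) -> None:
    """Report unknown das: elements; treat other namespaces as extensions."""
    prefix = "{" + DAS_NS + "}"
    if elem.tag.startswith(prefix):
        local = elem.tag[len(prefix):]
        if local not in KNOWN_DAS:
            errors.append(f"unknown DAS element: {local}")
        for child in elem:
            check(child, errors)
    # Otherwise the element is an extension (the "- das:*" part). The parser
    # already rejected malformed XML, so nothing more is checked here.

doc = ET.fromstring(
    f'<FEATURES xmlns="{DAS_NS}" xmlns:my="http://example.org/my-ext">'
    '<FEATURE><my:note>free-form extension</my:note></FEATURE>'
    '<FEATUR/>'
    '</FEATURES>'
)
errors = []
check(doc, errors)
print(errors)  # ['unknown DAS element: FEATUR']
```

The misspelled `FEATUR` is caught because it sits in the DAS/2 namespace, while `my:note` passes untouched.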
>
>> Here's my next attempt, which more fully exploits xml:base to achieve
>> this decoupling:
>>
>>   <FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
>>             xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
>>             xml:base="http://www.wormbase.org/das/genome/volvox/1/"
>>             xmlns:xlink="http://www.w3.org/1999/xlink">
>>     <FEATURE das:id="feature/cTel54X.1.2"
>>              das:type="type/curated_exon">
>>       <PROPERTIES>
>>         <PROP das:ptype="property/genefinder-score">29</PROP>
>>       </PROPERTIES>
>>       <PROPERTIES xml:base="http://www.biodas.org/ns/das/genome/2.00/properties">
>>         <PROP das:ptype="phase">2</PROP>
>>         <PROP das:ptype="protein_translation"
>>               xlink:type="simple"
>>               xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"
>>         />
>>       </PROPERTIES>
>>     </FEATURE>
>>   </FEATURES>
>
> Vs.
>
> <FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
>           xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
>           xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties"
>           xml:base="http://www.wormbase.org/das/genome/volvox/1/"
>           xmlns:xlink="http://www.w3.org/1999/xlink">
>   <FEATURE id="feature/cTel54X.1.2"
>            das:type="type/curated_exon">
>      <prop:genefinder-score>29</prop:genefinder-score>
>      <prop:phase>2</prop:phase>
>      <prop:protein_translation
>          src="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" />
>   </FEATURE>
> </FEATURES>
>
> The main differences are:
>   - the properties are defined elements in the prop: namespace (though
>       I think they can just as easily be in the das: namespace)
>
>   - I'm using lower-case since that seems to be the trend these days.
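With properties as elements, a consumer can read them as plain key/value data. A minimal Python sketch, assuming the proposed markup above (written out again so the block is self-contained); the helper name `feature_properties` is hypothetical, and reading `src=` as "the value lives at this URL" follows the Atom convention described earlier.

```python
import xml.etree.ElementTree as ET

PROP_NS = "http://www.biodas.org/ns/das/genome/2.00/properties"
DAS_NS = "http://www.biodas.org/ns/das/genome/2.00/"

doc = ET.fromstring("""
<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
          xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
          xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties"
          xml:base="http://www.wormbase.org/das/genome/volvox/1/">
  <FEATURE id="feature/cTel54X.1.2" das:type="type/curated_exon">
    <prop:genefinder-score>29</prop:genefinder-score>
    <prop:phase>2</prop:phase>
    <prop:protein_translation
        src="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" />
  </FEATURE>
</FEATURES>
""".strip())

def feature_properties(feature: ET.Element) -> dict:
    """Map each prop: child's local name to its text, or to its src= URL."""
    prefix = "{" + PROP_NS + "}"
    props = {}
    for child in feature:
        if not child.tag.startswith(prefix):
            continue  # not a property element; ignore it
        key = child.tag[len(prefix):]
        # An empty element with src= points at the value instead of holding it.
        props[key] = child.get("src", child.text)
    return props

feature = doc.find("{" + DAS_NS + "}FEATURE")
print(feature_properties(feature))
```

The same loop works for any property name, so adding a new property to the controlled vocabulary needs no change on the consumer side.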
>
>
>
>> So now we have the following arrangement:
>>
>>  * the attribute keys 'das:id', 'das:type', and 'das:ptype' are defined
>>    within the xmlns:das namespace (i.e., the full id of 'das:type' is
>>    derived by appending 'type' to the xmlns:das URL).
>
> I don't follow why the attributes are namespace-qualified.  Is that
> to allow attributes to be extended on a per-element basis?
>
> I kept "das:type" above because "type" already has too many meanings.
>
>>  * the attribute values of 'das:id', 'das:type', and 'das:ptype' are
>>    URLs relative to xml:base.
>
> Are all attribute values relative to xml:base or only those three?
>
> Are xlink:href fields relative to xml:base as well?  I assume "yes".
>
>>  * The FEATURE element may contain zero or more PROPERTIES
>>    sub-elements, each with its own xml:base attribute, effectively
>>    changing the xml:base used within the contained PROP
>>    sub-elements.
>>
>> So in this example, the property 'das:ptype="property/genefinder-score"'
>> inherits its xml:base from its grandparent FEATURES element and so
>> expands to:
>>
>> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score
>>
>> while the 'das:ptype="phase"' and 'das:ptype="protein_translation"'
>> properties inherit xml:base from their PROPERTIES parent element and
>> so expand to:
>>
>> http://www.biodas.org/ns/das/genome/2.00/properties/phase
>> http://www.biodas.org/ns/das/genome/2.00/properties/protein_translation
>
> This is also what happens with the "prop:" namespaced elements, just
> at the element level instead of the attribute level.
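The expansion rule can be checked in Python, since for cases like these XML Base resolves relative references the same way `urllib.parse.urljoin` does (RFC 3986). The sketch walks the tree carrying the nearest enclosing xml:base; the function name `expand_ptypes` is hypothetical. One detail it surfaces: a base ending in `.../properties` with no trailing slash resolves `phase` to `.../2.00/phase`, so the sample below uses `.../properties/`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the fixed XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
DAS = "{http://www.biodas.org/ns/das/genome/2.00/}"

def expand_ptypes(elem, base="", out=None):
    """Collect (ptype, absolute URL) pairs, resolving each das:ptype
    against the nearest enclosing xml:base."""
    if out is None:
        out = []
    # xml:base may itself be relative, so resolve it against the inherited
    # base; urljoin(base, "") simply returns base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    ptype = elem.get(DAS + "ptype")
    if ptype is not None:
        out.append((ptype, urljoin(base, ptype)))
    for child in elem:
        expand_ptypes(child, base, out)
    return out

doc = ET.fromstring("""
<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
          xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
          xml:base="http://www.wormbase.org/das/genome/volvox/1/">
  <FEATURE das:id="feature/cTel54X.1.2" das:type="type/curated_exon">
    <PROPERTIES>
      <PROP das:ptype="property/genefinder-score">29</PROP>
    </PROPERTIES>
    <PROPERTIES xml:base="http://www.biodas.org/ns/das/genome/2.00/properties/">
      <PROP das:ptype="phase">2</PROP>
    </PROPERTIES>
  </FEATURE>
</FEATURES>
""".strip())

for ptype, url in expand_ptypes(doc):
    print(ptype, "->", url)
# property/genefinder-score -> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score
# phase -> http://www.biodas.org/ns/das/genome/2.00/properties/phase

# Without the trailing slash, the last path segment is replaced:
print(urljoin("http://www.biodas.org/ns/das/genome/2.00/properties", "phase"))
# http://www.biodas.org/ns/das/genome/2.00/phase
```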
>
> To keep this message focused on key/value data, I've moved the rest
> of my reply to the next email.

					Andrew
					dalke at dalkescientific.com



