properties and key/value data (was Re: [DAS2] Spec issues)
Andrew Dalke
dalke at dalkescientific.com
Mon Nov 28 18:09:17 UTC 2005
Here's the email I sent to Steve that I meant to send to everyone.
On Nov 17, 2005, at 2:09 AM, Andrew Dalke wrote:
> I think I understand the Atom spec better now. In brief, the
> Atom document contains sections which are extensible and sections
> which are not.
>
> In an extensible section there are three categories of elements:
> - those in the "atom:" namespace
> - "simple extension elements" not in the "atom:" namespace
> - "structured extension elements" not in the "atom:" namespace.
>
> Most of the "atom:" elements share a common structure. For example:
> - the type= attribute indicates whether the contents are text, escaped
> HTML or XHTML; or an explicit content-type like "chemical/x-pdb".
>
> - the src= attribute indicates that the content of the element is
> empty and to go to the given URL instead (apparently the hip
> term for URL these days is IRI - Internationalized Resource
> Identifier). I think we only need to use URLs.
>
>
> Not every element uses these attributes; each one is used only where
> it is appropriate for the given field.
>
>
> Simple extension elements are always of the form
> <element>Content goes here</element>
> where 'element' is not part of the 'atom:' namespace. Consumers of
> this data may treat it as simple key/value data.
>
> Structured extension elements always have at least an attribute
> or a sub-element, so must look like
> <element attr="xyz"> .. </element>
> -or-
> <element> .. <subelement /> .. </element>
>
> If the element isn't known this field may be ignored.
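As a rough illustration of the three categories described above, here is a minimal Python sketch that sorts the children of an Atom-style entry into "atom", "simple" and "structured". The `classify` helper, the `x:` prefix and the example.org URLs are made up for this example; only the Atom namespace URI comes from the Atom spec.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def classify(elem):
    """Return 'atom', 'simple' or 'structured' for one element."""
    if elem.tag.startswith("{" + ATOM_NS + "}"):
        return "atom"
    # Structured extension: at least one attribute or one child element.
    # (ElementTree does not report xmlns declarations as attributes.)
    if elem.attrib or len(elem) > 0:
        return "structured"
    # Otherwise it is text-only and can be treated as key/value data.
    return "simple"

# Hypothetical entry with one element of each category.
doc = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:x="http://example.org/ext">'
    '<title>t</title>'
    '<x:score>29</x:score>'
    '<x:link src="http://example.org/data"/>'
    '</entry>'
)
kinds = [classify(child) for child in doc]
print(kinds)
```

A consumer that does not recognise an extension element can still use the "simple" ones as key/value pairs and skip the "structured" ones.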
>
> These three things provide for:
> - a set of well-defined elements, understandable by everyone
> - a simple extension for things which can be key/value data
> - a way to store or refer to more complex data types
>
>
> Steve, responding to an earlier posting of mine:
>> Interesting, but a problem with this is that it effectively creates a
>> new version of the TYPES schema every time a new property is added to
>> the DAS properties controlled vocabulary. I would hope for a solution
>> that decouples the content of the controlled vocab from the data
>> exchange format.
>
> I looked into that. Relax-NG lets you define a pattern meaning "can be
> anything except ...". The Atom spec defines its extension elements
> as follows:
>
> # Simple Extension
>
> simpleExtensionElement =
>    element * - atom:* {
>       text
>    }
>
> # Structured Extension
>
> structuredExtensionElement =
>    element * - atom:* {
>       (attribute * { text }+,
>          (text|anyElement)*)
>     | (attribute * { text }*,
>        (text?, anyElement+, (text|anyElement)*))
>    }
>
> The "element * - atom:*" means "Any element except those in
> the atom namespace."
>
> Thus we can validate anything with DAS/2 tags, and skip
> validation of anything not part of DAS/2. And we can say that
> extensions are only allowed in certain parts of the spec and
> not in others.
>
> We would need to update the schema when we add new "das:" elements,
> but we already need to do that.
>
> We wouldn't need to change the schema to allow others to develop
> their own extensions. Indeed, the schema would still let us
> verify that extensions are still well-formed.
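The "check our own namespace, skip everything else" idea can be sketched in a few lines of Python. This is not a Relax-NG validator, just a hand-rolled illustration; the `KNOWN_DAS` set, the `unknown_das_elements` helper, the `ext:` prefix and the `BOGUS` element are all invented for the example.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://www.biodas.org/ns/das/genome/2.00/"
# Hypothetical subset of element names a DAS/2 schema might define.
KNOWN_DAS = {"FEATURES", "FEATURE"}

def unknown_das_elements(root):
    """Return local names of DAS-namespace elements not in KNOWN_DAS.

    Elements in any other namespace are treated as extensions and
    are not checked at all.
    """
    prefix = "{" + DAS_NS + "}"
    bad = []
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            local = elem.tag[len(prefix):]
            if local not in KNOWN_DAS:
                bad.append(local)
    return bad

doc = ET.fromstring(
    '<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/" '
    'xmlns:ext="http://example.org/ext">'
    '<FEATURE id="f1"><ext:note>anything</ext:note><BOGUS/></FEATURE>'
    '</FEATURES>'
)
result = unknown_das_elements(doc)
print(result)
```

Adding a third-party element such as `ext:note` needs no change to `KNOWN_DAS`; only a new element in the DAS namespace does, which mirrors the schema-update argument above.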
>
>> Here's my next attempt, which more fully exploits xml:base to achieve
>> this decoupling:
>>
>> <FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
>> xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
>> xml:base="http://www.wormbase.org/das/genome/volvox/1/"
>> xmlns:xlink="http://www.w3.org/1999/xlink"
>> >
>> <FEATURE das:id="feature/cTel54X.1.2"
>> das:type="type/curated_exon">
>> <PROPERTIES>
>> <PROP das:ptype="property/genefinder-score">29</PROP>
>> </PROPERTIES>
>> <PROPERTIES
>> xml:base="http://www.biodas.org/ns/das/genome/2.00/properties">
>> <PROP das:ptype="phase">2</PROP>
>> <PROP das:ptype="protein_translation"
>> xlink:type="simple"
>> xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"
>> />
>> </PROPERTIES>
>> </FEATURE>
>
> Vs.
>
> <FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
> xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
> xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties"
> xml:base="http://www.wormbase.org/das/genome/volvox/1/"
> xmlns:xlink="http://www.w3.org/1999/xlink">
> <FEATURE id="feature/xTel54X.1.2"
> das:type="type/curated_exon">
> <prop:genefinder-score>29</prop:genefinder-score>
> <prop:phase>2</prop:phase>
> <prop:protein_translation
> src="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"
> />
> </FEATURE>
> </FEATURES>
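To show how a consumer might read the element-based form as key/value data, here is a minimal Python sketch using the standard library. The XML is the example above, written with closing tags that match their opening tags so that it parses; the `properties` helper and the rule "use src= if present, else the text" are assumptions made for this illustration, not part of any DAS/2 spec.

```python
import xml.etree.ElementTree as ET

PROP_NS = "http://www.biodas.org/ns/das/genome/2.00/properties"

XML = """\
<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00/"
          xmlns:das="http://www.biodas.org/ns/das/genome/2.00/"
          xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties"
          xml:base="http://www.wormbase.org/das/genome/volvox/1/">
  <FEATURE id="feature/xTel54X.1.2" das:type="type/curated_exon">
    <prop:genefinder-score>29</prop:genefinder-score>
    <prop:phase>2</prop:phase>
    <prop:protein_translation
        src="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1"/>
  </FEATURE>
</FEATURES>
"""

def properties(feature):
    """Collect prop:-namespaced children of a FEATURE into a dict."""
    prefix = "{" + PROP_NS + "}"
    props = {}
    for child in feature:
        if child.tag.startswith(prefix):
            key = child.tag[len(prefix):]
            # Assumed rule: a src= attribute wins over inline text.
            props[key] = child.get("src") or (child.text or "").strip()
    return props

root = ET.fromstring(XML)
props = properties(root[0])
for key, value in props.items():
    print(key, "=", value)
```

Because each property is its own element, the key is simply the element's local name and no separate ptype attribute lookup is needed.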
>
> The main differences are:
> - the properties are defined elements in the prop: namespace (though
> I think they can just as easily be in the das: namespace)
>
> - I'm using lower-case since that seems to be the trend these days.
>
>
>
>> So now we have the following arrangement:
>>
>> * the attribute keys 'das:id', 'das:type', and 'das:ptype' are defined
>> within the xmlns:das namespace (i.e., the full id of 'das:type' is
>> derived by appending 'type' to the xmlns:das URL).
>
> I don't follow why the attributes have full namespaces. Is that
> to allow extensibility of element attributes on a per-element basis?
>
> I kept "das:type" above because "type" already has too many meanings.
>
>> * the attribute values of 'das:id', 'das:type', and 'das:ptype' are
>> URLs relative to xml:base.
>
> Are all attribute values relative to xml:base or only those three?
>
> Are xlink:href fields relative to xml:base as well? I assume "yes".
>
>> * The FEATURE element may contain zero or more PROPERTIES
>> sub-elements, each with its own xml:base attribute, effectively
>> changing what xml:base is used within the contained PROP
>> sub-elements.
>>
>> So in this example, the property 'das:ptype="property/genefinder-score"'
>> inherits its xml:base from its grandparent FEATURES element and so
>> expands to:
>>
>> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score
>>
>> while the 'das:ptype="phase"' and 'das:ptype="protein_translation"'
>> properties inherit xml:base from their PROPERTIES parent element and
>> so expand to:
>>
>> http://www.biodas.org/ns/das/genome/2.00/properties/phase
>> http://www.biodas.org/ns/das/genome/2.00/properties/protein_translation
>
> This is also what happens with the "prop:" namespaced elements, just
> at the element level instead of the attribute level.
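The expansions quoted above can be tried out with ordinary relative-URL resolution. This is a small Python sketch using `urllib.parse.urljoin`; the variable names are made up for the example. One thing it brings out: under standard resolution rules the base needs a trailing slash for "phase" to land under ".../properties/". Without the slash, the last path segment of the base is replaced rather than extended.

```python
from urllib.parse import urljoin

FEATURES_BASE = "http://www.wormbase.org/das/genome/volvox/1/"
# As written in the quoted example: no trailing slash.
PROPERTIES_BASE = "http://www.biodas.org/ns/das/genome/2.00/properties"

# Inherited from the FEATURES element.
score_url = urljoin(FEATURES_BASE, "property/genefinder-score")

# With a trailing slash added, 'phase' is appended to the base path.
phase_with_slash = urljoin(PROPERTIES_BASE + "/", "phase")

# Without it, 'phase' replaces the final 'properties' segment.
phase_no_slash = urljoin(PROPERTIES_BASE, "phase")

print(score_url)
print(phase_with_slash)
print(phase_no_slash)
```

So if xml:base is used this way, the PROPERTIES base would presumably need to end in "/" to give the expansions listed. XML namespace names have no such rule, since they are compared as plain strings and never resolved.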
>
> To keep this message focused on key/value data I've moved the rest
> of the reply to the next email.
Andrew
dalke at dalkescientific.com