[DAS2] Spec issues
Chervitz, Steve
Steve_Chervitz at affymetrix.com
Thu Oct 27 01:29:38 UTC 2005
In the spec for DAS/2 retrievals, there are some open issues regarding types
and features that I'd like to solicit feedback on. This is kind of a long
message, so feel free to pick and choose what you want to respond to.
For reference, here's the latest retrieval spec:
http://biodas.org/documents/das2/das2_get.html
Type properties example (only showing relevant attributes):
Description: A set of machine-readable configuration information as
key/value pairs
<TYPES xml:base="http://www.wormbase.org/dase/genome/volvox/1/type">
<TYPE id="curated_gene"
ontology="http://song.sf.net/ontologies/sofa#gene"
source="curated"
xml:base="gene/">
<PROP key="bg:glyph" value="arrow" />
<PROP key="das:editable" value="yes" />
</TYPE>
</TYPES>
The spec currently describes the key attribute as "the name of the property.
Elaborate on how to interpret the name". So how should name be interpreted?
Can it be a URI/URL? If we want it to be just a simple string that can
derive from some controlled vocabulary, how does one specify which
vocabulary it derives from? (e.g.,
http://www.biodas.org/ns/das/properties/2.00)
Also, we might want to allow some controlled vocabulary terms to be used for
the value of type.source (e.g., "das:curated"), to ensure that different
users use the same term to specify that a feature type is produced by
curation.
The spec also seems alarmed by the existence of an xml:base attribute in the
TYPE element. The idea is that any relative URL within this element would be
resolved using that element's xml:base attribute. How would folks feel about
having the DAS/2 spec fully support the XML Base spec (
http://www.w3.org/TR/xmlbase/ )? The result would be an optional xml:base
attribute on all elements that contain URLs or subelements with URLs.
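For an example of how this would work, in the above XML snippet, the
absolute URL for TYPE.id would be
http://www.wormbase.org/dase/genome/volvox/1/type/gene/curated_gene
(assuming the TYPES xml:base ends in a trailing slash, i.e. "type/"; without
it, strict relative-URL resolution replaces the final "type" segment instead
of extending it).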
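A minimal sketch of that nested resolution in Python, using the standard
library's urljoin, which applies the RFC 3986 relative-reference rules that
XML Base builds on. The helper name resolve_chain is made up for
illustration, and the trailing slash on the outer base is an assumption that
the snippet above does not include; the second call shows what happens
without it.

```python
from urllib.parse import urljoin

def resolve_chain(*parts):
    """Resolve each part against the result so far (outermost xml:base first)."""
    result = parts[0]
    for part in parts[1:]:
        result = urljoin(result, part)
    return result

# Outer xml:base on TYPES, with a trailing slash ASSUMED for this sketch.
types_base = "http://www.wormbase.org/dase/genome/volvox/1/type/"
type_base = "gene/"          # xml:base on the TYPE element
type_id = "curated_gene"     # TYPE.id, a relative URL

print(resolve_chain(types_base, type_base, type_id))
# http://www.wormbase.org/dase/genome/volvox/1/type/gene/curated_gene

# Same chain with the base exactly as written in the snippet (no trailing slash):
print(resolve_chain("http://www.wormbase.org/dase/genome/volvox/1/type",
                    type_base, type_id))
# http://www.wormbase.org/dase/genome/volvox/1/gene/curated_gene
```

If full XML Base support is adopted, servers would need to be careful about
trailing slashes on every xml:base value they emit.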
Next issue: Feature properties example (only showing relevant attributes):
Description: Properties are typed using the ptype attribute. The value of
the property may be indicated by a URL given by the href attribute, or may
be given inline as the CDATA content of the <PROP> section.
<FEATURES xml:base="http://www.wormbase.org/das/genome/volvox/1/">
<FEATURE id="feature/cTel54X.1.2"
type="type/curated_exon">
<PROP ptype="property/genefinder-score">29</PROP>
<PROP ptype="das:phase">2</PROP>
<PROP ptype="property/protein_translation"
href="/das/protein/volvox/2/feature/CTEL54X.1" />
</FEATURE>
</FEATURES>
So in contrast to the TYPE properties, which are restricted to simple
string-based key:value pairs, FEATURE properties can be more complex. That
seems reasonable, given the wild world of features. We might consider using
'key' rather than 'ptype' for FEATURE properties, for consistency with TYPE
PROP elements (however, read on).
In the feature filter section, the property-based filter describes feature
properties as being string-based, a la TYPE properties. More complex feature
properties would not necessarily be filterable, so the spec should say
explicitly that property-based feature filters only work for feature
properties that are simple strings (not properties whose value is a URL or
is CDATA with a MIME type other than text/plain).
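To make the proposed rule concrete, here is a rough Python sketch that picks
out the filterable properties from the FEATURE example above. The function
names are invented for illustration, and "mimetype" is a hypothetical
attribute name, since the snippet does not show how a MIME type would be
declared on a PROP.

```python
import xml.etree.ElementTree as ET

FEATURES_XML = """\
<FEATURES xml:base="http://www.wormbase.org/das/genome/volvox/1/">
  <FEATURE id="feature/cTel54X.1.2" type="type/curated_exon">
    <PROP ptype="property/genefinder-score">29</PROP>
    <PROP ptype="das:phase">2</PROP>
    <PROP ptype="property/protein_translation"
          href="/das/protein/volvox/2/feature/CTEL54X.1" />
  </FEATURE>
</FEATURES>
"""

def is_filterable(prop):
    """True if the PROP carries a simple inline string value."""
    if prop.get("href") is not None:
        return False  # the value lives behind a URL, not inline
    # "mimetype" is a HYPOTHETICAL attribute name, assumed for this sketch.
    if prop.get("mimetype", "text/plain") != "text/plain":
        return False
    return bool((prop.text or "").strip())

def filterable_props(feature):
    """Map ptype -> inline value for every filterable PROP of a FEATURE."""
    return {p.get("ptype"): p.text.strip()
            for p in feature.iter("PROP") if is_filterable(p)}

feature = ET.fromstring(FEATURES_XML).find("FEATURE")
print(filterable_props(feature))
# {'property/genefinder-score': '29', 'das:phase': '2'}
```

The protein_translation property is skipped because its value is given by
href rather than inline.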
One issue that comes up here, which actually pertains to the spec as a
whole, is that various attributes intended to be URLs are given quite
different names. In the FEATURE snippet above, there are four different
attributes that are URLs: id, type, ptype, and href. There is a Bugzilla
entry requesting that all attributes named 'id' which are in fact URLs be
renamed 'uri': http://bugzilla.open-bio.org/show_bug.cgi?id=1788
This seems like a good move to me, since it flags these attributes as
resolvable. Does anyone have objections to this?
For other attributes that are URLs but are not named 'id' or 'href' (such as
type and ptype above), we could either leave them as-is, or append '_uri'
to their names to flag their resolvability. Feature's PROP.ptype is an
interesting case, since it is both an identifier (equivalent to a TYPE's
PROP.key) and a URL describing the property. For this reason, I would
propose renaming it either 'uri' (to capture this dual role) or 'key'
(for consistency with type properties). Thoughts?
The feature example DASXML above also shows a way to attach a protein
translation to a feature as a property. Since this will be a common task,
I'd vote for having a feature property of "das:property/protein_translation"
among the list of built-in feature properties in the das namespace. Anyone
want to take a stab at defining the full list of built-in properties within
the "das:" and "bg:" namespaces? I think it's a key requirement for
interoperability.
Steve