[DAS] comments - hierarchical features
Andy Jenkinson
andy.jenkinson at ebi.ac.uk
Wed Feb 18 16:18:59 UTC 2009
Hi all,
As you may know, a new revision of the DAS specification will soon be
published. One of the features to be added is improved support for
hierarchical features, and I'm looking for input on one detail of how
this will be done.
The plan is to replace the <GROUP> structure with something similar to
the DAS/2 approach: parent features have concise <PART> elements that
identify other (separate) child features. Child features have <PARENT>
elements to represent the reciprocal relationship. This means that group
data shared by several features no longer needs to be duplicated, and
groups can themselves have start and end positions:
<FEATURE id="A1">
<PART id="B1" />
<PART id="B2" />
... start, end, notes and other verbose content ...
</FEATURE>
<FEATURE id="B1">
<PARENT id="A1" />
... content ...
</FEATURE>
<FEATURE id="B2">
<PARENT id="A1" />
... content ...
</FEATURE>
Here, the parent and each child reference one another, so every link is
represented twice. However, the relationship could still be represented
if only one feature linked to the other:
<FEATURE id="A1">
<PART id="B1" />
...
</FEATURE>
<FEATURE id="B1">
...
</FEATURE>
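To illustrate why <PARENT> is redundant for recovering the hierarchy, here is a minimal Python sketch (standard library only) that rebuilds the child-to-parent mapping purely by inverting the <PART> references. It assumes a simplified, hypothetical <FEATURES> wrapper around the <FEATURE> elements rather than the full response document, omits the start/end/notes content, and the function name parents_from_parts is purely illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified response: a <FEATURES> wrapper stands in for the
# real document structure, and only <PART> links are present (no <PARENT>).
XML = """<FEATURES>
  <FEATURE id="A1">
    <PART id="B1" />
    <PART id="B2" />
  </FEATURE>
  <FEATURE id="B1" />
  <FEATURE id="B2" />
</FEATURES>"""


def parents_from_parts(root):
    """Invert <PART> links into a mapping of child id -> list of parent ids."""
    parents = defaultdict(list)
    for feature in root.iter("FEATURE"):
        for part in feature.findall("PART"):
            parents[part.get("id")].append(feature.get("id"))
    return dict(parents)


root = ET.fromstring(XML)
print(parents_from_parts(root))  # {'B1': ['A1'], 'B2': ['A1']}
```

The catch is that a feature absent from this mapping (i.e. a top-level feature) can only be identified as such once the whole response has been read, which is the crux of the streaming counter-argument.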
We therefore have the option of omitting the <PARENT> element from the
specification entirely. Over the last couple of years DAS sources have
become increasingly dense, and browsers want to display ever larger
regions. As a result, there is significant pressure to minimise the
verbosity of the XML response (other changes in the upcoming spec also
address this). Whilst DAS/2's support for negotiating alternative content
formats sidesteps the issue, DAS does not yet have this, and in any case
I believe the fallback XML format should still be fit for purpose.
The counter-argument (i.e. the case for requiring both <PARENT> and
<PART> elements) rests on the rendering-efficiency benefits of
streaming. If a client knows for certain that it has parsed every
feature related to a given group, it can render that group while it
waits for the server to send the rest of the response. With <PART>
alone, a client cannot know whether a feature it has already parsed will
later be claimed as a part by another feature until the response ends.
A client could use this to offer a significant usability boost: a user's
perception of an interface's speed is influenced more by how quickly a
display starts to render than by how long it takes to complete. However,
at the moment no DAS clients do this (it is not possible with the
current spec, and some clients, such as Ensembl, cannot because of the
way they render the data). Nor am I sure to what extent it would be used
in future; for example, it could not be used where the entire set of
features must be post-processed (e.g. binning).
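The streaming idea can be sketched as follows. This is a hypothetical Python client loop (not taken from any existing DAS client) that uses ElementTree's iterparse to handle each feature as soon as its </FEATURE> end tag is parsed. It assumes bi-directional references and a strict tree (each feature has at most one parent): a feature without <PARENT> is known to be top-level immediately, and its group is complete once every feature named in <PART> elements, recursively, has arrived. The <FEATURES> wrapper and the function names are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified response with bi-directional links. Feature order is
# arbitrary: B1 arrives before its parent A1, and X1 is a standalone feature.
XML = b"""<FEATURES>
  <FEATURE id="B1"><PARENT id="A1" /></FEATURE>
  <FEATURE id="X1" />
  <FEATURE id="A1"><PART id="B1" /><PART id="B2" /></FEATURE>
  <FEATURE id="B2"><PARENT id="A1" /></FEATURE>
</FEATURES>"""


def complete_members(root_id, pending):
    """Return root_id plus all its descendants if every one has arrived, else None."""
    members, stack = [], [root_id]
    while stack:
        fid = stack.pop()
        if fid not in pending:
            return None  # a referenced part has not been parsed yet
        members.append(fid)
        stack.extend(reversed(pending[fid][1]))  # visit parts in document order
    return members


def stream_groups(source):
    """Yield each complete top-level group (a list of feature ids) as early as possible.

    Assumes every feature has at most one parent (a strict tree).
    """
    pending = {}  # feature id -> (parent ids, part ids), parsed but not yet yielded
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "FEATURE":
            continue
        pending[elem.get("id")] = (
            [p.get("id") for p in elem.findall("PARENT")],
            [p.get("id") for p in elem.findall("PART")],
        )
        elem.clear()  # discard the parsed child elements to reduce memory use
        # Only features without <PARENT> can root a group; because <PARENT> is
        # mandatory in this scheme, we know this as soon as the feature is parsed.
        roots = [fid for fid, (parents, _) in pending.items() if not parents]
        for root_id in roots:
            if root_id not in pending:
                continue
            members = complete_members(root_id, pending)
            if members is not None:
                for fid in members:
                    del pending[fid]
                yield members  # a real client would render this group now


for group in stream_groups(io.BytesIO(XML)):
    print(group)
```

On this input the loop emits ['X1'] as soon as X1 is parsed, and ['A1', 'B1', 'B2'] only once B2 arrives. With <PART> only, X1 could not be released early, because a later feature might still list it as a part.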
So my question is: should the specification require bi-directional
references (<PARENT> and <PART>) or uni-directional ones (<PART> only)?
Whichever approach is taken, replacing the <GROUP> structure will
significantly reduce verbosity for groups with large numbers of child
features, but do we want to reduce it further by removing <PARENT>
elements, at the cost of the potential for "streaming"?
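To put rough numbers on the two savings in that question, here is a small Python sketch that serialises one parent with n children three ways: the current <GROUP> approach (group content repeated in every child), <PART> plus <PARENT>, and <PART> only. The element layouts are simplified and hypothetical (no start/end, type or other per-feature content, which would be the same in all three variants), and the names group_style, part_style and GROUP_NOTE are illustrative, so the absolute byte counts mean little; what matters is how each variant grows with n.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for verbose group content (notes, links, etc.).
GROUP_NOTE = "Shared group annotation. " * 5


def group_style(n):
    """Current style: the group block is repeated inside each of n child features."""
    root = ET.Element("FEATURES")
    for i in range(n):
        feature = ET.SubElement(root, "FEATURE", id=f"B{i}")
        group = ET.SubElement(feature, "GROUP", id="A1")
        ET.SubElement(group, "NOTE").text = GROUP_NOTE
    return root


def part_style(n, with_parent):
    """Proposed style: one parent with n <PART>s, optional <PARENT> in each child."""
    root = ET.Element("FEATURES")
    parent = ET.SubElement(root, "FEATURE", id="A1")
    ET.SubElement(parent, "NOTE").text = GROUP_NOTE  # group content appears once
    for i in range(n):
        ET.SubElement(parent, "PART", id=f"B{i}")
    for i in range(n):
        child = ET.SubElement(root, "FEATURE", id=f"B{i}")
        if with_parent:
            ET.SubElement(child, "PARENT", id="A1")
    return root


def size(root):
    """Serialised size in bytes."""
    return len(ET.tostring(root))


for n in (1, 10, 100):
    print(n, size(group_style(n)), size(part_style(n, True)), size(part_style(n, False)))
```

The <GROUP> variant grows by the full size of the group content for every child, whereas the <PART> variants add only one or two short references per child; dropping <PARENT> saves a further small, fixed amount per child.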
Apologies for the long and technical post.
Andy