[DAS] comments - hierarchical features

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Wed Feb 18 16:18:59 UTC 2009


Hi all,

As you may know, soon a new revision of the DAS specification will be 
published. One of the features to be added is improved support for 
hierarchical features, and I'm looking for input regarding a detail of 
how this will be done.

The plan is to replace the <GROUP> structure with something similar to 
the DAS/2 approach: parent features have concise <PART> elements that 
identify other (separate) child features. Child features have <PARENT> 
elements to represent the reciprocal relationship. This means the group 
data no longer needs to be duplicated when shared by several features, 
and groups can themselves have start and end points:

   <FEATURE id="A1">
     <PART id="B1" />
     <PART id="B2" />
     ... start, end, notes and other verbose content ...
   </FEATURE>
   <FEATURE id="B1">
     <PARENT id="A1" />
     ... content ...
   </FEATURE>
   <FEATURE id="B2">
     <PARENT id="A1" />
     ... content ...
   </FEATURE>

Here, parent and child each hold a reference to the other, so the same 
link is stated twice. 
However, it would be possible to represent the relationship even if only 
one feature links to the other:

   <FEATURE id="A1">
     <PART id="B1" />
     ...
   </FEATURE>
   <FEATURE id="B1">
     ...
   </FEATURE>

Therefore the option exists to omit the <PARENT> element from the 
specification entirely. Over the last couple of years we have seen DAS 
sources become more and more dense, and browsers wishing to display 
larger regions. As a result, there is significant pressure to minimise 
the verbosity of the XML response (there are other changes to the 
upcoming spec to help with this). Whilst DAS/2's alternative content 
negotiation feature sidesteps the issue, DAS does not yet have this and 
in any case it is my belief that the fallback XML format should still be 
fit for purpose.
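
As a rough illustration that the <PART>-only form still carries the full 
relationship, here is a minimal Python sketch that rebuilds the 
child-to-parent mapping from <PART> elements alone. The <FEATURES> wrapper 
element and the function name parent_index are assumptions made for the 
example; they are not taken from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment; the <FEATURES> wrapper is an assumption.
XML = """<FEATURES>
  <FEATURE id="A1"><PART id="B1" /><PART id="B2" /></FEATURE>
  <FEATURE id="B1" />
  <FEATURE id="B2" />
</FEATURES>"""


def parent_index(xml_text):
    """Map each child feature id to the ids of features listing it as a PART."""
    parents = {}
    for feature in ET.fromstring(xml_text).iter("FEATURE"):
        for part in feature.findall("PART"):
            parents.setdefault(part.get("id"), []).append(feature.get("id"))
    return parents


print(parent_index(XML))
```

Note that this sketch reads the whole document before answering, so the 
reverse link is only known once every potential parent has been parsed.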

The counter argument (i.e. the case for requiring both <PARENT> and 
<PART> elements) is based around the rendering efficiency benefits of 
streaming. If a client knows for sure that it has parsed all features 
that are related to each other, it can render them while it waits for 
the server to send the rest of the response. A client could potentially 
use this to offer a significant usability boost - a user's perception of 
the speed of an interface is greatly influenced by how fast a display 
starts to render rather than the time it takes to complete. But at the 
moment there are no DAS clients that use this (it is not possible with 
the current spec, and some clients such as Ensembl cannot due to the way 
the data is rendered). I am not sure to what extent it would be used in 
future either: for example, it could not be used where post-processing of 
the entire set of features is necessary (e.g. binning).
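
To make the streaming argument concrete, here is a minimal Python sketch 
of how a client might release groups for rendering while the response is 
still arriving. It assumes bi-directional references and a <FEATURES> 
wrapper element; the function name stream_groups is hypothetical, and 
this is not how any existing DAS client works.

```python
import io
import xml.etree.ElementTree as ET


def stream_groups(source):
    """Yield sorted id lists as soon as every referenced feature is parsed."""
    links = {}  # feature id -> ids it references via PART or PARENT
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "FEATURE":
            continue
        fid = elem.get("id")
        links[fid] = {c.get("id") for c in elem if c.tag in ("PART", "PARENT")}
        # Walk the references; give up if any linked feature is still missing.
        group, todo, complete = set(), [fid], True
        while todo:
            current = todo.pop()
            if current in group:
                continue
            group.add(current)
            if current not in links:
                complete = False
                break
            todo.extend(links[current])
        if complete:
            yield sorted(group)


# Hypothetical response in which the children arrive around their parent.
XML = b"""<FEATURES>
  <FEATURE id="B1"><PARENT id="A1" /></FEATURE>
  <FEATURE id="C1" />
  <FEATURE id="A1"><PART id="B1" /><PART id="B2" /></FEATURE>
  <FEATURE id="B2"><PARENT id="A1" /></FEATURE>
</FEATURES>"""

for ready in stream_groups(io.BytesIO(XML)):
    print(ready)
```

In this sketch C1 is released as soon as it is parsed, and A1, B1 and B2 
are released together once B2 arrives. If B1 carried no <PARENT> element, 
the same code would release it immediately as a standalone feature, 
because nothing would tell the client that a parent was still to come.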

So my question is: should the specification require bi-directional 
references (<PARENT> and <PART>), or uni-directional (<PART> only)? 
Whichever approach is taken, replacing the <GROUP> structure will 
significantly reduce verbosity for groups with large numbers of child 
features, but do we want to reduce this further by removing <PARENT> 
elements at the cost of the potential for "streaming"?

Apologies for the long and technical post.
Andy


