[DAS] comments - hierarchical features

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Thu Feb 19 13:31:23 UTC 2009


Hi Chris,

Thanks for the feedback. I suspect you may be right about the 
optimisation - lots of other pieces have to fall into place for it to 
work. Having subfeatures indicate their containers is indeed how GFF3 
works (and is also how existing DAS works), but in most cases it's a 
fair bit more verbose.
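
As a rough illustration of the "subfeatures indicate their containers" approach, here is a minimal Python sketch showing that a client could recover each container's parts from child-side <PARENT> references alone. It is not taken from any DAS library: the <FEATURES> wrapper element, the sample data and the function name parts_by_parent are assumptions made purely for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical response fragment: only the children carry references.
# The <FEATURES> wrapper is an assumption, not part of the proposal.
SAMPLE = """
<FEATURES>
  <FEATURE id="A1"/>
  <FEATURE id="B1"><PARENT id="A1"/></FEATURE>
  <FEATURE id="B2"><PARENT id="A1"/></FEATURE>
</FEATURES>
"""


def parts_by_parent(xml_text):
    """Map each parent id to the ids of features naming it in <PARENT>."""
    parts = defaultdict(list)
    root = ET.fromstring(xml_text.strip())
    for feature in root.iter("FEATURE"):
        for parent in feature.findall("PARENT"):
            parts[parent.get("id")].append(feature.get("id"))
    return dict(parts)


print(parts_by_parent(SAMPLE))
```

Note that the mapping is only known to be complete once the whole response has been read, which is the limitation the streaming argument is concerned with.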

The other aspect I forgot to mention was the DAS-DAS2 transition. The 
parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid 
more divergence when there remains a possibility of uniting them. If we 
don't keep both elements, this isn't so important though.

Speaking personally, I'm not too worried that the direction of the 
parent/part relationship is unclear, as I believe it's reasonably 
obvious from the XML, but then again I already know what to expect. So I 
would certainly value your perspective: do you think it is significantly 
confusing?

Cheers,
Andy

Chris Mungall wrote:
> 
> I suggest you name relations such that the inverses and directionality 
> are obvious
> 
>     part_of / has_part
>     parent_of / child_of
>     has_parent / has_child
> 
> But not
> 
>     part / parent
> 
> The argument for specifying both seems like premature optimization. I 
> suggest you align what you're doing with GFF3 as far as possible and 
> have subfeatures indicate their containing features.
> 
> On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote:
> 
>> Hi all,
>>
>> As you may know, soon a new revision of the DAS specification will be 
>> published. One of the features to be added is improved support for 
>> hierarchical features, and I'm looking for input regarding a detail of 
>> how this will be done.
>>
>> The plan is to replace the <GROUP> structure with something similar to 
>> the DAS/2 approach: parent features have concise <PART> elements that 
>> identify other (separate) child features. Child features have <PARENT> 
>> elements to represent the reciprocal relationship. This means the 
>> group data no longer needs to be duplicated when shared by several 
>> features, and groups can themselves have start/endpoints:
>>
>>  <FEATURE id="A1">
>>    <PART id="B1" />
>>    <PART id="B2" />
>>    ... start, end, notes and other verbose content ...
>>  </FEATURE>
>>  <FEATURE id="B1">
>>    <PARENT id="A1" />
>>    ... content ...
>>  </FEATURE>
>>  <FEATURE id="B2">
>>    <PARENT id="A1" />
>>    ... content ...
>>  </FEATURE>
>>
>> Here, parent and child each contain a reference to the other, both 
>> representing the same link. However, it would be possible to represent 
>> the relationship even if only one feature links to the other:
>>
>>  <FEATURE id="A1">
>>    <PART id="B1" />
>>    ...
>>  </FEATURE>
>>  <FEATURE id="B1">
>>    ...
>>  </FEATURE>
>>
>> Therefore the option exists to omit the <PARENT> element from the 
>> specification entirely. Over the last couple of years we have seen DAS 
>> sources become more and more dense, and browsers wishing to display 
>> larger regions. As a result, there is significant pressure to minimise 
>> the verbosity of the XML response (there are other changes to the 
>> upcoming spec to help with this). Whilst DAS2's alternative content 
>> negotiation feature sidesteps the issue, DAS does not yet have this 
>> and in any case it is my belief that the fallback XML format should 
>> still be fit for purpose.
>>
>> The counter argument (i.e. the case for requiring both <PARENT> and 
>> <PART> elements) is based around the rendering efficiency benefits of 
>> streaming. If a client knows for sure that it has parsed all features 
>> that are related to each other, it can render them while it waits for 
>> the server to send the rest of the response. A client could 
>> potentially use this to offer a significant usability boost - a user's 
>> perception of the speed of an interface is greatly influenced by how 
>> fast a display starts to render rather than the time it takes to 
>> complete. But at the moment there are no DAS clients that use this (it 
>> is not possible with the current spec, and some clients such as 
>> Ensembl cannot due to the way the data is rendered). I am not sure to 
>> what extent it would be used in future either; for example, it could 
>> not be used where post-processing of the entire set of features is 
>> necessary (e.g. binning).
>>
>> So my question is: should the specification require bi-directional 
>> references (<PARENT> and <PART>), or uni-directional (<PART> only)? 
>> Whichever approach is taken, replacing the <GROUP> structure will 
>> significantly reduce verbosity for groups with large numbers of child 
>> features, but do we want to reduce this further by removing <PARENT> 
>> elements at the cost of the potential for "streaming"?
>>
>> Apologies for the long and technical post.
>> Andy
>> _______________________________________________
>> DAS mailing list
>> DAS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/das
>>


