[DAS] comments - hierarchical features
Lincoln Stein
lincoln.stein at gmail.com
Thu Feb 19 17:05:28 UTC 2009
I am in favor of the bidirectional links, even though they are significantly
more verbose. I find it convenient to extract a subfeature from the
datastream and not lose the hierarchy; it is also a handy way to identify
features that are not part of a larger hierarchy.
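To illustrate, here is a minimal Python sketch of that convenience. It assumes the <PARENT>/<PART> syntax from Andy's proposal quoted below; the <FEATURES> wrapper element, the feature C1, and the helper names are made up for the example and are not part of any DAS spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the proposed <PARENT>/<PART> elements.
# The <FEATURES> wrapper and the lone feature C1 are assumptions.
XML = """
<FEATURES>
  <FEATURE id="A1"><PART id="B1" /><PART id="B2" /></FEATURE>
  <FEATURE id="B1"><PARENT id="A1" /></FEATURE>
  <FEATURE id="B2"><PARENT id="A1" /></FEATURE>
  <FEATURE id="C1" />
</FEATURES>
"""

def links(feature):
    """Return (parent ids, part ids) read from this one element only."""
    parents = [p.get("id") for p in feature.findall("PARENT")]
    parts = [p.get("id") for p in feature.findall("PART")]
    return parents, parts

def is_standalone(feature):
    """True if the feature has neither a parent nor any parts."""
    parents, parts = links(feature)
    return not parents and not parts

features = {f.get("id"): f for f in ET.fromstring(XML).findall("FEATURE")}

# B1 can be pulled out of the stream on its own and still names its parent.
b1_parents, _ = links(features["B1"])
standalone = sorted(fid for fid, f in features.items() if is_standalone(f))

print(b1_parents)   # ['A1']
print(standalone)   # ['C1']
```

With <PART>-only links, the same question about B1 could only be answered after every other feature in the response had been scanned.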
Although compression adds some computational overhead, DAS XML does compress
very nicely with standard LZH algorithms, and so the stream overhead is not as
bad as it seems.
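A rough way to check this is sketched below in Python. It uses zlib's DEFLATE (LZ77 plus Huffman coding) as a stand-in for whichever compressor a server would actually apply, and builds synthetic features with no start, end, or note content, so the sizes are illustrative only. The build() helper and the choice of 1000 children are assumptions for the example.

```python
import zlib

def build(n_children, bidirectional):
    """Build a synthetic response: one parent A1 and n_children parts."""
    parts = "".join(f'<PART id="B{i}" />' for i in range(n_children))
    out = [f'<FEATURE id="A1">{parts}</FEATURE>']
    for i in range(n_children):
        # The <PARENT> back-reference is the only difference between modes.
        parent = '<PARENT id="A1" />' if bidirectional else ""
        out.append(f'<FEATURE id="B{i}">{parent}</FEATURE>')
    return "\n".join(out).encode("utf-8")

raw_bi = build(1000, bidirectional=True)
raw_uni = build(1000, bidirectional=False)
gz_bi = zlib.compress(raw_bi)
gz_uni = zlib.compress(raw_uni)

# Cost of the <PARENT> elements before and after compression, in bytes.
raw_overhead = len(raw_bi) - len(raw_uni)
gz_overhead = len(gz_bi) - len(gz_uni)

print("raw:       ", len(raw_uni), "->", len(raw_bi), "extra", raw_overhead)
print("compressed:", len(gz_uni), "->", len(gz_bi), "extra", gz_overhead)
```

Because every <PARENT> element is a near-identical repeat, the compressor can encode it as a short back-reference, so the extra bytes on the wire are far fewer than the extra bytes in the uncompressed XML.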
Lincoln
On Thu, Feb 19, 2009 at 8:31 AM, Andy Jenkinson <andy.jenkinson at ebi.ac.uk> wrote:
> Hi Chris,
>
> Thanks for the feedback. I suspect you may be right about the optimisation
> - lots of other pieces have to fall into place for it to work. Having
> subfeatures indicate their containers is indeed how GFF3 works (and is also
> how existing DAS works), but in most cases it's a fair bit more verbose.
>
> The other aspect I forgot to mention was the DAS-DAS2 transition. The
> parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid more
> divergence when there remains a possibility of uniting them. If we don't
> keep both elements, this isn't so important though.
>
> Speaking personally, I'm not too worried about a lack of obviousness of the
> relationship for using parent/part as I believe it's reasonably obvious from
> the XML, but then again I already know what to expect. So I certainly value
> your perspective if you think it is significantly confusing?
>
> Cheers,
> Andy
>
> Chris Mungall wrote:
>
>>
>> I suggest you name relations such that the inverses and directionality are
>> obvious
>>
>> part_of / has_part
>> parent_of / child_of
>> has_parent / has_child
>>
>> But not
>>
>> part / parent
>>
>> The argument for specifying both seems like premature optimization. I
>> suggest you align what you're doing with GFF3 as far as possible and have
>> subfeatures indicate their containing features.
>>
>>
>> On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote:
>>
>> Hi all,
>>>
>>> As you may know, soon a new revision of the DAS specification will be
>>> published. One of the features to be added is improved support for
>>> hierarchical features, and I'm looking for input regarding a detail of how
>>> this will be done.
>>>
>>> The plan is to replace the <GROUP> structure with something similar to
>>> the DAS/2 approach: parent features have concise <PART> elements that
>>> identify other (separate) child features. Child features have <PARENT>
>>> elements to represent the reciprocal relationship. This means the group data
>>> no longer needs to be duplicated when shared by several features, and groups
>>> can themselves have start/endpoints:
>>>
>>> <FEATURE id="A1">
>>> <PART id="B1" />
>>> <PART id="B2" />
>>> ... start, end, notes and other verbose content ...
>>> </FEATURE>
>>> <FEATURE id="B1">
>>> <PARENT id="A1" />
>>> ... content ...
>>> </FEATURE>
>>> <FEATURE id="B2">
>>> <PARENT id="A1" />
>>> ... content ...
>>> </FEATURE>
>>>
>>> Here, both contain references to each other representing the same link.
>>> However, it would be possible to represent the relationship even if only one
>>> feature links to the other:
>>>
>>> <FEATURE id="A1">
>>> <PART id="B1" />
>>> ...
>>> </FEATURE>
>>> <FEATURE id="B1">
>>> ...
>>> </FEATURE>
>>>
>>> Therefore the option exists to omit the <PARENT> element from the
>>> specification entirely. Over the last couple of years we have seen DAS
>>> sources become more and more dense, and browsers wishing to display larger
>>> regions. As a result, there is significant pressure to minimise the
>>> verbosity of the XML response (there are other changes to the upcoming spec
>>> to help with this). Whilst DAS2's alternative content negotiation feature
>>> sidesteps the issue, DAS does not yet have this and in any case it is my
>>> belief that the fallback XML format should still be fit for purpose.
>>>
>>> The counter argument (i.e. the case for requiring both <PARENT> and
>>> <PART> elements) is based around the rendering efficiency benefits of
>>> streaming. If a client knows for sure that it has parsed all features that
>>> are related to each other, it can render them while it waits for the server
>>> to send the rest of the response. A client could potentially use this to
>>> offer a significant usability boost - a user's perception of the speed of an
>>> interface is greatly influenced by how fast a display starts to render
>>> rather than the time it takes to complete. But at the moment there are no
>>> DAS clients that use this (it is not possible with the current spec, and
>>> some clients such as Ensembl cannot due to the way the data is rendered). I
>>> am not sure to what extent it would be used in future either, for example it
>>> could not be used where post-processing of the entire set of features is
>>> necessary (e.g. binning).
>>>
>>> So my question is: should the specification require bi-directional
>>> references (<PARENT> and <PART>), or uni-directional (<PART> only)?
>>> Whichever approach is taken, replacing the <GROUP> structure will
>>> significantly reduce verbosity for groups with large numbers of child
>>> features, but do we want to reduce this further by removing <PARENT>
>>> elements at the cost of the potential for "streaming"?
>>>
>>> Apologies for the long and technical post.
>>> Andy
>>> _______________________________________________
>>> DAS mailing list
>>> DAS at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/das
>>>
--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>