[DAS] comments - hierarchical features
Gregg Helt
gregghelt at gmail.com
Mon Feb 23 13:29:56 UTC 2009
I also prefer bidirectional links, both for parsing optimizations and for
consistency with DAS/2. As far as naming of the elements, "parent" and
"part" were chosen for DAS/2 after some discussion, but I don't think there
were any major justifications for those over a different name pair.
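
To make the parsing optimization concrete (it is the streaming case described in Andy's original post, quoted further down), here is a rough Python sketch. It is only an illustration under stated assumptions: the <FEATURES> wrapper element, the function name stream_groups and the sample ids are all made up, and it assumes every link is declared on both ends. It is not taken from any DAS client or from the spec.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical response body; the <FEATURES> wrapper is an assumption.
SAMPLE_STREAM = b"""<FEATURES>
  <FEATURE id="B1"><PARENT id="A1"/></FEATURE>
  <FEATURE id="C1"/>
  <FEATURE id="A1"><PART id="B1"/><PART id="B2"/></FEATURE>
  <FEATURE id="B2"><PARENT id="A1"/></FEATURE>
  <FEATURE id="D1"/>
</FEATURES>"""


def stream_groups(xml_bytes):
    """Yield each linked group of feature ids as soon as it is complete.

    A group is complete once every id named by a PARENT or PART element
    of any member has itself been parsed. This relies on links being
    declared on both ends (an assumption of this sketch).
    """
    group_of = {}  # feature id -> group record shared by all its members
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag != "FEATURE":
            continue
        fid = elem.get("id")
        linked = {e.get("id") for e in elem if e.tag in ("PARENT", "PART")}
        group = {"seen": {fid}, "wanted": set(linked)}
        # Merge with any group that already mentions this feature or its links.
        for other in {fid} | linked:
            old = group_of.get(other)
            if old is not None:
                group["seen"] |= old["seen"]
                group["wanted"] |= old["wanted"]
        for member in group["seen"] | group["wanted"]:
            group_of[member] = group
        # Nothing outstanding: the group can be rendered right away.
        if group["wanted"] <= group["seen"]:
            yield sorted(group["seen"])


for complete_group in stream_groups(SAMPLE_STREAM):
    print(complete_group)
```

With the sample above, C1 is released immediately because it declares no links, whereas B1 is held back until A1 and B2 have arrived. If only <PART> were present, a client seeing B1 first could not tell whether a parent might still follow.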
Some may be wondering why not avoid link elements altogether and just
represent feature hierarchies by allowing nested feature elements? So the
A1 parent, B1 & B2 part relationship would look like:
<FEATURE id="A1" ...> ...
  <FEATURE id="B1" ...> ... </FEATURE>
  <FEATURE id="B2" ...> ... </FEATURE>
</FEATURE>
We considered this during development of DAS/2, but the main use case that
argues against it is when multiple parents share children. For example,
multiple transcripts can share exons -- this is how alternative splicing is
modeled in many GMOD databases. With nesting, each shared exon would have to
be repeated inside every transcript that contains it.
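
As a small illustration of the shared-children case, the Python sketch below reads a flat feature list in which two transcripts share an exon, and checks that every link is declared on both ends. The <FEATURES> wrapper, the ids (T1, T2, E1...) and the helper names read_links and unreciprocated are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two transcripts (T1, T2) sharing exon E1; the wrapper element is assumed.
SAMPLE_SHARED = """<FEATURES>
  <FEATURE id="T1"><PART id="E1"/><PART id="E2"/></FEATURE>
  <FEATURE id="T2"><PART id="E1"/><PART id="E3"/></FEATURE>
  <FEATURE id="E1"><PARENT id="T1"/><PARENT id="T2"/></FEATURE>
  <FEATURE id="E2"><PARENT id="T1"/></FEATURE>
  <FEATURE id="E3"><PARENT id="T2"/></FEATURE>
</FEATURES>"""


def read_links(xml_text):
    """Return (parts, parents): feature id -> set of linked feature ids."""
    parts = defaultdict(set)
    parents = defaultdict(set)
    for feature in ET.fromstring(xml_text).iter("FEATURE"):
        fid = feature.get("id")
        for part in feature.findall("PART"):
            parts[fid].add(part.get("id"))
        for parent in feature.findall("PARENT"):
            parents[fid].add(parent.get("id"))
    return parts, parents


def unreciprocated(parts, parents):
    """Return (parent, child) pairs that are declared on one end only."""
    missing = set()
    for parent, children in parts.items():
        for child in children:
            if parent not in parents.get(child, ()):
                missing.add((parent, child))
    for child, declared_parents in parents.items():
        for parent in declared_parents:
            if child not in parts.get(parent, ()):
                missing.add((parent, child))
    return missing


parts, parents = read_links(SAMPLE_SHARED)
print(sorted(parents["E1"]))          # both transcripts that use exon E1
print(unreciprocated(parts, parents))  # empty when all links are reciprocal
```

Here E1 appears once in the document yet belongs to both T1 and T2, which a strictly nested layout cannot express without repeating the exon.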
Gregg
On Thu, Feb 19, 2009 at 12:05 PM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> I am in favor of the bidirectional links, even though it is significantly
> more verbose. I find it convenient to extract a subfeature from the
> datastream and not lose the hierarchy; it is also a handy way to identify
> features that are not part of a larger hierarchy.
>
> Although it adds some computational overhead, DAS does compress very nicely
> with standard LZH algorithms, and so stream overhead is not as bad as it
> seems.
>
> Lincoln
>
> On Thu, Feb 19, 2009 at 8:31 AM, Andy Jenkinson <andy.jenkinson at ebi.ac.uk> wrote:
>
> > Hi Chris,
> >
> > Thanks for the feedback. I suspect you may be right about the optimisation
> > - lots of other pieces have to fall into place for it to work. Having
> > subfeatures indicate their containers is indeed how GFF3 works (and is also
> > how existing DAS works), but in most cases it's a fair bit more verbose.
> >
> > The other aspect I forgot to mention was the DAS-DAS2 transition. The
> > parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid more
> > divergence when there remains a possibility of uniting them. If we don't
> > keep both elements, this isn't so important though.
> >
> > Speaking personally, I'm not too worried about a lack of obviousness of the
> > relationship for using parent/part as I believe it's reasonably obvious from
> > the XML, but then again I already know what to expect. So I certainly value
> > your perspective if you think it is significantly confusing.
> >
> > Cheers,
> > Andy
> >
> > Chris Mungall wrote:
> >
> >>
> >> I suggest you name relations such that the inverses and directionality are
> >> obvious
> >>
> >> part_of / has_part
> >> parent_of / child_of
> >> has_parent / has_child
> >>
> >> But not
> >>
> >> part / parent
> >>
> >> The argument for specifying both seems like premature optimization. I
> >> suggest you align what you're doing with GFF3 as far as possible and have
> >> subfeatures indicate their containing features.
> >>
> >>
> >> On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote:
> >>
> >>> Hi all,
> >>>
> >>> As you may know, soon a new revision of the DAS specification will be
> >>> published. One of the features to be added is improved support for
> >>> hierarchical features, and I'm looking for input regarding a detail of how
> >>> this will be done.
> >>>
> >>> The plan is to replace the <GROUP> structure with something similar to
> >>> the DAS/2 approach: parent features have concise <PART> elements that
> >>> identify other (separate) child features. Child features have <PARENT>
> >>> elements to represent the reciprocal relationship. This means the group data
> >>> no longer needs to be duplicated when shared by several features, and groups
> >>> can themselves have start/endpoints:
> >>>
> >>> <FEATURE id="A1">
> >>> <PART id="B1" />
> >>> <PART id="B2" />
> >>> ... start, end, notes and other verbose content ...
> >>> </FEATURE>
> >>> <FEATURE id="B1">
> >>> <PARENT id="A1" />
> >>> ... content ...
> >>> </FEATURE>
> >>> <FEATURE id="B2">
> >>> <PARENT id="A1" />
> >>> ... content ...
> >>> </FEATURE>
> >>>
> >>> Here, both contain references to each other representing the same link.
> >>> However, it would be possible to represent the relationship even if only one
> >>> feature links to the other:
> >>>
> >>> <FEATURE id="A1">
> >>> <PART id="B1" />
> >>> ...
> >>> </FEATURE>
> >>> <FEATURE id="B1">
> >>> ...
> >>> </FEATURE>
> >>>
> >>> Therefore the option exists to omit the <PARENT> element from the
> >>> specification entirely. Over the last couple of years we have seen DAS
> >>> sources become more and more dense, and browsers wishing to display larger
> >>> regions. As a result, there is significant pressure to minimise the
> >>> verbosity of the XML response (there are other changes to the upcoming spec
> >>> to help with this). Whilst DAS2's alternative content negotiation feature
> >>> sidesteps the issue, DAS does not yet have this and in any case it is my
> >>> belief that the fallback XML format should still be fit for purpose.
> >>>
> >>> The counter argument (i.e. the case for requiring both <PARENT> and
> >>> <PART> elements) is based around the rendering efficiency benefits of
> >>> streaming. If a client knows for sure that it has parsed all features that
> >>> are related to each other, it can render them while it waits for the server
> >>> to send the rest of the response. A client could potentially use this to
> >>> offer a significant usability boost - a user's perception of the speed of an
> >>> interface is greatly influenced by how fast a display starts to render
> >>> rather than the time it takes to complete. But at the moment there are no
> >>> DAS clients that use this (it is not possible with the current spec, and
> >>> some clients such as Ensembl cannot due to the way the data is rendered). I
> >>> am not sure to what extent it would be used in future either, for example it
> >>> could not be used where post-processing of the entire set of features is
> >>> necessary (e.g. binning).
> >>>
> >>> So my question is: should the specification require bi-directional
> >>> references (<PARENT> and <PART>), or uni-directional (<PART> only)?
> >>> Whichever approach is taken, replacing the <GROUP> structure will
> >>> significantly reduce verbosity for groups with large numbers of child
> >>> features, but do we want to reduce this further by removing <PARENT>
> >>> elements at the cost of the potential for "streaming"?
> >>>
> >>> Apologies for the long and technical post.
> >>> Andy
> >>> _______________________________________________
> >>> DAS mailing list
> >>> DAS at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/das
> >>>
> >
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
>