[DAS] Re: Our identifier doc and proposal

Thu, 29 Nov 2001 17:18:43 +0100 (CET)

Quoting Ewan Birney <birney@ebi.ac.uk>:
>  On Wed, 28 Nov 2001, Lincoln Stein wrote:
>  > I think we're going to find that the features form a DAG and not a
>  > hierarchy.  [...]
> [...] 
> You are right. I'm glad you are going to sort out how to have an
> extensible distributed DAG system that is easy to use. ;)

I'm a little confused as to what exactly is going to be a DAG and not
a treelike hierarchy. I can see three interpretations:

  1. Types used to describe features are arranged in a DAG
  2. Features used to annotate sequences form a DAG
  3. The contents of the XML payload form a DAG

That is:

1. Types used to describe features are arranged in a DAG

Every feature associates some information with a subsequence of the
reference sequence, and it is necessary to know what kind of information
is represented, e.g. for display (glyph choice), retrieval and searching,
assembly, etc. It is useful to arrange the types in a tree or a DAG in
order to share properties: for example, if transcripts are blue, then
mRNA features are, too. A feature type might be a subcase of, and share
properties with, more than one other type. So feature types are a DAG.
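
As a minimal sketch of this first reading, assuming each type simply lists
its parent types and the properties it sets itself: the "coding" type, the
glyph value, and the breadth-first lookup rule are my own illustrative
assumptions, not anything taken from the DAS specification.

```python
# Hypothetical feature-type DAG. Only "transcript is blue" and
# "mRNA is a kind of transcript" come from the text above; the rest is invented.
TYPES = {
    "feature":    {"parents": [], "props": {}},
    "transcript": {"parents": ["feature"], "props": {"color": "blue"}},
    "coding":     {"parents": ["feature"], "props": {"glyph": "box"}},
    "mRNA":       {"parents": ["transcript", "coding"], "props": {}},
}


def lookup(type_name, prop):
    """Return the nearest value of prop, searching the type itself and
    then its ancestors breadth-first; None if no ancestor sets it."""
    queue, seen = [type_name], set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # a shared ancestor can be reached by several paths
        seen.add(current)
        node = TYPES[current]
        if prop in node["props"]:
            return node["props"][prop]
        queue.extend(node["parents"])
    return None
```

Here lookup("mRNA", "color") finds "blue" through transcript, while
lookup("mRNA", "glyph") finds "box" through the second parent, which is
exactly what a tree could not express.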

Well, they are probably a lattice, and more precisely a Scott domain, where
Top would have the properties of every feature type and Bottom would be
the unknown type. (No idle threats about category theory from me).

I tentatively suspect that this is what Lincoln has in mind.

2. Features used to annotate sequences form a DAG

Instead of being anchored to reference sequences, features could be anchored
to other features, thereby inducing a hierarchy of features that eventually
works its way back to the reference sequence. Since a feature might sensibly
be anchored to more than one other feature, the set of features forms a DAG.
I guess this would be helpful if, for example, reannotation moves a feature:
having more than one anchor makes it possible to check consistency, or even
to propagate changes.
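
To make this second reading concrete, here is a rough sketch in which each
feature stores its start as an offset from one or more anchors. The
identifiers, the coordinates, and the idea of storing plain offsets are all
assumptions made for illustration; DAS itself defines nothing of the kind.

```python
REFERENCE = "ref"  # hypothetical identifier for the reference sequence

# Each feature maps to a list of (anchor id, offset from that anchor).
# All identifiers and numbers are invented.
FEATURES = {
    "gene1": [("ref", 1000)],
    "exon1": [("gene1", 50)],
    "exon2": [("gene1", 400)],
    "snp1":  [("exon1", 20), ("gene1", 70)],  # two anchors: a DAG, not a tree
}


def resolve(feature_id):
    """Return the set of reference positions implied by every anchor path."""
    if feature_id == REFERENCE:
        return {0}
    positions = set()
    for anchor, offset in FEATURES[feature_id]:
        for base in resolve(anchor):
            positions.add(base + offset)
    return positions


def is_consistent(feature_id):
    """A feature is consistent when all anchor paths agree on one position."""
    return len(resolve(feature_id)) == 1
```

With the data above, both paths put snp1 at position 1070. If reannotation
moves exon1 to offset 60 without touching snp1, the two paths give 1080 and
1070, and is_consistent("snp1") reports the disagreement.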

Since it would be dangerous to anchor features to annotations that might
just disappear, these feature-anchored features should probably be limited
to the same annotation track as their anchor, i.e. kept on the same server,
since otherwise we'd introduce a troubling notion of a "reference annotation
server". This is what I suspect Ewan means when he talks about a distributed
DAG system. It might be a truly excellent idea, but it is also a big nest of
snakes.

3. The contents of the XML payload form a DAG

Syntactically, the XML file cannot be a DAG, as containers must nest
properly. Putting a unique identifier in shared element nodes is enough,
though, to make it possible for more than one parent node to share them.
But somehow I don't think this is what the debate is about. Still, those
mapping a document object model to real objects on a one-to-one basis
might have something to say about this.
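
For what it is worth, the shared-node trick can be sketched as follows.
The payload is NOT the DAS format: the element names and the id/ref
attributes are invented here purely to show how a properly nested XML tree
can still encode a node with more than one parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload; element and attribute names are assumptions.
PAYLOAD = """
<annotations>
  <exon id="e1" start="100" end="200"/>
  <transcript id="t1"><part ref="e1"/></transcript>
  <transcript id="t2"><part ref="e1"/></transcript>
</annotations>
"""


def shared_children(xml_text):
    """Map each referenced id to the ids of the parents that point at it."""
    root = ET.fromstring(xml_text)
    known = {el.get("id") for el in root.iter() if el.get("id") is not None}
    parents = {}
    for parent in root.iter():
        for child in parent:
            ref = child.get("ref")
            if ref is None:
                continue
            if ref not in known:
                raise ValueError(f"dangling reference: {ref}")
            parents.setdefault(ref, []).append(parent.get("id"))
    return parents
```

On this payload the result maps "e1" to both "t1" and "t2": the file is a
tree, but the objects it describes are not.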

These interpretations are not mutually exclusive, but the language is
sufficiently similar that I for one can't easily tell them apart. Could
those discussing the subject kindly explain what they mean?

My interest is more than academic, since we are seriously studying how to
use DAS for genome-genome annotations, where each (partial) genome could
be used as a reference sequence for the set of annotations comprised by other
genomes. Both kinds of DAG (1 & 2) would have nontrivial consequences for
the servers and clients we could build.

djs               David J. Sherman          (David.Sherman@LaBRI.FR)
                  Laboratoire Bordelais de Recherche en Informatique
                  voix : +33 5 56 84 6922     fax : +33 5 56 84 6669