[DAS2] segments and coordinates

Andrew Dalke dalke at dalkescientific.com
Tue Mar 14 16:09:12 UTC 2006


Summary:  I want to
   - move the COORDINATES element inside of the
         CAPABILITY[type="segments"] element

   - add a 'created' timestamp to the COORDINATES (for sorting by time)

   - add a unique 'uri' identifier attribute to the COORDINATES
      (two coordinates are equal if and only if they have the same uri)

   - have that identifier be resolvable, to get information about
       the coordinate system (but perhaps leave the contents for a
       future spec)

In writing the documentation I've been struggling with
COORDINATES.  No surprise there.

The current spec has COORDINATES and the "segments" capability
as different elements, like

<COORDINATES source="Chromosome" authority="NCBI" version="v22"
        taxid="9606" created="2006-03-14T07:27:49" />
<CAPABILITY type="segments"
     query_id="http://localhost/das2/h.sapiens/v22/segments" />

(Note the 'created' timestamp, which lets a client sort a list of
coordinates by when each one was established.)
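
Here is a rough Python sketch of that sort.  The SOURCES wrapper element
and the second (v21) record are made up for illustration; only the v22
record comes from the example above.  It also assumes 'created' is always
an ISO 8601 timestamp without a time zone, which the spec may or may not
end up requiring.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Hypothetical input: the SOURCES wrapper and the v21 record are
# invented for this sketch; only the v22 record is from the example.
records = """<SOURCES>
  <COORDINATES source="Chromosome" authority="NCBI" version="v21"
        taxid="9606" created="2005-06-01T00:00:00" />
  <COORDINATES source="Chromosome" authority="NCBI" version="v22"
        taxid="9606" created="2006-03-14T07:27:49" />
</SOURCES>"""


def created_time(elem):
    # Assumes 'created' looks like 2006-03-14T07:27:49 (no time zone).
    return datetime.strptime(elem.get("created"), "%Y-%m-%dT%H:%M:%S")


coords = ET.fromstring(records).findall("COORDINATES")
newest_first = sorted(coords, key=created_time, reverse=True)
versions = [c.get("version") for c in newest_first]
print(versions)
```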

With the current discussion on multiple coordinates, it
looks like there is a 1-to-1 relationship between a COORDINATES
record and a CAPABILITY record.  As that's the case I want
to merge them together, as in (note change from "_id" to "_uri")


<CAPABILITY type="segments"
      query_uri="http://localhost/das2/h.sapiens/v22/segments">
   <COORDINATES source="Chromosome" authority="NCBI" version="v22"
          taxid="9606" created="2006-03-14T07:27:49" />
</CAPABILITY>

In talking with Andreas I think he agrees that this makes sense.


Second, there's a question of identity.  When are two coordinates
the same?  Is it when they have the same
   (authority, source, version)
or the same
   (authority, source, version, taxid)?

Since taxid is optional, what if one server leaves it out;
are the two still the same?

I decided to solve it with a unique identifier.  Two
COORDINATES are the same if and only if they have the
same identifier.  That identifier just happens to be
a URI.  It does not need to be resolvable (but should
be, with the results viewable at least for humans).

Let's say that
   http://das.sanger.ac.uk/registry/coordinates/ABC123
is the identifier for:
   authority=NCBI
   version=v22
   taxid=9606
   source=Chromosome
   created=2006-03-14T07:27:49

Then the following are equivalent.  The only difference is the
number of properties defined in the COORDINATES tag.

<CAPABILITY type="segments"
      query_uri="http://localhost/das2/h.sapiens/v22/segments">
   <COORDINATES uri="http://das.sanger.ac.uk/registry/coordinates/ABC123" />
</CAPABILITY>


<CAPABILITY type="segments"
      query_uri="http://localhost/das2/h.sapiens/v22/segments">
   <COORDINATES uri="http://das.sanger.ac.uk/registry/coordinates/ABC123"
       source="Chromosome"/>
</CAPABILITY>


<CAPABILITY type="segments"
      query_uri="http://localhost/das2/h.sapiens/v22/segments">
   <COORDINATES uri="http://das.sanger.ac.uk/registry/coordinates/ABC123"
      source="Chromosome" authority="NCBI" version="v22" taxid="9606"
      created="2006-03-14T07:27:49" />
</CAPABILITY>
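
A minimal Python sketch of that equality rule, applied to the three
COORDINATES elements above.  The function name same_coordinates is mine,
and comparing the URIs as exact strings (no normalization) is an
assumption; nothing here says how a client must compare them.

```python
import xml.etree.ElementTree as ET

URI = "http://das.sanger.ac.uk/registry/coordinates/ABC123"

# The three COORDINATES elements from the examples, differing only in
# how many optional properties they spell out.
fragments = [
    '<COORDINATES uri="%s" />' % URI,
    '<COORDINATES uri="%s" source="Chromosome"/>' % URI,
    '<COORDINATES uri="%s" source="Chromosome" authority="NCBI" '
    'version="v22" taxid="9606" created="2006-03-14T07:27:49" />' % URI,
]


def same_coordinates(a, b):
    # Identity is the uri alone; every other attribute is ignored.
    # Assumption: URIs are compared as exact strings.
    return a.get("uri") == b.get("uri")


elems = [ET.fromstring(f) for f in fragments]
all_same = all(same_coordinates(elems[0], e) for e in elems[1:])
print(all_same)
```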


In theory these extra values don't need to be in the COORDINATES
tag.  They are knowable given the uri.  But that requires a
discovery mechanism for the properties (eg, the COORDINATES identifier
might need to be retrievable, with some format or other).

There is the possibility of value mismatch, but as Andreas pointed
out the registry server can do that validation pretty easily.
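
One way such a check could look, as a Python sketch.  The CANONICAL
table and the find_mismatches function are hypothetical stand-ins for
whatever the registry actually stores; the values are the ones from the
ABC123 example above.  An unknown uri simply raises KeyError here.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry table: uri -> definitive properties.
CANONICAL = {
    "http://das.sanger.ac.uk/registry/coordinates/ABC123": {
        "authority": "NCBI",
        "version": "v22",
        "taxid": "9606",
        "source": "Chromosome",
        "created": "2006-03-14T07:27:49",
    }
}


def find_mismatches(coordinates_xml):
    """Return {attribute: (claimed, canonical)} for each disagreement.

    Attributes the server left out are fine; only attributes that are
    present and differ from the registry's record are reported.
    """
    elem = ET.fromstring(coordinates_xml)
    known = CANONICAL[elem.get("uri")]
    mismatches = {}
    for name, value in elem.attrib.items():
        if name == "uri":
            continue
        if known.get(name) != value:
            mismatches[name] = (value, known.get(name))
    return mismatches


good = find_mismatches(
    '<COORDINATES '
    'uri="http://das.sanger.ac.uk/registry/coordinates/ABC123" '
    'source="Chromosome"/>')
bad = find_mismatches(
    '<COORDINATES '
    'uri="http://das.sanger.ac.uk/registry/coordinates/ABC123" '
    'source="Chromosome" version="v21"/>')
print(good, bad)
```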


I mentioned property discovery earlier.  Given a coordinates URI
there are three things you might want to know:
   - what is the full list of coordinate system properties?
   - what is the authoritative reference server for the coordinates?
   - are there alternate reference servers?

What if the URI were resolvable (this doesn't need to be defined for
DAS, so it is hypothetical) into something like

<COORDINATE-SYSTEM doc_href="something for humans to read">
   <!-- definitive information about this coordinate system -->
   <COORDINATES uri="http://das.sanger.ac.uk/registry/coordinates/ABC123"
       source="Chromosome" authority="NCBI" version="v22" taxid="9606"
       created="2006-03-14T07:27:49" />
   <SEGMENT-SERVER uri="http://whatever/" is-authoritative="yes" />
   <SEGMENT-SERVER uri="http://mirror1/"/>
   <SEGMENT-SERVER uri="http://mirror2/"/>
   <SEGMENT-SERVER uri="http://mirror3/"/>
</COORDINATE-SYSTEM>

(Hmmm, those are some ugly names.  I usually shy away from '-'s
in element and attribute names.)
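
Given a document like that hypothetical one, answering the three
questions is a few lines of Python.  This is only a sketch: the element
and attribute names are the made-up ones from the example, not anything
in the spec, and the document is inlined rather than fetched.

```python
import xml.etree.ElementTree as ET

# The hypothetical COORDINATE-SYSTEM document from the example above.
doc = """<COORDINATE-SYSTEM doc_href="something for humans to read">
   <COORDINATES uri="http://das.sanger.ac.uk/registry/coordinates/ABC123"
       source="Chromosome" authority="NCBI" version="v22" taxid="9606"
       created="2006-03-14T07:27:49" />
   <SEGMENT-SERVER uri="http://whatever/" is-authoritative="yes" />
   <SEGMENT-SERVER uri="http://mirror1/"/>
   <SEGMENT-SERVER uri="http://mirror2/"/>
   <SEGMENT-SERVER uri="http://mirror3/"/>
</COORDINATE-SYSTEM>"""

root = ET.fromstring(doc)

# 1. Full list of coordinate system properties.
properties = dict(root.find("COORDINATES").attrib)

# 2. Authoritative reference server(s), 3. alternate servers.
servers = root.findall("SEGMENT-SERVER")
authoritative = [s.get("uri") for s in servers
                 if s.get("is-authoritative") == "yes"]
alternates = [s.get("uri") for s in servers
              if s.get("is-authoritative") != "yes"]

print(properties["version"], authoritative, alternates)
```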


OR, what if the authoritative URL also implemented the segments
interface, and we added a COORDINATES element to it?  Errr, I
don't like that.  We will be in charge of the coordinate
system URIs but we won't be in charge of the primary reference
server.

Use Case #6.

NCBI releases a new human build.  Ensembl releases annotations
for it and wants to put the information with Andreas' registry.

Example of use:
   - Set up an Ensembl reference server and annotation server
        for the new build; test it out
   - Create a new coordinate system record on the registry
      - fill in the species, source, doc_href, etc. fields
      - when finished the result is a URL, tied to coordinate info
   - Stick the COORDINATES information in the versioned
        source record
   - Tell the registry server to register the given versioned
        source URL


					Andrew
					dalke at dalkescientific.com



