[DAS] Re: Our identifier doc and proposal

Ewan Birney birney@ebi.ac.uk
Wed, 28 Nov 2001 11:47:50 +0000 (GMT)


On Tue, 27 Nov 2001, Brian Gilman wrote:

> Yes,
> 
> 	Absolutely, the question is: do we build the ontology and hope
> that it suits 80% of people's needs or do we adopt another group's? I
> don't think anyone has formed a genomics ontology group? So I'd be up for
> building our own with the help of Thomas/Mathew and Ewan. I think we can
> learn from bioperl, biojava, and Ensembl in the way that they build their
> feature hierarchies. 

<giggle>

We have an ontology? Inside Ensembl?

</giggle>


But - point taken - we actually now have quite an understanding of the
different feature types you would want to display - we'd be happy to
contribute to this. 

It is not really an ontology - it is a hierarchy. I think we will piss off
the professional ontologists if we call it an ontology (mind
you... maybe that would be fun...)



> 
> 			-B
> 
> -----------------------
> Brian Gilman <gilmanb@genome.wi.mit.edu>
> Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
> One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
> phone +1 617  252 1069 / fax +1 617 252 1902
> 
> 
> On Tue, 27 Nov 2001, Lincoln Stein wrote:
> 
> > Hi Brian,
> > 
> > I'm quite sure we'll need an ontology for feature types (at least the
> > top few tiers, which people can add to), so we'll be doing some
> > ontology building one way or another.  Would you agree?
> > 
> > Lincoln
> > 
> > Brian Gilman writes:
> >  > I think so and I have also asked about this in the group. it becomes very
> >  > hard to "control" the namespace without an ontology. This is why we allow
> >  > the individuals to control the top level. 
> >  > 
> >  > 			-B
> >  > 
> >  > 
> >  > 
> >  > On Tue, 27 Nov 2001, Lincoln Stein wrote:
> >  > 
> >  > > Hi Brian,
> >  > > 
> >  > > I'm pleased to see that the I3C identifier is nearly identical to my
> >  > > (biological class,namespace,id) triple suggestion.  The difference is
> >  > > the version number, which I agree with completely.  So I accept it
> >  > > wholeheartedly.
> >  > > 
> >  > > The part that I don't feel entirely comfortable with is that the
> >  > > namespace seems to be completely under the control of the authority:
> >  > > 
> >  > >    urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
> >  > > 
> >  > > I think the top level namespace, e.g. "plate" should be hard-and-fast
> >  > > data types.  Is this envisioned by the I3C?
> >  > > 
> >  > > Lincoln
> >  > > 
> >  > > 
> >  > > Brian Gilman writes:
> >  > >  > Lincoln,
> >  > >  > 
> >  > >  > 	Please find attached an updated identifier proposal that we have
> >  > >  > been working on to identify objects in the web services architecture.
> >  > >  > I like it over the feature_class mechanism because we can uniquely
> >  > >  > identify an object in the "cloud".
> >  > >  > 
> >  > >  > 		Best, 
> >  > >  > 
> >  > >  > 			-Brian
> >  > >  > 
> >  > >  > 
> >  > >  > 
> >  > >  > <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
> >  > >  > <html>
> >  > >  > <head>
> >  > >  >    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
> >  > >  >    <meta name="Author" content="Ted liefeld">
> >  > >  >    <meta name="GENERATOR" content="Mozilla/4.73 [en]C-CCK-MCD BA45DSL  (WinNT; U) [Netscape]">
> >  > >  >    <title>identifiers</title>
> >  > >  > </head>
> >  > >  > <body>
> >  > >  > 
> >  > >  > <h2>
> >  > >  > I3C Identifier Specification</h2>
> >  > >  > 
> >  > >  > <h3>
> >  > >  > <a NAME="Abstract"></a>Abstract:</h3>
> >  > >  > This document describes the motivation for and specification of string
> >  > >  > identifiers to be used to identify objects within the life sciences domain
> >  > >  > by the I3C architecture.&nbsp; A string format for the identifiers is defined
> >  > >  > as&nbsp;<tt>&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version>.</tt>
> >  > >  > <br>&nbsp;
> >  > >  > <h2>
> >  > >  > Index:</h2>
> >  > >  > <a href="#Abstract">Abstract</a>
> >  > >  > <br><a href="#Introduction:">Introduction</a>
> >  > >  > <br><a href="#Background: Existing">Background: Existing Identifiers</a>
> >  > >  > <blockquote><a href="#MPI">MPI Id</a>
> >  > >  > <br><a href="#AGAVE">AGAVE db_id</a></blockquote>
> >  > >  > <a href="#I3C String">I3C String Identifiers</a>
> >  > >  > <blockquote><a href="#Requirements">Requirements for the I3C String Identifier</a>
> >  > >  > <blockquote><a href="#Syntactic">Syntactic Requirements</a>
> >  > >  > <br><a href="#Semantic">Semantic Requirements</a></blockquote>
> >  > >  > </blockquote>
> >  > >  > <a href="#Specification">Specification of the I3C String Identifier</a>
> >  > >  > <blockquote><a href="#Web Centric Id:  URI,">Web Centric Id:&nbsp; URI,
> >  > >  > URN</a>
> >  > >  > <br><a href="#I3C String Identifier">I3C String Identifier Definition</a>
> >  > >  > <br><a href="#Examples">Examples</a></blockquote>
> >  > >  > <a href="#Appendix A, URN">Appendix A, URN Reference</a>
> >  > >  > <br><a href="#Appendix B, Some example">Appendix B, Some example identifiers</a>
> >  > >  > <br><a href="#Appendix C, Additional">Appendix C, Additional Work</a>
> >  > >  > <h2>
> >  > >  > <a NAME="Introduction:"></a>Introduction:</h2>
> >  > >  > One of the goals of the I3C is the definition of a common architecture
> >  > >  > and standards to simplify interoperability between applications from different
> >  > >  > companies.&nbsp; For interoperability to occur,&nbsp; we need a common
> >  > >  > format for unique identifiers for any objects we reference that would function
> >  > >  > in the context of I3C services.&nbsp; The remainder of this document defines
> >  > >  > a web-centric ID definition that will allow us to create federated systems
> >  > >  > utilizing many databases and services in a common way.
> >  > >  > <p>The purpose of this identifier definition is to uniquely identify biologically
> >  > >  > significant objects,&nbsp; e.g. a sequence, a clone, a gene, a contig etc.
> >  > >  > It is not meant to identify artifacts of implementation, e.g. a database,
> >  > >  > a server.&nbsp; Identifying objects such as these should be handled via
> >  > >  > other mechanisms such as JDBC URLs and WSDL.
> >  > >  > <p>In addition, using http URLs as identifiers (e.g. http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-id+7clen1HOYrs+[taxonomy-ID:10090]+-e)
> >  > >  > is not adequate for&nbsp; organizations who need to limit or control access
> >  > >  > to external databases for intellectual property or security reasons.&nbsp;
> >  > >  > The identifiers defined here are deliberately location independent and
> >  > >  > are intended to uniquely identify a biological artifact, but not the location
> >  > >  > of that artifact.
> >  > >  > <br>&nbsp;
> >  > >  > <h3>
> >  > >  > <a NAME="Background: Existing"></a><b>Background: Existing Identifiers</b></h3>
> >  > >  > There are currently many existing forms of identifiers for biological artifacts
> >  > >  > in use within the life sciences community. These include proprietary formats
> >  > >  > as well as public domain formats.&nbsp; Some of these are discussed below.
> >  > >  > <br>&nbsp;
> >  > >  > <h4>
> >  > >  > <a NAME="MPI"></a>MPI Id</h4>
> >  > >  > Within Millennium Pharmaceuticals Inc.,&nbsp; there is a suite of CORBA
> >  > >  > services that have been running in production for over two years.&nbsp;
> >  > >  > One of the first tasks they addressed was the identity management of objects
> >  > >  > that appear in more than one database.&nbsp; To deal with this need, a
> >  > >  > CORBA IDL structure called&nbsp; MPI Id was declared.&nbsp; It has since
> >  > >  > been reused in many subsequent CORBA services.
> >  > >  > <p>The MPI ID CORBA type is defined as a triple
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp; struct Id {</tt>
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string value;</tt>
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string domain;</tt>
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string type;</tt>
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp; };</tt>
> >  > >  > <p>for example, MP PL 001 represents identifier domain 'MP', type 'Plate',
> >  > >  > identifier value '001'.&nbsp; The value uniquely identifies an object in
> >  > >  > the domain of a given type.&nbsp; Note that an object may have more than
> >  > >  > one ID, so that the plate known as MP PL 001 may also be known as SE PL
> >  > >  > 435a in the SE domain.
> >  > >  > <p>Some of the services that use these identifiers found this too limiting.&nbsp;
> >  > >  > For example,&nbsp; when retrieving clones from GenBank you may want to use
> >  > >  > either the accession number or the GI number.&nbsp; However, either of these
> >  > >  > would have been encoded as
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL gid
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL accession
> >  > >  > <br>This raised the problem of differentiating whether an identifier is
> >  > >  > a GI number or an accession number.&nbsp; If the namespaces of accession
> >  > >  > numbers and GI numbers overlap, then there is no way for a server or client
> >  > >  > to identify which form was intended.
> >  > >  > <p>Another weakness is that object type can be overloaded in the same manner;&nbsp;
> >  > >  > for example a sequence and a contig are both sequences.&nbsp; Similarly,
> >  > >  > inclusion of a version number for an identifier would require overloading
> >  > >  > the value field of the MPI Id.
> >  > >  > <p>Therefore it was found that limiting the unique identifier to three
> >  > >  > elements was too restrictive.&nbsp; There must be provision for extension.
> >  > >  > <br>&nbsp;
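To make the overloading problem above concrete, here is a minimal Python
sketch. It is not part of the I3C document: MpiId is a hypothetical
stand-in that only mirrors the three fields of the IDL struct, and the two
sample values are the GenBank accession and GI number that the proposal
itself quotes in its Examples section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpiId:
    """Hypothetical mirror of the quoted IDL struct: three strings, no more."""
    value: str
    domain: str
    type: str


def describe(ident: MpiId) -> str:
    # Render in the "domain type value" order used in the document's examples.
    return f"{ident.domain} {ident.type} {ident.value}"


# Same domain and type for both; nothing in the structure records whether
# the value is an accession or a GI number.
by_accession = MpiId(value="J01636", domain="GB", type="CL")
by_gi = MpiId(value="146575", domain="GB", type="CL")

same_scope = (by_accession.domain, by_accession.type) == (by_gi.domain, by_gi.type)

print(describe(by_accession))
print(describe(by_gi))
print("indistinguishable scope:", same_scope)
```

A receiver of either triple sees only "GB CL <value>" and has to guess the
kind of value from its shape, which is the weakness described above.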
> >  > >  > <h4>
> >  > >  > <a NAME="AGAVE"></a>AGAVE db_id</h4>
> >  > >  > DoubleTwist Inc. has found some of the same issues in their AGAVE product.&nbsp;
> >  > >  > AGAVE defines an identifier called db_id in the AGAVE DTD file, an XML
> >  > >  > format.
> >  > >  > <p>The AGAVE db_id is defined as follows;
> >  > >  > <blockquote><tt>&lt;!--&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!-- db_id is an identifier for an object in its source database.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!--&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!-- Attributes:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!--&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!-- id:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a data identifier
> >  > >  > such as GenBank accession or PID.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!-- db_code:&nbsp; a code for the data source, e.g. GenBank
> >  > >  > is "gb".&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; --></tt>
> >  > >  > <br><tt>&lt;!-- version:&nbsp; version of the associated data.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!--&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > --></tt>
> >  > >  > <br><tt>&lt;!ELEMENT db_id EMPTY></tt>
> >  > >  > <br><tt>&lt;!ATTLIST db_id&nbsp; id&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > CDATA&nbsp; #REQUIRED</tt>
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > version&nbsp; CDATA&nbsp; #IMPLIED</tt>
> >  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > db_code&nbsp; CDATA&nbsp; #REQUIRED ></tt></blockquote>
> >  > >  > In this format, the version is explicitly specified, but the weaknesses
> >  > >  > remain of having insufficient scope to specify object types or variations
> >  > >  > in the type of ID being specified (accession vs gi).
> >  > >  > <br>&nbsp;
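As a rough illustration of what the DTD fragment above allows, the
following Python sketch builds and re-reads a db_id element using only the
three declared attributes. The sample values are assumptions chosen for
illustration (the accession J01636 with version 1 appears elsewhere in the
proposal, and "gb" is the db_code given in the DTD comment); this is not
AGAVE's own tooling.

```python
import xml.etree.ElementTree as ET

# Attributes declared by the quoted ATTLIST: id and db_code are required,
# version is optional (#IMPLIED).
DECLARED = {"id", "version", "db_code"}

# Build an empty db_id element, as the DTD declares it EMPTY.
elem = ET.Element("db_id", {"id": "J01636", "version": "1", "db_code": "gb"})
xml_text = ET.tostring(elem, encoding="unicode")

# Read it back the way a consumer would.
parsed = ET.fromstring(xml_text)
print(parsed.tag, dict(parsed.attrib))

# There is no declared attribute that could say "this id is an accession"
# or "this id is a GI number", nor one for the object type.
has_id_kind_slot = bool(DECLARED & {"id_type", "object_type"})
print("slot for kind of id or object type:", has_id_kind_slot)
```

The version is carried explicitly, but the kind of identifier and the type
of object still have nowhere to go, which matches the weakness noted above.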
> >  > >  > <h2>
> >  > >  > <a NAME="I3C String"></a>I3C String Identifiers</h2>
> >  > >  > For use within the I3C architecture, the existing identifier definitions
> >  > >  > were found to be inadequate to handle the breadth and scope of the possible
> >  > >  > identifiers that would be required.&nbsp; The following sections detail
> >  > >  > the requirements and specification of a new identifier format for use
> >  > >  > within the I3C architecture.
> >  > >  > <h3>
> >  > >  > <a NAME="Requirements"></a>Requirements for the I3C String Identifier</h3>
> >  > >  > The I3C architecture has the following syntactical and semantic requirements
> >  > >  > for its identifiers;
> >  > >  > <h4>
> >  > >  > <a NAME="Syntactic"></a>Syntactic Requirements</h4>
> >  > >  > 
> >  > >  > <ol>
> >  > >  > <li>
> >  > >  > The identifier must be encodable in a string format</li>
> >  > >  > 
> >  > >  > <li>
> >  > >  > The identifier must be extensible</li>
> >  > >  > 
> >  > >  > <li>
> >  > >  > The identifier must uniquely identify one object</li>
> >  > >  > 
> >  > >  > <li>
> >  > >  > The identifier must not require additional contextual information for evaluation</li>
> >  > >  > </ol>
> >  > >  > These requirements result from the need to transmit the identifier in an
> >  > >  > XML format to and from web-services.&nbsp; By requiring that it can be
> >  > >  > encoded as a string, it becomes possible to transmit identifiers via other
> >  > >  > mechanisms as well.&nbsp; Also, as noted in the examples given above, the
> >  > >  > identifier must be extensible to allow use with biological objects that
> >  > >  > have not yet been defined.
> >  > >  > <h4>
> >  > >  > <a NAME="Semantic"></a>Semantic Requirements</h4>
> >  > >  > For an Id to uniquely specify a biological object in a system, it needs
> >  > >  > to include the following pieces of information;
> >  > >  > <br>&nbsp;
> >  > >  > <ol>
> >  > >  > <li>
> >  > >  > &nbsp;Authority :&nbsp; The name of the organization that has defined an
> >  > >  > entity.</li>
> >  > >  > 
> >  > >  > <li>
> >  > >  > &nbsp;Id Value : an alpha-numeric sequence that uniquely identifies an
> >  > >  > object to its authority</li>
> >  > >  > 
> >  > >  > <li>
> >  > >  > &nbsp;Namespace : one or more statements constraining the scope in which
> >  > >  > an Id is evaluated</li>
> >  > >  > 
> >  > >  > <li>
> >  > >  > &nbsp;Version&nbsp; : (optional) version number for an Id</li>
> >  > >  > </ol>
> >  > >  > As an example, the following uniquely identifies a sequence in GenBank,
> >  > >  > <p>&nbsp;&nbsp;&nbsp; GenBank, Sequence, Accession J01636,&nbsp; version
> >  > >  > 1
> >  > >  > <p>With all these pieces of information we can uniquely identify a sequence.&nbsp;
> >  > >  > Leaving off the version number we can get pretty close.&nbsp; Leaving out
> >  > >  > any of the other bits of information makes it impossible to find the object
> >  > >  > without a priori knowledge of the context.
> >  > >  > <br>&nbsp;
> >  > >  > <h2>
> >  > >  > <a NAME="Specification"></a>Specification of the I3C String Identifier</h2>
> >  > >  > To take advantage of existing work on unique identifiers,&nbsp; the I3C
> >  > >  > technical Architecture working group has selected the World Wide Web Consortium's
> >  > >  > (W3C) definition of a uniform resource name (URN) as the basis for the
> >  > >  > I3C String Identifier.&nbsp; For additional background on URNs, please
> >  > >  > see Appendix A, "URN Reference", for the definition of a URN and reference
> >  > >  > links.
> >  > >  > <h4>
> >  > >  > <a NAME="Web Centric Id:  URI,"></a><b>Web Centric Id:&nbsp; URI, URN</b></h4>
> >  > >  > To summarize the IETF and W3C documents,&nbsp; a URI can be written as
> >  > >  > having the following parts;
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme:namespace identifier://authority/path/.../pathN/value?queryterm#fragment
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > scheme and namespace identifier define the semantics of everything that
> >  > >  > follows
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > authority defines the organization responsible for defining and managing
> >  > >  > the namespace
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > path/.../pathN/ defines a subset of an authority's namespace
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > value is the last element in the path
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > queryterm indicates a post-processing directive
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > fragment defines a preprocessing directive or fragment within the scope
> >  > >  > of the Id
> >  > >  > <p>The adoption of the URN format should simplify integration with other
> >  > >  > existing standards such as MAGE-ML which permit the use of URN identifiers.
> >  > >  > <br>&nbsp;
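The generic URI breakdown above can be checked against Python's standard
urllib.parse module. This is only a sketch: the http URL below is a
made-up example on the www.mpi.com host mentioned in the proposal's
appendix, and the path, query and fragment in it are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical hierarchical URI with every generic component present.
url = "http://www.mpi.com/plates/glycerol/12345?format=xml#well-A1"
parts = urlsplit(url)

# Path segments; the last one plays the role of "value" in the summary above.
segments = [s for s in parts.path.split("/") if s]
value = segments[-1]

print(parts.scheme, parts.netloc, segments, parts.query, parts.fragment)

# A urn: identifier has no "//authority" part, so a generic URI parser
# returns everything after the scheme as an opaque, colon-delimited path.
urn_parts = urlsplit("urn:lsid:informatics.mpi.com:plate:12345")
print(urn_parts.scheme, repr(urn_parts.netloc), urn_parts.path)
```

The second call shows why the colon-delimited fields of the I3C identifier
need their own interpretation: generic URI tooling will not split out the
authority or the namespace.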
> >  > >  > <h3>
> >  > >  > <a NAME="I3C String Identifier"></a>I3C String Identifier Definition</h3>
> >  > >  > Given the definition of a URN above, we have defined the following syntax
> >  > >  > for an I3C String identifier;
> >  > >  > <p><tt>&nbsp;&nbsp;&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version></tt>
> >  > >  > <p>The different parts of the identifier are delimited by colons ":".
> >  > >  > <p>The elements of the identifier are as follows;
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > scheme = urn</li>
> >  > >  > 
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > This specifies that the identifier is in URN format</li>
> >  > >  > </ul>
> >  > >  > 
> >  > >  > <li>
> >  > >  > namespace identifier = lsid</li>
> >  > >  > 
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > The I3C string identifier namespace identifier is defined as "Life Science
> >  > >  > Identifier", or "lsid".</li>
> >  > >  > </ul>
> >  > >  > 
> >  > >  > <li>
> >  > >  > authority = &lt;authority></li>
> >  > >  > 
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > This portion uniquely identifies the organization and optionally the organizational
> >  > >  > unit that has defined the namespace for the remaining portions of the identifier</li>
> >  > >  > </ul>
> >  > >  > 
> >  > >  > <li>
> >  > >  > namespace = &lt;namespace></li>
> >  > >  > 
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > a hierarchical namespace to scope the identifier value.&nbsp; The form and
> >  > >  > content of this section is defined and managed by the authority</li>
> >  > >  > </ul>
> >  > >  > 
> >  > >  > <li>
> >  > >  > value = &lt;value></li>
> >  > >  > 
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > the unique identifier for an object within the namespace defined by an
> >  > >  > authority</li>
> >  > >  > </ul>
> >  > >  > 
> >  > >  > <li>
> >  > >  > version = &lt;version></li>
> >  > >  > 
> >  > >  > <ul>
> >  > >  > <li>
> >  > >  > optional version information associated with the identifier value</li>
> >  > >  > </ul>
> >  > >  > </ul>
> >  > >  > 
> >  > >  > <h4>
> >  > >  > <a NAME="Examples"></a>Examples</h4>
> >  > >  > So for example, for the plate identified by Millennium as ID 12345 with
> >  > >  > MPI ID&nbsp;&nbsp;&nbsp; "MP PL 12345"
> >  > >  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate:12345
> >  > >  > <p>Since the authority is free to define any path that it wishes (provided
> >  > >  > of course that it manages them),&nbsp; we may want to define the path section
> >  > >  > for plates more fully to something like this
> >  > >  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
> >  > >  > <p>We can now use expanded path information to deal with cases that required
> >  > >  > type overloading in the MPI ID.&nbsp; For example
> >  > >  > <br>&nbsp;&nbsp;&nbsp; (Accession)&nbsp;&nbsp;&nbsp; GB CL j01636 version
> >  > >  > 1
> >  > >  > <br>&nbsp;&nbsp;&nbsp; (GI)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> >  > >  > GB CL 146575
> >  > >  > <br>refer to the same object.&nbsp; These can now be encoded as
> >  > >  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1
> >  > >  > <br>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/gi:146575
> >  > >  > <br>&nbsp;
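A minimal Python sketch of how the identifier syntax and examples above
could be handled in code. Everything named here (Lsid, parse_lsid) is
hypothetical and not part of the I3C proposal; the sketch assumes the
namespace is split on "/" into path steps, that the version field is simply
absent when not given, and that "urn" and "lsid" are matched without regard
to case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: Tuple[str, ...]      # hierarchical path steps, split on "/"
    value: str
    version: Optional[str] = None   # optional, as in the definition above

    def __str__(self) -> str:
        fields = ["urn", "lsid", self.authority,
                  "/".join(self.namespace), self.value]
        if self.version is not None:
            fields.append(self.version)
        return ":".join(fields)


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:<authority>:<namespace>:<value>[:<version>]."""
    fields = text.split(":")
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-delimited fields: {text!r}")
    scheme, nid, authority, namespace, value = fields[:5]
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not a urn:lsid identifier: {text!r}")
    if not authority or not value:
        raise ValueError(f"authority and value are required: {text!r}")
    version = fields[5] if len(fields) == 6 else None
    path = tuple(step for step in namespace.split("/") if step)
    return Lsid(authority, path, value, version)


plate = parse_lsid("urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345")
seq = parse_lsid("urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1")

print(plate.namespace, plate.value, plate.version)
print(seq.namespace, seq.value, seq.version)
print(str(seq))
```

Note that the first path step ("plate", "sequence") is the piece Lincoln
would like to see drawn from a fixed set of data types; in this sketch it
is whatever the authority chose to put there.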
> >  > >  > <h3>
> >  > >  > <a NAME="Appendix A, URN"></a>Appendix A, URN Reference</h3>
> >  > >  > 
> >  > >  > <p><br>Ref: http://www.w3.org/Addressing/,http://www.ietf.org/rfc/rfc2141.txt,
> >  > >  > http://www.ietf.org/rfc/rfc2396.txt
> >  > >  > <p>In the context of the web, there is already a definition for global
> >  > >  > identifiers,&nbsp; the Uniform Resource Name.&nbsp; From
> >  > >  > <br>http://www.ietf.org/rfc/rfc2141.txt
> >  > >  > <blockquote>Uniform Resource Names (URNs) are intended to serve as persistent,
> >  > >  > <br>location-independent, resource identifiers and are designed to make
> >  > >  > <br>it easy to map other namespaces (which share the properties of URNs)
> >  > >  > <br>into URN-space. Therefore, the URN syntax provides a means to encode
> >  > >  > <br>character data in a form that can be sent in existing protocols,
> >  > >  > <br>transcribed on most keyboards, etc.</blockquote>
> >  > >  > URIs are the superset of URNs and URLs.&nbsp; URLs are familiar due to
> >  > >  > their use on the web. They differ from URNs in that they are scoped to
> >  > >  > a particular protocol (e.g. http:*, ftp:* etc).&nbsp; URNs are scoped
> >  > >  > simply as identifiers urn:*.
> >  > >  > <p>URIs are divided into two parts,
> >  > >  > <br>&nbsp;&nbsp;&nbsp; &lt;scheme> : &lt;scheme specific part >
> >  > >  > <br>e.g. http://www.mpi.com/index.html,&nbsp; <b>http</b> is the scheme,&nbsp;
> >  > >  > and <b>www.mpi.com/index.html </b>is the scheme specific part that is interpreted
> >  > >  > in the context of that scheme.
> >  > >  > <br>&nbsp;
> >  > >  > <blockquote>The URI syntax does not require that the scheme-specific-part
> >  > >  > have&nbsp; any general structure or set of semantics which is common among
> >  > >  > all URI.&nbsp; However, a subset of URI do share a common syntax for&nbsp;
> >  > >  > representing hierarchical relationships within the namespace.&nbsp; This
> >  > >  > "generic URI" syntax consists of a sequence of four main components:
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;scheme>://&lt;authority>&lt;path>?&lt;query>#fragment
> >  > >  > <p>each of which, except &lt;scheme>, may be absent from a particular URI.&nbsp;&nbsp;
> >  > >  > For example, some URI schemes do not allow an &lt;authority> component,&nbsp;
> >  > >  > and others do not use a &lt;query> component.
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp; absoluteURI&nbsp;&nbsp; = scheme ":" ( hier_part
> >  > >  > | opaque_part )
> >  > >  > <p>&nbsp; URI that are hierarchical in nature use the slash "/" character
> >  > >  > for&nbsp; separating hierarchical components.&nbsp; For some file systems,
> >  > >  > a "/"&nbsp; character (used to denote the hierarchical structure of a URI)
> >  > >  > is the&nbsp; delimiter used to construct a file name hierarchy, and thus
> >  > >  > the URI&nbsp; path will look similar to a file pathname.&nbsp; This does
> >  > >  > NOT imply that the resource is a file or that the URI maps to an actual
> >  > >  > filesystem pathname.
> >  > >  > <p>[snip]
> >  > >  > <p>The path component contains data, specific to the authority (or the
> >  > >  > scheme if there is no authority component), identifying the resource within
> >  > >  > the scope of that scheme and authority.
> >  > >  > <p>[snip]
> >  > >  > <p>When a URI reference is used to perform a retrieval action on the identified
> >  > >  > resource, the optional fragment identifier, separated from the URI by a
> >  > >  > crosshatch ("#") character, consists of additional reference information
> >  > >  > to be interpreted by the user agent after the retrieval action has been
> >  > >  > successfully completed.&nbsp; As such, it is not&nbsp; part of a URI, but
> >  > >  > is often used in conjunction with a URI.
> >  > >  > <p>(http://www.ietf.org/rfc/rfc2396.txt)</blockquote>
> >  > >  > So to sum up the IETF stuff,&nbsp; a URI can be written as having all of
> >  > >  > the following parts;
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme://authority/path/path2?queryterm=something#fragment
> >  > >  > <br>&nbsp;
> >  > >  > <br>&nbsp;
> >  > >  > <h3>
> >  > >  > <a NAME="Appendix B, Some example"></a>Appendix B, Some example identifiers</h3>
> >  > >  > Here are some examples of identifiers written in this format;
> >  > >  > <p>GenBank:&nbsp; the sequence of J01636 could be identified as follows;
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636:1
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:K01483
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/gi:146575
> >  > >  > <p>The associated protein could be referred to as follows;
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov/protein/locus/AAA24054
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov/protein/accession/AAA24054.1
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov/protein/pid/g146578
> >  > >  > <br>&nbsp;
> >  > >  > <p>Another example is the following nucleotide from EMBL
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092
> >  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092:1
> >  > >  > <p>This includes a reference to a taxonomy term
> >  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:taxonomy.ebi.ac.uk::10090
> >  > >  > <br>&nbsp;
> >  > >  > <br>&nbsp;
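One quick way to compare the Appendix B examples with the colon-delimited
syntax given in the definition section is a regular expression. The pattern
below is my own reading of urn:lsid:<authority>:<namespace>:<value>:<version>
(version optional, namespace allowed to be empty), not something the
proposal specifies, so treat the results as a sketch.

```python
import re

# Assumed reading of the syntax: four colon-free fields after "urn:lsid:",
# the last of which (version) may be omitted.
LSID_PATTERN = re.compile(
    r"^urn:lsid:([^:]+):([^:]*):([^:]+)(?::([^:]+))?$",
    re.IGNORECASE,
)

# Identifiers copied from Appendix B of the quoted document.
examples = [
    "urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636",
    "urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636:1",
    "urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/gi:146575",
    "urn:lsid:genbank.ncbi.nlm.nih.gov/protein/locus/AAA24054",
    "urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092:1",
    "urn:lsid:taxonomy.ebi.ac.uk::10090",
]

conforms = {text: LSID_PATTERN.match(text) is not None for text in examples}

for text, ok in conforms.items():
    print("ok  " if ok else "FAIL", text)
```

Under this reading the protein examples, which use "/" where the other
examples use ":", do not fit the stated syntax; that may be worth tidying
up in the document, or it may mean the delimiter rules are still open.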
> >  > >  > <h3>
> >  > >  > <a NAME="Appendix C, Additional"></a>Appendix C, Additional Work</h3>
> >  > >  > 1. More clearly define what is authority and what is path.&nbsp; E.g. should
> >  > >  > GenBank be part of the authority string or is it a part of a path beneath
> >  > >  > ncbi.nlm.nih.gov.
> >  > >  > <p>2. Since path terms are owned by the authority, get common definitions
> >  > >  > for authorities/databases such as GenBank, EMBL etc.&nbsp; This could be
> >  > >  > defined by us and presented to the organization in question for ratification.&nbsp;
> >  > >  > Entities that do not make IDs publicly available are responsible for themselves
> >  > >  > and their customers only but would benefit from a set of guidelines and
> >  > >  > examples.
> >  > >  > <p>3. Examine use cases in proteomics and other branches of informatics.
> >  > >  > <p>4. Create libraries (java, perl) for manipulating IDs in this form.
> >  > >  > <br>&nbsp;
> >  > >  > <br>&nbsp;
> >  > >  > <br>&nbsp;
> >  > >  > <br>&nbsp;
> >  > >  > </body>
> >  > >  > </html>
> >  > > 
> >  > > -- 
> >  > > ========================================================================
> >  > > Lincoln D. Stein                           Cold Spring Harbor Laboratory
> >  > > lstein@cshl.org			                  Cold Spring Harbor, NY
> >  > > 
> >  > > NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS. 
> >  > > PLEASE WRITE FOR DETAILS.
> >  > > ========================================================================
> >  > > 
> > 
> > 
> 
> _______________________________________________
> DAS mailing list
> DAS@biodas.org
> http://biodas.org/mailman/listinfo/das
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------