[DAS] Re: Our identifier doc and proposal

Tue, 27 Nov 2001 16:20:19 -0500 (EST)

Yes,

	Absolutely. The question is: do we build the ontology and hope
that it suits 80% of people's needs, or do we adopt another group's? I
don't think anyone has formed a genomics ontology group, so I'd be up for
building our own with the help of Thomas/Mathew and Ewan. I think we can
learn from bioperl, biojava, and Ensembl in the way that they build their
feature hierarchies. 

			-B

-----------------------
Brian Gilman <gilmanb@genome.wi.mit.edu>
Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902

On Tue, 27 Nov 2001, Lincoln Stein wrote:

> Hi Brian,
> 
> I'm quite sure we'll need an ontology for feature types (at least the
> top few tiers, which people can add to), so we'll be doing some
> ontology building one way or another.  Would you agree?
> 
> Lincoln
> 
> Brian Gilman writes:
>  > I think so and I have also asked about this in the group. It becomes very
>  > hard to "control" the namespace without an ontology. This is why we allow
>  > the individuals to control the top level. 
>  > 
>  > 			-B
>  > 
>  > -----------------------
>  > Brian Gilman <gilmanb@genome.wi.mit.edu>
>  > Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
>  > One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
>  > phone +1 617  252 1069 / fax +1 617 252 1902
>  > 
>  > 
>  > On Tue, 27 Nov 2001, Lincoln Stein wrote:
>  > 
>  > > Hi Brian,
>  > > 
>  > > I'm pleased to see that the I3C identifier is nearly identical to my
>  > > (biological class,namespace,id) triple suggestion.  The difference is
>  > > the version number, which I agree with completely.  So I accept it
>  > > wholeheartedly.
>  > > 
>  > > The part that I don't feel entirely comfortable with is that the
>  > > namespace seems to be completely under the control of the authority:
>  > > 
>  > >    urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
>  > > 
>  > > I think the top level namespace, e.g. "plate" should be hard-and-fast
>  > > data types.  Is this envisioned by the I3C?
>  > > 
>  > > Lincoln
>  > > 
>  > > 
>  > > Brian Gilman writes:
>  > >  > Lincoln,
>  > >  > 
>  > >  > 	Please find attached an updated identifier proposal that we have
>  > >  > been working on to identify objects in the web services architecture.
>  > >  > I like it over the feature_class mechanism because we can uniquely
>  > >  > identify an object in the "cloud".
>  > >  > 
>  > >  > 		Best, 
>  > >  > 
>  > >  > 			-Brian
>  > >  > 
>  > >  > -----------------------
>  > >  > Brian Gilman <gilmanb@genome.wi.mit.edu>
>  > >  > Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
>  > >  > One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
>  > >  > phone +1 617  252 1069 / fax +1 617 252 1902
>  > >  > 
>  > >  > 
>  > >  > <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
>  > >  > <html>
>  > >  > <head>
>  > >  >    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>  > >  >    <meta name="Author" content="Ted liefeld">
>  > >  >    <meta name="GENERATOR" content="Mozilla/4.73 [en]C-CCK-MCD BA45DSL  (WinNT; U) [Netscape]">
>  > >  >    <title>identifiers</title>
>  > >  > </head>
>  > >  > <body>
>  > >  > 
>  > >  > <h2>
>  > >  > I3C Identifier Specification</h2>
>  > >  > 
>  > >  > <h3>
>  > >  > <a NAME="Abstract"></a>Abstract:</h3>
>  > >  > This document describes the motivation for and specification of string
>  > >  > identifiers to be used to identify objects within the life sciences domain
>  > >  > by the I3C architecture.&nbsp; A string format for the identifiers is defined
>  > >  > as&nbsp;<tt>&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version>.</tt>
>  > >  > <br>&nbsp;
>  > >  > <h2>
>  > >  > Index:</h2>
>  > >  > <a href="#Abstract">Abstract</a>
>  > >  > <br><a href="#Introduction:">Introduction</a>
>  > >  > <br><a href="#Background: Existing">Background: Existing Identifiers</a>
>  > >  > <blockquote><a href="#MPI">MPI Id</a>
>  > >  > <br><a href="#AGAVE">AGAVE db_id</a></blockquote>
>  > >  > <a href="#I3C String">I3C String Identifiers</a>
>  > >  > <blockquote><a href="#Requirements">Requirements for the I3C String Identifier</a>
>  > >  > <blockquote><a href="#Syntactic">Syntactic Requirements</a>
>  > >  > <br><a href="#Semantic">Semantic Requirements</a></blockquote>
>  > >  > </blockquote>
>  > >  > <a href="#Specification">Specification of the I3C String Identifier</a>
>  > >  > <blockquote><a href="#Web Centric Id:  URI,">Web Centric Id:&nbsp; URI,
>  > >  > URN</a>
>  > >  > <br><a href="#I3C String Identifier">I3C String Identifier Definition</a>
>  > >  > <br><a href="#Examples">Examples</a></blockquote>
>  > >  > <a href="#Appendix A, URN">Appendix A, URN Reference</a>
>  > >  > <br><a href="#Appendix B, Some example">Appendix B, Some example identifiers</a>
>  > >  > <br><a href="#Appendix C, Additional">Appendix C, Additional Work</a>
>  > >  > <h2>
>  > >  > <a NAME="Introduction:"></a>Introduction:</h2>
>  > >  > One of the goals of the I3C is the definition of a common architecture
>  > >  > and standards to simplify interoperability between applications from different
>  > >  > companies.&nbsp; For interoperability to occur,&nbsp; we need a common
>  > >  > format for unique identifiers for any objects we reference that would function
>  > >  > in the context of I3C services.&nbsp; The remainder of this document defines
>  > >  > a web-centric ID definition that will allow us to create federated systems
>  > >  > utilizing many databases and services in a common way.
>  > >  > <p>The purpose of this identifier definition is to uniquely identify biologically
>  > >  > significant objects,&nbsp; e.g. a sequence, a clone, a gene, a contig etc.
>  > >  > It is not meant to identify artifacts of implementation, e.g. a database,
>  > >  > a server.&nbsp; Identifying objects such as these should be handled via
>  > >  > other mechanisms such as JDBC URLs and WSDL.
>  > >  > <p>In addition, using http URLs as identifiers (e.g. http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-id+7clen1HOYrs+[taxonomy-ID:10090]+-e)
>  > >  > is not adequate for&nbsp; organizations who need to limit or control access
>  > >  > to external databases for intellectual property or security reasons.&nbsp;
>  > >  > The identifiers defined here are deliberately location independent and
>  > >  > are intended to uniquely identify a biological artifact, but not the location
>  > >  > of that artifact.
>  > >  > <br>&nbsp;
>  > >  > <h3>
>  > >  > <a NAME="Background: Existing"></a><b>Background: Existing Identifiers</b></h3>
>  > >  > There are currently many existing forms of identifiers for biological artifacts
>  > >  > in use within the life sciences community. These include proprietary formats
>  > >  > as well as public domain formats.&nbsp; Some of these are discussed below.
>  > >  > <br>&nbsp;
>  > >  > <h4>
>  > >  > <a NAME="MPI"></a>MPI Id</h4>
>  > >  > Within Millennium Pharmaceuticals Inc.,&nbsp; there is a suite of CORBA
>  > >  > services that have been running in production for over two years.&nbsp;
>  > >  > One of the first tasks they addressed was the identity management of objects
>  > >  > that appear in more than one database.&nbsp; To deal with this need, a
>  > >  > CORBA IDL structure called&nbsp; MPI Id was declared.&nbsp; It has since
>  > >  > been reused in many subsequent CORBA services.
>  > >  > <p>The MPI Id CORBA type is defined as a triple
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp; struct Id {</tt>
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string value;</tt>
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string domain;</tt>
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string type;</tt>
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp; };</tt>
>  > >  > <p>For example, MP PL 001 represents identifier domain 'MP', type 'Plate',
>  > >  > identifier value '001'.&nbsp; The value uniquely identifies an object in
>  > >  > the domain of a given type.&nbsp; Note that an object may have more than
>  > >  > one ID, so that the plate known as MP PL 001 may also be known as SE PL
>  > >  > 435a in the SE domain.
>  > >  > <p>Some of the services that use these identifiers found this too limiting.&nbsp;
>  > >  > For example, when retrieving clones from GenBank you may want to use either the
>  > >  > accession number or the GI number.&nbsp; However, either of these would
>  > >  > have been encoded as
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL gid
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL accession
>  > >  > <br>This raised the problem of differentiating whether an identifier is
>  > >  > a GI number or an accession number.&nbsp; If the namespaces of the accession
>  > >  > numbers and GI numbers overlap, then there is no way for a server or client
>  > >  > to identify which form was intended.
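The ambiguity described above can be made concrete with a small sketch. The `MpiId` class below is only a Python stand-in for the CORBA struct, not Millennium's actual code, and the sample values are borrowed from the GenBank examples used later in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpiId:
    """Illustrative stand-in for the CORBA struct Id (value, domain, type)."""
    value: str
    domain: str
    type: str


# Both GenBank identifier styles end up with the same domain and type.
by_accession = MpiId(value="J01636", domain="GB", type="CL")
by_gi = MpiId(value="146575", domain="GB", type="CL")

# The triple has no field recording whether the value is an accession or a GI
# number, so the scope of the two identifiers is indistinguishable.
same_scope = (by_accession.domain, by_accession.type) == (by_gi.domain, by_gi.type)
print(same_scope)
```

A receiver can only guess the kind of identifier from the shape of the value, which stops working once the two value spaces overlap.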
>  > >  > <p>Another weakness is that object type can be overloaded in the same manner;&nbsp;
>  > >  > for example a sequence and a contig are both sequences.&nbsp; Similarly,
>  > >  > inclusion of a version number for an identifier would require overloading
>  > >  > the value field of the MPI Id.
>  > >  > <p>Therefore, limiting the unique identifier to three elements was
>  > >  > found to be too restrictive.&nbsp; There must be provision for extension.
>  > >  > <br>&nbsp;
>  > >  > <h4>
>  > >  > <a NAME="AGAVE"></a>AGAVE db_id</h4>
>  > >  > DoubleTwist Inc. has found some of the same issues in their AGAVE product.&nbsp;
>  > >  > AGAVE defines an identifier called db_id in the AGAVE DTD file, an XML
>  > >  > format.
>  > >  > <p>The AGAVE db_id is defined as follows:
>  > >  > <blockquote><tt>&lt;!-- db_id is an identifier for an object in its source database. --></tt>
>  > >  > <br><tt>&lt;!-- Attributes: --></tt>
>  > >  > <br><tt>&lt;!-- id:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a data identifier such as GenBank accession or PID. --></tt>
>  > >  > <br><tt>&lt;!-- db_code:&nbsp; a code for the data source, e.g. GenBank is "gb". --></tt>
>  > >  > <br><tt>&lt;!-- version:&nbsp; version of the associated data. --></tt>
>  > >  > <br><tt>&lt;!ELEMENT db_id EMPTY></tt>
>  > >  > <br><tt>&lt;!ATTLIST db_id&nbsp; id&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > CDATA&nbsp; #REQUIRED</tt>
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > version&nbsp; CDATA&nbsp; #IMPLIED</tt>
>  > >  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > db_code&nbsp; CDATA&nbsp; #REQUIRED ></tt></blockquote>
>  > >  > In this format, the version is explicitly specified, but the weaknesses
>  > >  > remain of having insufficient scope to specify object types or variations
>  > >  > in the type of ID being specified (accession vs gi).
>  > >  > <br>&nbsp;
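As a rough illustration of the point about db_id, the sketch below reads a single db_id element with Python's standard XML parser. The element and its attribute values are made up for this example and are not taken from a real AGAVE file.

```python
import xml.etree.ElementTree as ET

# Hypothetical db_id element; the attribute values are illustrative only.
element = ET.fromstring('<db_id id="J01636" version="1" db_code="gb"/>')

db_id = {
    "id": element.attrib["id"],            # declared #REQUIRED in the DTD
    "db_code": element.attrib["db_code"],  # declared #REQUIRED in the DTD
    "version": element.get("version"),     # declared #IMPLIED, so may be None
}
print(db_id)
```

The version is carried explicitly, but there is still no attribute that says what kind of object is meant or whether the id is an accession or a GI number.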
>  > >  > <h2>
>  > >  > <a NAME="I3C String"></a>I3C String Identifiers</h2>
>  > >  > For use within the I3C architecture, the existing identifier definitions
>  > >  > were found to be inadequate to handle the breadth and scope of the possible
>  > >  > identifiers that would be required.&nbsp; The following sections detail
>  > >  > the requirements and specification of a new identifier format for use
>  > >  > within the I3C architecture.
>  > >  > <h3>
>  > >  > <a NAME="Requirements"></a>Requirements for the I3C String Identifier</h3>
>  > >  > The I3C architecture has the following syntactical and semantic requirements
>  > >  > for its identifiers;
>  > >  > <h4>
>  > >  > <a NAME="Syntactic"></a>Syntactic Requirements</h4>
>  > >  > 
>  > >  > <ol>
>  > >  > <li>
>  > >  > The identifier must be encodable in a string format</li>
>  > >  > 
>  > >  > <li>
>  > >  > The identifier must be extensible</li>
>  > >  > 
>  > >  > <li>
>  > >  > The identifier must uniquely identify one object</li>
>  > >  > 
>  > >  > <li>
>  > >  > The identifier must not require additional contextual information for evaluation</li>
>  > >  > </ol>
>  > >  > These requirements result from the need to transmit the identifier in an
>  > >  > XML format to and from web-services.&nbsp; By requiring that it can be
>  > >  > encoded as a string, it becomes possible to transmit identifiers via other
>  > >  > mechanisms as well.&nbsp; Also, as noted in the examples given above, the
>  > >  > identifier must be extensible to allow use with biological objects that
>  > >  > have not yet been defined.
>  > >  > <h4>
>  > >  > <a NAME="Semantic"></a>Semantic Requirements</h4>
>  > >  > For an Id to uniquely specify a biological object in a system, it needs
>  > >  > to include the following pieces of information;
>  > >  > <br>&nbsp;
>  > >  > <ol>
>  > >  > <li>
>  > >  > &nbsp;Authority :&nbsp; The name of the organization that has defined an
>  > >  > entity.</li>
>  > >  > 
>  > >  > <li>
>  > >  > &nbsp;Id Value : an alpha-numeric sequence that uniquely identifies an
>  > >  > object to its authority</li>
>  > >  > 
>  > >  > <li>
>  > >  > &nbsp;Namespace : one or more statements constraining the scope in which
>  > >  > an Id is evaluated</li>
>  > >  > 
>  > >  > <li>
>  > >  > &nbsp;Version&nbsp; : (optional) version number for an Id</li>
>  > >  > </ol>
>  > >  > As an example, the following uniquely identifies a sequence in GenBank,
>  > >  > <p>&nbsp;&nbsp;&nbsp; GenBank, Sequence, Accession J01636,&nbsp; version
>  > >  > 1
>  > >  > <p>With all these pieces of information we can uniquely identify a sequence.&nbsp;
>  > >  > Leaving off the version number we can get pretty close.&nbsp; Leaving out
>  > >  > any of the other bits of information makes it impossible to find the object
>  > >  > without a priori knowledge of the context.
>  > >  > <br>&nbsp;
>  > >  > <h2>
>  > >  > <a NAME="Specification"></a>Specification of the I3C String Identifier</h2>
>  > >  > To take advantage of existing work on unique identifiers,&nbsp; the I3C
>  > >  > Technical Architecture working group has selected the Uniform Resource Name
>  > >  > (URN), as defined by the IETF in RFC 2141 and referenced by the World Wide
>  > >  > Web Consortium (W3C), as the basis for the
>  > >  > I3C String Identifier.&nbsp; For additional background on URNs, please
>  > >  > see Appendix A, "URN Reference", for the definition of a URN and reference
>  > >  > links.
>  > >  > <h4>
>  > >  > <a NAME="Web Centric Id:  URI,"></a><b>Web Centric Id:&nbsp; URI, URN</b></h4>
>  > >  > To summarize the IETF and W3C documents,&nbsp; a URI can be written as
>  > >  > having the following parts;
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme:namespace identifier://authority/path/.../pathN/value?queryterm#fragment
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > scheme and namespace identifier define the semantics of everything that
>  > >  > follows
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > authority defines the organization responsible for defining and managing
>  > >  > the namespace
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > path/.../pathN/ defines a subset of an authority's namespace
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > value is the last element in the path
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > queryterm indicates a post-processing directive
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > fragment defines a preprocessing directive or fragment within the scope
>  > >  > of the Id
>  > >  > <p>The adoption of the URN format should simplify integration with other
>  > >  > existing standards such as MAGE-ML which permit the use of URN identifiers.
>  > >  > <br>&nbsp;
>  > >  > <h3>
>  > >  > <a NAME="I3C String Identifier"></a>I3C String Identifier Definition</h3>
>  > >  > Given the definition of a URN above, we have defined the following syntax
>  > >  > for an I3C String identifier;
>  > >  > <p><tt>&nbsp;&nbsp;&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version></tt>
>  > >  > <p>The different parts of the identifier are delimited by colons ":".
>  > >  > <p>The elements of the identifier are as follows;
>  > >  > <ul>
>  > >  > <li>
>  > >  > scheme = urn</li>
>  > >  > 
>  > >  > <ul>
>  > >  > <li>
>  > >  > This specifies that the identifier is in URN format</li>
>  > >  > </ul>
>  > >  > 
>  > >  > <li>
>  > >  > namespace identifier = lsid</li>
>  > >  > 
>  > >  > <ul>
>  > >  > <li>
>  > >  > The I3C string identifier namespace identifier is defined as "Life Science
>  > >  > Identifier", or "lsid".</li>
>  > >  > </ul>
>  > >  > 
>  > >  > <li>
>  > >  > authority = &lt;authority></li>
>  > >  > 
>  > >  > <ul>
>  > >  > <li>
>  > >  > This portion uniquely identifies the organization and optionally the organizational
>  > >  > unit that has defined the namespace for the remaining portions of the identifier</li>
>  > >  > </ul>
>  > >  > 
>  > >  > <li>
>  > >  > namespace = &lt;namespace></li>
>  > >  > 
>  > >  > <ul>
>  > >  > <li>
>  > >  > a hierarchical namespace to scope the identifier value.&nbsp; The form and
>  > >  > content of this section are defined and managed by the authority</li>
>  > >  > </ul>
>  > >  > 
>  > >  > <li>
>  > >  > value = &lt;value></li>
>  > >  > 
>  > >  > <ul>
>  > >  > <li>
>  > >  > the unique identifier for an object within the namespace defined by an
>  > >  > authority</li>
>  > >  > </ul>
>  > >  > 
>  > >  > <li>
>  > >  > version = &lt;version></li>
>  > >  > 
>  > >  > <ul>
>  > >  > <li>
>  > >  > optional version information associated with the identifier value</li>
>  > >  > </ul>
>  > >  > </ul>
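The colon-delimited syntax above lends itself to a very small parser. What follows is a minimal sketch rather than a reference implementation: the names `Lsid`, `parse_lsid` and `format_lsid` are invented for this example, and it assumes that no individual field contains a colon, which the specification does not state explicitly.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The variable fields of urn:lsid:<authority>:<namespace>:<value>:<version>."""
    authority: str
    namespace: str
    value: str
    version: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an identifier on colons (assumes no field contains a colon)."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected 5 or 6 colon-separated fields: %r" % text)
    # RFC 2141 treats the leading "urn" and the namespace identifier as
    # case-insensitive, so compare them in lower case.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not a urn:lsid identifier: %r" % text)
    authority, namespace, value = parts[2:5]
    version = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, value, version)


def format_lsid(lsid: Lsid) -> str:
    """Rebuild the string form, omitting the version when it is absent."""
    fields = ["urn", "lsid", lsid.authority, lsid.namespace, lsid.value]
    if lsid.version is not None:
        fields.append(lsid.version)
    return ":".join(fields)


parsed = parse_lsid("urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1")
print(parsed.namespace.split("/"))  # hierarchical namespace as path steps
print(format_lsid(parsed))
```

Keeping the namespace as a single slash-separated string leaves its interpretation to the authority, which matches the definition above.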
>  > >  > 
>  > >  > <h4>
>  > >  > <a NAME="Examples"></a>Examples</h4>
>  > >  > For example, the plate identified by Millennium as ID 12345, with
>  > >  > MPI Id&nbsp;&nbsp;&nbsp; "MP PL 12345", would be written as
>  > >  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate:12345
>  > >  > <p>Since the authority is free to define any path that it wishes (provided
>  > >  > of course that it manages them),&nbsp; we may want to define the path section
>  > >  > for plates more fully to something like this
>  > >  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
>  > >  > <p>We can now use expanded path information to deal with cases that required
>  > >  > type overloading in the MPI ID.&nbsp; For example
>  > >  > <br>&nbsp;&nbsp;&nbsp; (Accession)&nbsp;&nbsp;&nbsp; GB CL J01636 version
>  > >  > 1
>  > >  > <br>&nbsp;&nbsp;&nbsp; (GI)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > >  > GB CL 146575
>  > >  > <br>refer to the same object.&nbsp; These can now be encoded as
>  > >  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1
>  > >  > <br>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/gi:146575
>  > >  > <br>&nbsp;
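One way to picture the translation from the older triples to the new format is a lookup keyed on the old scope plus the kind of identifier. The table `SCOPES` and the function `to_lsid` below are hypothetical and exist only for this sketch; the authority and namespace strings are the ones used in the examples above.

```python
# Hypothetical mapping from (MPI domain, MPI type, id kind) to the
# (authority, namespace) pair used in the examples above.
SCOPES = {
    ("MP", "PL", None): ("informatics.mpi.com", "plate"),
    ("GB", "CL", "accession"): ("genbank.ncbi.nlm.nih.gov", "sequence/accession"),
    ("GB", "CL", "gi"): ("genbank.ncbi.nlm.nih.gov", "sequence/gi"),
}


def to_lsid(domain, type_, value, kind=None, version=None):
    """Build an identifier string from an MPI-style triple plus an id kind."""
    authority, namespace = SCOPES[(domain, type_, kind)]
    fields = ["urn", "lsid", authority, namespace, value]
    if version is not None:
        fields.append(str(version))
    return ":".join(fields)


print(to_lsid("MP", "PL", "12345"))
print(to_lsid("GB", "CL", "J01636", kind="accession", version=1))
print(to_lsid("GB", "CL", "146575", kind="gi"))
```

The id kind that had to be guessed in the MPI Id is now spelled out in the namespace, so the accession and GI forms can no longer be confused.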
>  > >  > <h3>
>  > >  > <a NAME="Appendix A, URN"></a>Appendix A, URN Reference</h3>
>  > >  > 
>  > >  > <p><br>Ref: http://www.w3.org/Addressing/,http://www.ietf.org/rfc/rfc2141.txt,
>  > >  > http://www.ietf.org/rfc/rfc2396.txt
>  > >  > <p>In the context of the web, there is already a definition for global
>  > >  > identifiers,&nbsp; the Uniform Resource Name.&nbsp; From
>  > >  > <br>http://www.ietf.org/rfc/rfc2141.txt
>  > >  > <blockquote>Uniform Resource Names (URNs) are intended to serve as persistent,
>  > >  > <br>location-independent, resource identifiers and are designed to make
>  > >  > <br>it easy to map other namespaces (which share the properties of URNs)
>  > >  > <br>into URN-space. Therefore, the URN syntax provides a means to encode
>  > >  > <br>character data in a form that can be sent in existing protocols,
>  > >  > <br>transcribed on most keyboards, etc.</blockquote>
>  > >  > URIs are the superset of URNs and URLs.&nbsp; URLs are familiar due to
>  > >  > their use on the web. They differ from URNs in that they are scoped to
>  > >  > a particular protocol (e.g. http:*, ftp:* etc).&nbsp; URNs are scoped
>  > >  > simply as identifiers urn:*.
>  > >  > <p>URIs are divided into two parts,
>  > >  > <br>&nbsp;&nbsp;&nbsp; &lt;scheme> : &lt;scheme specific part >
>  > >  > <br>e.g. in http://www.mpi.com/index.html,&nbsp; <b>http</b> is the scheme,&nbsp;
>  > >  > and <b>//www.mpi.com/index.html </b>is the scheme specific part that is interpreted
>  > >  > in the context of that scheme.
>  > >  > <br>&nbsp;
>  > >  > <blockquote>The URI syntax does not require that the scheme-specific-part
>  > >  > have&nbsp; any general structure or set of semantics which is common among
>  > >  > all URI.&nbsp; However, a subset of URI do share a common syntax for&nbsp;
>  > >  > representing hierarchical relationships within the namespace.&nbsp; This
>  > >  > "generic URI" syntax consists of a sequence of four main components:
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;scheme>://&lt;authority>&lt;path>?&lt;query>#fragment
>  > >  > <p>each of which, except &lt;scheme>, may be absent from a particular URI.&nbsp;&nbsp;
>  > >  > For example, some URI schemes do not allow an &lt;authority> component,&nbsp;
>  > >  > and others do not use a &lt;query> component.
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp; absoluteURI&nbsp;&nbsp; = scheme ":" ( hier_part
>  > >  > | opaque_part )
>  > >  > <p>&nbsp; URI that are hierarchical in nature use the slash "/" character
>  > >  > for&nbsp; separating hierarchical components.&nbsp; For some file systems,
>  > >  > a "/"&nbsp; character (used to denote the hierarchical structure of a URI)
>  > >  > is the&nbsp; delimiter used to construct a file name hierarchy, and thus
>  > >  > the URI&nbsp; path will look similar to a file pathname.&nbsp; This does
>  > >  > NOT imply that the resource is a file or that the URI maps to an actual
>  > >  > filesystem pathname.
>  > >  > <p>[snip]
>  > >  > <p>The path component contains data, specific to the authority (or the
>  > >  > scheme if there is no authority component), identifying the resource within
>  > >  > the scope of that scheme and authority.
>  > >  > <p>[snip]
>  > >  > <p>When a URI reference is used to perform a retrieval action on the identified
>  > >  > resource, the optional fragment identifier, separated from the URI by a
>  > >  > crosshatch ("#") character, consists of additional reference information
>  > >  > to be interpreted by the user agent after the retrieval action has been
>  > >  > successfully completed.&nbsp; As such, it is not&nbsp; part of a URI, but
>  > >  > is often used in conjunction with a URI.
>  > >  > <p>(http://www.ietf.org/rfc/rfc2396.txt)</blockquote>
>  > >  > To summarize the IETF documents,&nbsp; a URI can be written as having all of
>  > >  > the following parts;
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme://authority/path/path2?queryterm=something#fragment
>  > >  > <br>&nbsp;
>  > >  > <br>&nbsp;
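The generic URI parts summarized above can be inspected with Python's standard library. This is only an illustration of the RFC 2396 terminology: the first URI is the placeholder from the summary, and the second shows that a generic parser sees a urn:lsid identifier as a scheme followed by an opaque part with no authority component of its own.

```python
from urllib.parse import urlsplit

# The placeholder URI from the summary, split into its generic components.
generic = urlsplit("scheme://authority/path/path2?queryterm=something#fragment")
print(generic.scheme, generic.netloc, generic.path, generic.query, generic.fragment)

# A urn:lsid identifier has no "//", so a generic URI parser only recognises
# the scheme; the rest is opaque and must be split by LSID-specific rules.
lsid = urlsplit("urn:lsid:informatics.mpi.com:plate:12345")
print(lsid.scheme, repr(lsid.netloc), lsid.path)
```

This is why the I3C definition has to say for itself how the authority, namespace, value and version are delimited within the URN.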
>  > >  > <h3>
>  > >  > <a NAME="Appendix B, Some example"></a>Appendix B, Some example identifiers</h3>
>  > >  > Here are some examples of identifiers written in this format;
>  > >  > <p>GenBank:&nbsp; the sequence for J01636 could be identified as follows;
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636:1
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:K01483
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/gi:146575
>  > >  > <p>The associated protein could be referred to as follows;
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:protein/locus:AAA24054
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov:protein/accession:AAA24054.1
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov:protein/pid:g146578
>  > >  > <br>&nbsp;
>  > >  > <p>Another example is the following nucleotide from EMBL
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092
>  > >  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092:1
>  > >  > <p>This includes a reference to a taxonomy term
>  > >  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:taxonomy.ebi.ac.uk::10090
>  > >  > <br>&nbsp;
>  > >  > <br>&nbsp;
>  > >  > <h3>
>  > >  > <a NAME="Appendix C, Additional"></a>Appendix C, Additional Work</h3>
>  > >  > 1. More clearly define what is authority and what is path.&nbsp; E.g. should
>  > >  > GenBank be part of the authority string or is it a part of a path beneath
>  > >  > ncbi.nlm.nih.gov.
>  > >  > <p>2. Since path terms are owned by the authority, get common definitions
>  > >  > for authorities/databases such as GenBank, EMBL etc.&nbsp; This could be
>  > >  > defined by us and presented to the organization in question for ratification.&nbsp;
>  > >  > Entities that do not make IDs publicly available are responsible for themselves
>  > >  > and their customers only but would benefit from a set of guidelines and
>  > >  > examples.
>  > >  > <p>3. Examine use cases in proteomics and other branches of informatics.
>  > >  > <p>4. Create libraries (java, perl) for manipulating IDs in this form.
>  > >  > <br>&nbsp;
>  > >  > <br>&nbsp;
>  > >  > <br>&nbsp;
>  > >  > <br>&nbsp;
>  > >  > </body>
>  > >  > </html>
>  > > 
>  > > -- 
>  > > ========================================================================
>  > > Lincoln D. Stein                           Cold Spring Harbor Laboratory
>  > > lstein@cshl.org			                  Cold Spring Harbor, NY
>  > > 
>  > > NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS. 
>  > > PLEASE WRITE FOR DETAILS.
>  > > ========================================================================
>  > > 
> 
> -- 
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org			                  Cold Spring Harbor, NY
> 
> NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS. 
> PLEASE WRITE FOR DETAILS.
> ========================================================================
>