[DAS] Our identifier doc and proposal

Tue, 27 Nov 2001 15:49:13 -0500

Hi Brian,

I'm pleased to see that the I3C identifier is nearly identical to my
(biological class,namespace,id) triple suggestion.  The difference is
the version number, which I agree with completely.  So I accept it
wholeheartedly.

The part that I don't feel entirely comfortable with is that the
namespace seems to be completely under the control of the authority:

   urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345

I think the top-level namespace component, e.g. "plate", should be drawn
from a set of hard-and-fast data types.  Is this envisioned by the I3C?
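
To make the suggestion concrete, here is a minimal Python sketch of what a
fixed top-level vocabulary would mean for software that reads these IDs.
The set DATA_TYPES and the function top_level_type are hypothetical,
invented for illustration; they are not part of the I3C proposal.

```python
# Hypothetical controlled vocabulary for the first namespace component;
# the real list of data types would have to be agreed on by the I3C.
DATA_TYPES = {"plate", "clone", "sequence", "nucleotide", "protein"}


def top_level_type(lsid: str) -> str:
    """Return the first namespace component of an LSID, e.g. "plate".

    Assumes urn:lsid:<authority>:<namespace>:<value>[:<version>] and rejects
    identifiers whose first namespace component is not in DATA_TYPES.
    """
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    head = parts[3].split("/")[0]
    if head not in DATA_TYPES:
        raise ValueError(f"unknown top-level data type: {head!r}")
    return head


print(top_level_type("urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345"))  # plate
```

The authority would still own everything below the first component
(e.g. "glycerol/freeze"); only the head of the path is constrained.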

Lincoln

Brian Gilman writes:
 > Lincoln,
 > 
 > 	Please find attached an updated identifier proposal that we have
 > been working on to identify objects in the web services architecture. I
 > prefer it to the feature_class mechanism because we can uniquely identify
 > an object in the "cloud".
 > 
 > 		Best, 
 > 
 > 			-Brian
 > 
 > -----------------------
 > Brian Gilman <gilmanb@genome.wi.mit.edu>
 > Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
 > One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
 > phone +1 617  252 1069 / fax +1 617 252 1902
 > 
 > 
 > <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
 > <html>
 > <head>
 >    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 >    <meta name="Author" content="Ted liefeld">
 >    <meta name="GENERATOR" content="Mozilla/4.73 [en]C-CCK-MCD BA45DSL  (WinNT; U) [Netscape]">
 >    <title>identifiers</title>
 > </head>
 > <body>
 > 
 > <h2>
 > I3C Identifier Specification</h2>
 > 
 > <h3>
 > <a NAME="Abstract"></a>Abstract:</h3>
 > This document describes the motivation for and specification of the string
 > identifiers that the I3C architecture uses to identify objects within the life
 > sciences domain.&nbsp; A string format for the identifiers is defined
 > as&nbsp;<tt>&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version></tt>,
 > where the version is optional.
 > <br>&nbsp;
 > <h2>
 > Index:</h2>
 > <a href="#Abstract">Abstract</a>
 > <br><a href="#Introduction:">Introduction</a>
 > <br><a href="#Background: Existing">Background: Existing Identifiers</a>
 > <blockquote><a href="#MPI">MPI Id</a>
 > <br><a href="#AGAVE">AGAVE db_id</a></blockquote>
 > <a href="#I3C String">I3C String Identifiers</a>
 > <blockquote><a href="#Requirements">Requirements for the I3C String Identifier</a>
 > <blockquote><a href="#Syntactic">Syntactic Requirements</a>
 > <br><a href="#Semantic">Semantic Requirements</a></blockquote>
 > </blockquote>
 > <a href="#Specification">Specification of the I3C String Identifier</a>
 > <blockquote><a href="#Web Centric Id:  URI,">Web Centric Id:&nbsp; URI,
 > URN</a>
 > <br><a href="#I3C String Identifier">I3C String Identifier Definition</a>
 > <br><a href="#Examples">Examples</a></blockquote>
 > <a href="#Appendix A, URN">Appendix A, URN Reference</a>
 > <br><a href="#Appendix B, Some example">Appendix B, Some example identifiers</a>
 > <br><a href="#Appendix C, Additional">Appendix C, Additional Work</a>
 > <h2>
 > <a NAME="Introduction:"></a>Introduction:</h2>
 > One of the goals of the I3C is the definition of a common architecture
 > and standards to simplify interoperability between applications from different
 > companies.&nbsp; For interoperability to occur,&nbsp; we need a common
 > format for unique identifiers for any objects we reference that would function
 > in the context of I3C services.&nbsp; The remainder of this document defines
 > a web-centric ID definition that will allow us to create federated systems
 > utilizing many databases and services in a common way.
 > <p>The purpose of this identifier definition is to uniquely identify biologically
 > significant objects,&nbsp; e.g. a sequence, a clone, a gene or a contig.
 > It is not meant to identify artifacts of implementation, e.g. a database
 > or a server.&nbsp; Objects such as these should be identified via
 > other mechanisms such as JDBC URLs and WSDL.
 > <p>In addition, using http URLs as identifiers (e.g. http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-id+7clen1HOYrs+[taxonomy-ID:10090]+-e)
 > is not adequate for&nbsp; organizations who need to limit or control access
 > to external databases for intellectual property or security reasons.&nbsp;
 > The identifiers defined here are deliberately location independent and
 > are intended to uniquely identify a biological artifact, but not the location
 > of that artifact.
 > <br>&nbsp;
 > <h3>
 > <a NAME="Background: Existing"></a><b>Background: Existing Identifiers</b></h3>
 > There are currently many existing forms of identifiers for biological artifacts
 > in use within the life sciences community. These include proprietary formats
 > as well as public domain formats.&nbsp; Some of these are discussed below.
 > <br>&nbsp;
 > <h4>
 > <a NAME="MPI"></a>MPI Id</h4>
 > Within Millennium Pharmaceuticals Inc.,&nbsp; there is a suite of CORBA
 > services that have been running in production for over two years.&nbsp;
 > One of the first tasks they addressed was the identity management of objects
 > that appear in more than one database.&nbsp; To deal with this need, a
 > CORBA IDL structure called&nbsp; MPI Id was declared.&nbsp; It has since
 > been reused in many subsequent CORBA services.
 > <p>The MPI Id CORBA type is defined as a triple:
 > <br><tt>&nbsp;&nbsp;&nbsp; struct Id {</tt>
 > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string value;</tt>
 > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string domain;</tt>
 > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string type;</tt>
 > <br><tt>&nbsp;&nbsp;&nbsp; };</tt>
 > <p>For example, MP PL 001 represents identifier domain 'MP', type 'Plate' ('PL'),
 > identifier value '001'.&nbsp; The value uniquely identifies an object in
 > the domain of a given type.&nbsp; Note that an object may have more than
 > one ID, so that the plate known as MP PL 001 may also be known as SE PL
 > 435a in the SE domain.
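 > <p>As an illustration only, the triple can be mirrored in Python as below.
 > The class name MpiId and the alias set are our own and are not taken from
 > the MPI services; the sketch simply shows one plate carrying two
 > independent identifiers issued by two domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpiId:
    """Illustrative Python mirror of the CORBA struct Id."""
    value: str
    domain: str
    type: str

    def __str__(self) -> str:
        # Same "domain type value" order as the text, e.g. "MP PL 001".
        return f"{self.domain} {self.type} {self.value}"


# One physical plate, known under identifiers issued by two domains.
plate_aliases = {MpiId("001", "MP", "PL"), MpiId("435a", "SE", "PL")}
print(sorted(str(alias) for alias in plate_aliases))  # ['MP PL 001', 'SE PL 435a']
```

 > <p>Because the dataclass is frozen it is hashable, so the aliases of one
 > object can be kept together in an ordinary set.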
 > <p>Some of the services that use these identifiers found this too limiting.&nbsp;
 > For example,&nbsp; when retrieving clones from GenBank, a client may want to use
 > either the accession number or the GI number.&nbsp; However, both would
 > have been encoded the same way:
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL &lt;gi>
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL &lt;accession>
 > <br>This raised the problem of differentiating whether an identifier is
 > a GI number or an accession number.&nbsp; If the namespaces of accession
 > numbers and GI numbers overlap, then there is no way for a server or client
 > to identify which form was intended.
 > <p>Another weakness is that object type can be overloaded in the same manner;&nbsp;
 > for example a sequence and a contig are both sequences.&nbsp; Similarly,
 > inclusion of a version number for an identifier would require overloading
 > the value field of the MPI Id.
 > <p>Therefore it was found that limiting the unique identifier to three
 > elements was too restrictive.&nbsp; There must be provision for extension.
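 > <p>A short Python sketch makes the collision concrete.&nbsp; MpiId here is an
 > illustrative stand-in, and the overlapping accession is hypothetical (real
 > GenBank accessions such as J01636 contain letters), but it shows that the
 > triple itself cannot record which kind of number was meant, and that a
 > version can only be squeezed into the value field.

```python
from typing import NamedTuple


class MpiId(NamedTuple):
    """Illustrative stand-in for the three-field MPI Id (domain, type, value)."""
    domain: str
    type: str
    value: str


# The same digits read as a GI number and, hypothetically, as an accession:
# the structure has no field recording which numbering scheme was meant.
as_gi = MpiId("GB", "CL", "146575")
as_accession = MpiId("GB", "CL", "146575")
print(as_gi == as_accession)  # True: the intended meaning is lost

# A version has nowhere to go except inside the value field
# (the ".1" suffix here is a hypothetical convention).
versioned = MpiId("GB", "CL", "J01636.1")
value, _, version = versioned.value.partition(".")
print(value, version)  # J01636 1
```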
 > <br>&nbsp;
 > <h4>
 > <a NAME="AGAVE"></a>AGAVE db_id</h4>
 > DoubleTwist Inc. has encountered some of the same issues in its AGAVE product.&nbsp;
 > AGAVE defines an identifier called db_id in the AGAVE DTD, which describes an XML
 > format.
 > <p>The AGAVE db_id is defined as follows:
 > <blockquote><pre>
 > &lt;!-- db_id is an identifier for an object in its source database. -->
 > &lt;!-- Attributes: -->
 > &lt;!--   id:       a data identifier such as GenBank accession or PID. -->
 > &lt;!--   db_code:  a code for the data source, e.g. GenBank is "gb". -->
 > &lt;!--   version:  version of the associated data. -->
 > &lt;!ELEMENT db_id EMPTY>
 > &lt;!ATTLIST db_id  id       CDATA  #REQUIRED
 >                  version  CDATA  #IMPLIED
 >                  db_code  CDATA  #REQUIRED ></pre></blockquote>
 > In this format the version is specified explicitly, but two weaknesses
 > remain: there is no way to specify the object type, and no way to specify
 > the kind of ID being given (accession vs. GI).
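 > <p>To see what a db_id actually carries, the element can be read with Python's
 > standard xml.etree.ElementTree.&nbsp; This is only a sketch: db_id_key is a
 > hypothetical helper, and the attribute values are illustrative, using the
 > GenBank record J01636 and its GI number 146575 from later in this document.

```python
import xml.etree.ElementTree as ET


def db_id_key(elem: ET.Element):
    """Return (db_code, id, version) from a db_id element.

    version is None when the attribute is omitted, since the DTD declares it
    #IMPLIED; db_code and id are #REQUIRED, so a missing one is an error here.
    """
    if elem.tag != "db_id":
        raise ValueError(f"expected <db_id>, got <{elem.tag}>")
    for name in ("id", "db_code"):
        if name not in elem.attrib:
            raise ValueError(f"db_id is missing required attribute {name!r}")
    return elem.get("db_code"), elem.get("id"), elem.get("version")


# Illustrative values: the GenBank record J01636 and its GI number 146575.
by_accession = ET.fromstring('<db_id id="J01636" db_code="gb" version="1"/>')
by_gi = ET.fromstring('<db_id id="146575" db_code="gb"/>')
print(db_id_key(by_accession))  # ('gb', 'J01636', '1')
print(db_id_key(by_gi))         # ('gb', '146575', None)
# Nothing in either tuple says that one value is an accession and the other a GI.
```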
 > <br>&nbsp;
 > <h2>
 > <a NAME="I3C String"></a>I3C String Identifiers</h2>
 > For use within the I3C architecture, the existing identifier definitions
 > were found to be inadequate to handle the breadth and scope of the
 > identifiers that would be required.&nbsp; The following sections detail
 > the requirements and specification of a new identifier format for use
 > within the I3C architecture.
 > <h3>
 > <a NAME="Requirements"></a>Requirements for the I3C String Identifier</h3>
 > The I3C architecture has the following syntactic and semantic requirements
 > for its identifiers:
 > <h4>
 > <a NAME="Syntactic"></a>Syntactic Requirements</h4>
 > 
 > <ol>
 > <li>
 > The identifier must be encodable in a string format</li>
 > 
 > <li>
 > The identifier must be extensible</li>
 > 
 > <li>
 > The identifier must uniquely identify one object</li>
 > 
 > <li>
 > The identifier must not require additional contextual information for evaluation</li>
 > </ol>
 > These requirements result from the need to transmit the identifier in an
 > XML format to and from web-services.&nbsp; By requiring that it can be
 > encoded as a string, it becomes possible to transmit identifiers via other
 > mechanisms as well.&nbsp; Also, as noted in the examples given above, the
 > identifier must be extensible to allow use with biological objects that
 > have not yet been defined.
 > <h4>
 > <a NAME="Semantic"></a>Semantic Requirements</h4>
 > For an Id to uniquely specify a biological object in a system, it needs
 > to include the following pieces of information:
 > <br>&nbsp;
 > <ol>
 > <li>
 > &nbsp;Authority :&nbsp; The name of the organization that has defined an
 > entity.</li>
 > 
 > <li>
 > &nbsp;Id Value : an alpha-numeric sequence that uniquely identifies an
 > object to its authority</li>
 > 
 > <li>
 > &nbsp;Namespace : one or more statements constraining the scope in which
 > an Id is evaluated</li>
 > 
 > <li>
 > &nbsp;Version&nbsp; : (optional) version number for an Id</li>
 > </ol>
 > As an example, the following uniquely identifies a sequence in GenBank:
 > <p>&nbsp;&nbsp;&nbsp; GenBank, Sequence, Accession J01636,&nbsp; version 1
 > <p>With all these pieces of information we can uniquely identify a sequence.&nbsp;
 > Leaving off the version number still identifies the sequence, but not a particular
 > version of it.&nbsp; Leaving out any of the other pieces of information makes it
 > impossible to find the object without a priori knowledge of the context.
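 > <p>The four pieces of information, and the effect of leaving off the version,
 > can be sketched as a small Python record.&nbsp; The names IdParts and refers_to
 > are illustrative, and the matching rule (an Id without a version matches
 > every version of the object) is our assumption about how a version-less Id
 > should be read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdParts:
    """Authority, namespace, Id value and optional version (names illustrative)."""
    authority: str
    namespace: str
    value: str
    version: Optional[str] = None


def refers_to(query: IdParts, candidate: IdParts) -> bool:
    """Does `query` identify the object described by `candidate`?

    Assumption: a query without a version names the object in general and
    matches every version; a query with a version matches only that version.
    """
    same_object = (query.authority, query.namespace, query.value) == (
        candidate.authority, candidate.namespace, candidate.value)
    return same_object and (query.version is None or query.version == candidate.version)


stored = IdParts("GenBank", "sequence/accession", "J01636", "1")
print(refers_to(IdParts("GenBank", "sequence/accession", "J01636"), stored))       # True
print(refers_to(IdParts("GenBank", "sequence/accession", "J01636", "2"), stored))  # False
print(refers_to(IdParts("GenBank", "sequence/gi", "J01636", "1"), stored))         # False
```

 > <p>The last call shows why the namespace matters: the same value under a
 > different namespace is treated as a different identifier.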
 > <br>&nbsp;
 > <h2>
 > <a NAME="Specification"></a>Specification of the I3C String Identifier</h2>
 > To take advantage of existing work on unique identifiers,&nbsp; the I3C
 > Technical Architecture working group has selected the Uniform Resource
 > Name (URN), as specified by the IETF (RFC 2141) and referenced by the World
 > Wide Web Consortium (W3C), as the basis for the I3C String Identifier.&nbsp;
 > For additional background on URNs, please see Appendix A, "URN Reference",
 > for the definition of a URN and reference links.
 > <h4>
 > <a NAME="Web Centric Id:  URI,"></a><b>Web Centric Id:&nbsp; URI, URN</b></h4>
 > To summarize the IETF and W3C documents,&nbsp; a URI can be written as
 > having the following parts:
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme:namespace identifier://authority/path/.../pathN/value?queryterm#fragment
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 > scheme and namespace identifier define the semantics of everything that
 > follows
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 > authority defines the organization responsible for defining and managing
 > the namespace
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 > path/.../pathN/ defines a subset of an authority's namespace
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 > value is the last element in the path
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 > queryterm indicates a post-processing directive
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 > fragment identifies a part of the resource that the client interprets after
 > retrieval; it is not itself part of the Id
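 > <p>Python's standard urllib.parse.urlsplit splits a URI according to this
 > generic syntax, which gives a quick way to see the parts.&nbsp; The sketch below
 > is only illustrative.&nbsp; Note that a URN has no "//authority", so everything
 > after "urn:" comes back as a single opaque path; the authority of an I3C
 > identifier has to be extracted by the I3C rules, not by generic URI parsing.

```python
from urllib.parse import urlsplit

# Generic hierarchical URI: every component is present.
uri = urlsplit("http://www.mpi.com/path/value?queryterm=something#fragment")
print(uri.scheme, uri.netloc, uri.path, uri.query, uri.fragment)
# http www.mpi.com /path/value queryterm=something fragment

# A URN has no "//authority": everything after "urn:" is one opaque path.
urn = urlsplit("urn:lsid:informatics.mpi.com:plate:12345")
print(urn.scheme, repr(urn.netloc), urn.path)
# urn '' lsid:informatics.mpi.com:plate:12345
```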
 > <p>The adoption of the URN format should simplify integration with other
 > existing standards, such as MAGE-ML, that permit the use of URN identifiers.
 > <br>&nbsp;
 > <h3>
 > <a NAME="I3C String Identifier"></a>I3C String Identifier Definition</h3>
 > Given the definition of a URN above, we have defined the following syntax
 > for an I3C String Identifier:
 > <p><tt>&nbsp;&nbsp;&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version></tt>
 > <p>The different parts of the identifier are delimited by colons ":".
 > <p>The elements of the identifier are as follows:
 > <ul>
 > <li>
 > scheme = urn</li>
 > 
 > <ul>
 > <li>
 > This specifies that the identifier is in URN format</li>
 > </ul>
 > 
 > <li>
 > namespace identifier = lsid</li>
 > 
 > <ul>
 > <li>
 > The I3C string identifier namespace identifier is defined as "Life Science
 > Identifier", or "lsid".</li>
 > </ul>
 > 
 > <li>
 > authority = &lt;authority></li>
 > 
 > <ul>
 > <li>
 > This portion uniquely identifies the organization and, optionally, the organizational
 > unit that has defined the namespace for the remaining portions of the identifier</li>
 > </ul>
 > 
 > <li>
 > namespace = &lt;namespace></li>
 > 
 > <ul>
 > <li>
 > a hierarchical namespace to scope the identifier value.&nbsp; The form and
 > content of this section are defined and managed by the authority</li>
 > </ul>
 > 
 > <li>
 > value = &lt;value></li>
 > 
 > <ul>
 > <li>
 > the unique identifier for an object within the namespace defined by an
 > authority</li>
 > </ul>
 > 
 > <li>
 > version = &lt;version></li>
 > 
 > <ul>
 > <li>
 > optional version information associated with the identifier value</li>
 > </ul>
 > </ul>
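 > <p>The definition above can be turned into a small parser.&nbsp; This is a
 > minimal Python sketch, not a reference implementation: the names Lsid and
 > parse_lsid are hypothetical, it assumes no field contains a literal ":" (the
 > definition does not say how a colon inside a value would be escaped), and it
 > compares "urn" and "lsid" case-insensitively, as RFC 2141 does for the leading
 > "urn:" and the namespace identifier.

```python
from typing import NamedTuple, Optional, Tuple


class Lsid(NamedTuple):
    authority: str
    namespace: Tuple[str, ...]  # hierarchical namespace, split on "/"
    value: str
    version: Optional[str]      # None when the optional version is absent


def parse_lsid(text: str) -> Lsid:
    """Parse urn:lsid:<authority>:<namespace>:<value>[:<version>].

    Assumes no field contains a literal ":" and performs no %-decoding.
    """
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    scheme, nid, authority, namespace, value = parts[:5]
    # "urn" and the namespace identifier are compared case-insensitively.
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not a urn:lsid identifier: {text!r}")
    if not authority or not value:
        raise ValueError(f"authority and value are required: {text!r}")
    version = parts[5] if len(parts) == 6 and parts[5] else None
    path = tuple(namespace.split("/")) if namespace else ()
    return Lsid(authority, path, value, version)


print(parse_lsid("urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345"))
# Lsid(authority='informatics.mpi.com', namespace=('plate', 'glycerol', 'freeze'), value='12345', version=None)
```

 > <p>Under this sketch, an empty namespace field (as in the taxonomy example of
 > Appendix B) parses to an empty tuple rather than being rejected.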
 > 
 > <h4>
 > <a NAME="Examples"></a>Examples</h4>
 > For example, the plate that Millennium identifies as 12345, with
 > MPI Id&nbsp;&nbsp;&nbsp; "MP PL 12345", becomes
 > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate:12345
 > <p>Since the authority is free to define any path it wishes (provided,
 > of course, that it manages it),&nbsp; it may choose to define the path section
 > for plates more fully, for example:
 > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
 > <p>We can now use expanded path information to deal with cases that required
 > type overloading in the MPI ID.&nbsp; For example, the MPI IDs
 > <br>&nbsp;&nbsp;&nbsp; (Accession)&nbsp;&nbsp;&nbsp; GB CL J01636 version 1
 > <br>&nbsp;&nbsp;&nbsp; (GI)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL 146575
 > <br>refer to the same object.&nbsp; These can now be encoded as
 > <p>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1
 > <br>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/gi:146575
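 > <p>Going the other way, a hypothetical helper make_lsid (not part of this
 > specification) can assemble identifiers from their parts.&nbsp; Encoding the kind
 > of number in the namespace path is what separates the accession and GI forms
 > that the MPI Id could not tell apart.

```python
from typing import Optional, Sequence


def make_lsid(authority: str, namespace: Sequence[str], value: str,
              version: Optional[str] = None) -> str:
    """Assemble urn:lsid:<authority>:<namespace>:<value>[:<version>].

    Namespace components are joined with "/"; ":<version>" is appended only
    when a version is given. Fields containing ":" are rejected because the
    colon is the delimiter.
    """
    fields = [authority, "/".join(namespace), value]
    if version is not None:
        fields.append(version)
    for field in fields:
        if ":" in field:
            raise ValueError(f"field may not contain ':': {field!r}")
    return ":".join(["urn", "lsid", *fields])


ncbi = "genbank.ncbi.nlm.nih.gov"
print(make_lsid(ncbi, ["sequence", "accession"], "J01636", "1"))
# urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1
print(make_lsid(ncbi, ["sequence", "gi"], "146575"))
# urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/gi:146575
```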
 > <br>&nbsp;
 > <h3>
 > <a NAME="Appendix A, URN"></a>Appendix A, URN Reference</h3>
 > 
 > <p><br>References: http://www.w3.org/Addressing/, http://www.ietf.org/rfc/rfc2141.txt,
 > http://www.ietf.org/rfc/rfc2396.txt
 > <p>In the context of the web, there is already a definition for global
 > identifiers,&nbsp; the Uniform Resource Name.&nbsp; From RFC 2141
 > (http://www.ietf.org/rfc/rfc2141.txt):
 > <blockquote>Uniform Resource Names (URNs) are intended to serve as persistent,
 > <br>location-independent, resource identifiers and are designed to make
 > <br>it easy to map other namespaces (which share the properties of URNs)
 > <br>into URN-space. Therefore, the URN syntax provides a means to encode
 > <br>character data in a form that can be sent in existing protocols,
 > <br>transcribed on most keyboards, etc.</blockquote>
 > URIs are a superset of URNs and URLs.&nbsp; URLs are familiar due to
 > their use on the web. They differ from URNs in that they are scoped to
 > a particular protocol (e.g. http:*, ftp:* etc).&nbsp; URNs are scoped
 > simply as identifiers, urn:*.
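 > <p>RFC 2141 gives the URN syntax as "urn:" &lt;NID> ":" &lt;NSS>, where the
 > namespace identifier (NID) is 1 to 32 letters, digits or hyphens beginning
 > with a letter or digit, and both "urn:" and the NID are case-insensitive.&nbsp;
 > A minimal Python sketch that splits a URN on that basis follows; split_urn is
 > our own name, and it deliberately does not check the character set of the
 > namespace-specific string (NSS).

```python
import re

# RFC 2141: <URN> ::= "urn:" <NID> ":" <NSS>. The NID is 1 to 32 letters,
# digits or hyphens and must start with a letter or digit. "urn:" and the
# NID are case-insensitive. The NSS character set is NOT validated here.
URN_RE = re.compile(r"urn:([a-z0-9][a-z0-9-]{0,31}):(.+)", re.IGNORECASE)


def split_urn(text: str) -> tuple:
    """Return (nid, nss) with the NID lower-cased, or raise ValueError."""
    match = URN_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a URN: {text!r}")
    nid, nss = match.groups()
    return nid.lower(), nss


print(split_urn("URN:LSID:embl.ebi.ac.uk:nucleotide:AB056092"))
# ('lsid', 'embl.ebi.ac.uk:nucleotide:AB056092')
```

 > <p>Everything after the NID, including the authority, namespace and value of
 > an I3C identifier, is the NSS and is left to the "lsid" namespace to interpret.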
 > <p>URIs are divided into two parts,
 > <br>&nbsp;&nbsp;&nbsp; &lt;scheme> : &lt;scheme specific part >
 > <br>e.g. in http://www.mpi.com/index.html,&nbsp; <b>http</b> is the scheme,&nbsp;
 > and <b>//www.mpi.com/index.html </b>is the scheme-specific part that is interpreted
 > in the context of that scheme.
 > <br>&nbsp;
 > <blockquote>The URI syntax does not require that the scheme-specific-part
 > have&nbsp; any general structure or set of semantics which is common among
 > all URI.&nbsp; However, a subset of URI do share a common syntax for&nbsp;
 > representing hierarchical relationships within the namespace.&nbsp; This
 > "generic URI" syntax consists of a sequence of four main components:
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;scheme>://&lt;authority>&lt;path>?&lt;query>#fragment
 > <p>each of which, except &lt;scheme>, may be absent from a particular URI.&nbsp;&nbsp;
 > For example, some URI schemes do not allow an &lt;authority> component,&nbsp;
 > and others do not use a &lt;query> component.
 > <p>&nbsp;&nbsp;&nbsp;&nbsp; absoluteURI&nbsp;&nbsp; = scheme ":" ( hier_part
 > | opaque_part )
 > <p>&nbsp; URI that are hierarchical in nature use the slash "/" character
 > for&nbsp; separating hierarchical components.&nbsp; For some file systems,
 > a "/"&nbsp; character (used to denote the hierarchical structure of a URI)
 > is the&nbsp; delimiter used to construct a file name hierarchy, and thus
 > the URI&nbsp; path will look similar to a file pathname.&nbsp; This does
 > NOT imply that the resource is a file or that the URI maps to an actual
 > filesystem pathname.
 > <p>[snip]
 > <p>The path component contains data, specific to the authority (or the
 > scheme if there is no authority component), identifying the resource within
 > the scope of that scheme and authority.
 > <p>[snip]
 > <p>When a URI reference is used to perform a retrieval action on the identified
 > resource, the optional fragment identifier, separated from the URI by a
 > crosshatch ("#") character, consists of additional reference information
 > to be interpreted by the user agent after the retrieval action has been
 > successfully completed.&nbsp; As such, it is not&nbsp; part of a URI, but
 > is often used in conjunction with a URI.
 > <p>(http://www.ietf.org/rfc/rfc2396.txt)</blockquote>
 > To sum up the IETF specifications,&nbsp; a URI can be written as having all of
 > the following parts:
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme://authority/path/path2?queryterm=something#fragment
 > <br>&nbsp;
 > <br>&nbsp;
 > <h3>
 > <a NAME="Appendix B, Some example"></a>Appendix B, Some example identifiers</h3>
 > Here are some examples of identifiers written in this format:
 > <p>GenBank:&nbsp; the sequence J01636 could be identified as follows:
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636:1
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:K01483
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/gi:146575
 > <p>The associated protein could be referred to as follows:
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:protein/locus:AAA24054
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov:protein/accession:AAA24054:1
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov:protein/pid:g146578
 > <br>&nbsp;
 > <p>Another example is the following nucleotide from EMBL
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092
 > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092:1
 > <p>This entry includes a reference to a taxonomy term:
 > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:taxonomy.ebi.ac.uk::10090
 > <br>&nbsp;
 > <br>&nbsp;
 > <h3>
 > <a NAME="Appendix C, Additional"></a>Appendix C, Additional Work</h3>
 > 1. More clearly define what is authority and what is path.&nbsp; E.g., should
 > GenBank be part of the authority string, or is it part of a path beneath
 > ncbi.nlm.nih.gov?
 > <p>2. Since path terms are owned by the authority, get common definitions
 > for authorities/databases such as GenBank, EMBL etc.&nbsp; This could be
 > defined by us and presented to the organization in question for ratification.&nbsp;
 > Entities that do not make IDs publicly available are responsible for themselves
 > and their customers only but would benefit from a set of guidelines and
 > examples.
 > <p>3. Examine use cases in proteomics and other branches of informatics.
 > <p>4. Create libraries (java, perl) for manipulating IDs in this form.
 > <br>&nbsp;
 > <br>&nbsp;
 > <br>&nbsp;
 > <br>&nbsp;
 > </body>
 > </html>

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY

NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS. 
PLEASE WRITE FOR DETAILS.
========================================================================