[DAS] Re: Our identifier doc and proposal

Brian Gilman gilmanb@genome.wi.mit.edu
Tue, 27 Nov 2001 15:52:40 -0500 (EST)


I think so, and I have also asked about this in the group. It becomes very
hard to "control" the namespace without an ontology. This is why we allow
the individuals to control the top level.

			-B

-----------------------
Brian Gilman <gilmanb@genome.wi.mit.edu>
Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902


On Tue, 27 Nov 2001, Lincoln Stein wrote:

> Hi Brian,
> 
> I'm pleased to see that the I3C identifier is nearly identical to my
> (biological class,namespace,id) triple suggestion.  The difference is
> the version number, which I agree with completely.  So I accept it
> wholeheartedly.
> 
> The part that I don't feel entirely comfortable with is that the
> namespace seems to be completely under the control of the authority:
> 
>    urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
> 
> I think the top level namespace, e.g. "plate" should be hard-and-fast
> data types.  Is this envisioned by the I3C?
> 
> Lincoln
> 
> 
> Brian Gilman writes:
>  > Lincoln,
>  > 
>  > 	Please find attached an updated identifier proposal that we have
>  > been working
>  > on to identify objects in the web services architecture. I like it over
>  > the feature_class mechanism because we can uniquely identify an object in
>  > the "cloud".
>  > 
>  > 		Best, 
>  > 
>  > 			-Brian
>  > 
>  > -----------------------
>  > Brian Gilman <gilmanb@genome.wi.mit.edu>
>  > Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
>  > One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
>  > phone +1 617  252 1069 / fax +1 617 252 1902
>  > 
>  > 
>  > <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
>  > <html>
>  > <head>
>  >    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>  >    <meta name="Author" content="Ted liefeld">
>  >    <meta name="GENERATOR" content="Mozilla/4.73 [en]C-CCK-MCD BA45DSL  (WinNT; U) [Netscape]">
>  >    <title>identifiers</title>
>  > </head>
>  > <body>
>  > 
>  > <h2>
>  > I3C Identifier Specification</h2>
>  > 
>  > <h3>
>  > <a NAME="Abstract"></a>Abstract:</h3>
>  > This document describes the motivation for and specification of string
>  > identifiers to be used to identify objects within the life sciences domain
>  > by the I3C architecture.&nbsp; A string format for the identifiers is defined
>  > as&nbsp;<tt>&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version>.</tt>
>  > <br>&nbsp;
>  > <h2>
>  > Index:</h2>
>  > <a href="#Abstract">Abstract</a>
>  > <br><a href="#Introduction:">Introduction</a>
>  > <br><a href="#Background: Existing">Background: Existing Identifiers</a>
>  > <blockquote><a href="#MPI">MPI Id</a>
>  > <br><a href="#AGAVE">AGAVE db_id</a></blockquote>
>  > <a href="#I3C String">I3C String Identifiers</a>
>  > <blockquote><a href="#Requirements">Requirements for the I3C String Identifier</a>
>  > <blockquote><a href="#Syntactic">Syntactic Requirements</a>
>  > <br><a href="#Semantic">Semantic Requirements</a></blockquote>
>  > </blockquote>
>  > <a href="#Specification">Specification of the I3C String Identifier</a>
>  > <blockquote><a href="#Web Centric Id:  URI,">Web Centric Id:&nbsp; URI,
>  > URN</a>
>  > <br><a href="#I3C String Identifier">I3C String Identifier Definition</a>
>  > <br><a href="#Examples">Examples</a></blockquote>
>  > <a href="#Appendix A, URN">Appendix A, URN Reference</a>
>  > <br><a href="#Appendix B, Some example">Appendix B, Some example identifiers</a>
>  > <br><a href="#Appendix C, Additional">Appendix C, Additional Work</a>
>  > <h2>
>  > <a NAME="Introduction:"></a>Introduction:</h2>
>  > One of the goals of the I3C is the definition of a common architecture
>  > and standards to simplify interoperability between applications from different
>  > companies.&nbsp; For interoperability to occur,&nbsp; we need a common
>  > format for unique identifiers for any objects we reference that would function
>  > in the context of I3C services.&nbsp; The remainder of this document defines
>  > a web-centric ID definition that will allow us to create federated systems
>  > utilizing many databases and services in a common way.
>  > <p>The purpose of this identifier definition is to uniquely identify biologically
>  > significant objects,&nbsp; e.g. a sequence, a clone, a gene, a contig etc.
>  > It is not meant to identify artifacts of implementation, e.g. a database
>  > or a server.&nbsp; Identifying objects such as these should be handled via
>  > other mechanisms such as JDBC URLs and WSDL.
>  > <p>In addition, using http URLs as identifiers (e.g. http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-id+7clen1HOYrs+[taxonomy-ID:10090]+-e)
>  > is not adequate for&nbsp; organizations who need to limit or control access
>  > to external databases for intellectual property or security reasons.&nbsp;
>  > The identifiers defined here are deliberately location independent and
>  > are intended to uniquely identify a biological artifact, but not the location
>  > of that artifact.
>  > <br>&nbsp;
>  > <h3>
>  > <a NAME="Background: Existing"></a><b>Background: Existing Identifiers</b></h3>
>  > There are currently many existing forms of identifiers for biological artifacts
>  > in use within the life sciences community. These include proprietary formats
>  > as well as public domain formats.&nbsp; Some of these are discussed below.
>  > <br>&nbsp;
>  > <h4>
>  > <a NAME="MPI"></a>MPI Id</h4>
>  > Within Millennium Pharmaceuticals Inc.,&nbsp; there is a suite of CORBA
>  > services that have been running in production for over two years.&nbsp;
>  > One of the first tasks they addressed was the identity management of objects
>  > that appear in more than one database.&nbsp; To deal with this need, a
>  > CORBA IDL structure called&nbsp; MPI Id was declared.&nbsp; It has since
>  > been reused in many subsequent CORBA services.
>  > <p>The MPI Id CORBA type is defined as a triple
>  > <br><tt>&nbsp;&nbsp;&nbsp; struct Id {</tt>
>  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string value;</tt>
>  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string domain;</tt>
>  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string type;</tt>
>  > <br><tt>&nbsp;&nbsp;&nbsp; };</tt>
>  > <p>For example, MP PL 001 represents identifier domain 'MP', type 'Plate',
>  > identifier value '001'.&nbsp; The value uniquely identifies an object in
>  > the domain of a given type.&nbsp; Note that an object may have more than
>  > one ID, so that the plate known as MP PL 001 may also be known as SE PL
>  > 435a in the SE domain.
>  > <p>Some of the services that use these identifiers found this too limiting.&nbsp;
>  > For example,&nbsp; when retrieving clones from GenBank you may want to use the
>  > accession number or the GI number.&nbsp; However, either of these would
>  > have been encoded as
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL gid
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GB CL accession
>  > <br>This raised the problem of differentiating whether an identifier is
>  > a GI number or an accession number.&nbsp; If the namespaces of accession
>  > numbers and GI numbers overlap, then there is no way for a server or client
>  > to identify which form was intended.
>  > <p>Another weakness is that object type can be overloaded in the same manner;&nbsp;
>  > for example a sequence and a contig are both sequences.&nbsp; Similarly,
>  > inclusion of a version number for an identifier would require overloading
>  > the value field of the MPI Id.
>  > <p>Limiting the unique identifier to three elements was therefore found
>  > to be too restrictive.&nbsp; There must be provision for extension.
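The ambiguity described above can be illustrated with a short Python sketch. It assumes the three-field MPI Id is modelled as a named tuple; the class name `MpiId` and the sample values are illustrative only and are not taken from the Millennium services.

```python
from typing import NamedTuple


class MpiId(NamedTuple):
    """Hypothetical stand-in for the CORBA struct Id (value, domain, type)."""
    value: str
    domain: str
    type: str


# Both kinds of GenBank identifier are forced into the same domain/type pair.
by_accession = MpiId(value="J01636", domain="GB", type="CL")
by_gi = MpiId(value="146575", domain="GB", type="CL")

# Only the value differs; nothing in the triple records whether the value
# is an accession number or a GI number.
same_slot = (by_accession.domain, by_accession.type) == (by_gi.domain, by_gi.type)
print(same_slot)
```

A receiver holding only the triple has to guess the kind of identifier from the shape of the value, which fails as soon as the two value spaces overlap.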
>  > <br>&nbsp;
>  > <h4>
>  > <a NAME="AGAVE"></a>AGAVE db_id</h4>
>  > DoubleTwist Inc. has encountered some of the same issues in their AGAVE product.&nbsp;
>  > AGAVE defines an identifier called db_id in the AGAVE DTD file, an XML
>  > format.
>  > <p>The AGAVE db_id is defined as follows;
>  > <blockquote><tt>&lt;!-- db_id is an identifier for an object in its source database. --></tt>
>  > <br><tt>&lt;!-- Attributes: --></tt>
>  > <br><tt>&lt;!-- id:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a data identifier such as GenBank accession or PID. --></tt>
>  > <br><tt>&lt;!-- db_code:&nbsp; a code for the data source, e.g. GenBank is "gb". --></tt>
>  > <br><tt>&lt;!-- version:&nbsp; version of the associated data. --></tt>
>  > <br><tt>&lt;!ELEMENT db_id EMPTY></tt>
>  > <br><tt>&lt;!ATTLIST db_id&nbsp; id&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > CDATA&nbsp; #REQUIRED</tt>
>  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > version&nbsp; CDATA&nbsp; #IMPLIED</tt>
>  > <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > db_code&nbsp; CDATA&nbsp; #REQUIRED ></tt></blockquote>
>  > In this format, the version is explicitly specified, but the format still
>  > has no way to specify object types or variations in the type of ID being
>  > specified (accession vs GI).
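As a rough illustration of what a db_id element carries, the following Python sketch reads one with the standard library XML parser. The sample element is an assumption written to match the ATTLIST above; it is not taken from an AGAVE file.

```python
import xml.etree.ElementTree as ET

# Illustrative element matching the ATTLIST above; the values are made up.
sample = '<db_id id="J01636" version="1" db_code="gb"/>'

element = ET.fromstring(sample)
db_id = {
    "id": element.attrib["id"],            # #REQUIRED
    "db_code": element.attrib["db_code"],  # #REQUIRED
    "version": element.get("version"),     # #IMPLIED, None when absent
}
print(db_id)
```

The three attributes cover source, value and version, but none of them says whether the id is an accession number or a GI number, or what type of object it names.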
>  > <br>&nbsp;
>  > <h2>
>  > <a NAME="I3C String"></a>I3C String Identifiers</h2>
>  > For use within the I3C architecture, the existing identifier definitions
>  > were found to be inadequate to handle the breadth and scope of the possible
>  > identifiers that would be required.&nbsp; The following sections detail
>  > the requirements and specification of a new identifier format for use
>  > within the I3C architecture.
>  > <h3>
>  > <a NAME="Requirements"></a>Requirements for the I3C String Identifier</h3>
>  > The I3C architecture has the following syntactical and semantic requirements
>  > for its identifiers;
>  > <h4>
>  > <a NAME="Syntactic"></a>Syntactic Requirements</h4>
>  > 
>  > <ol>
>  > <li>
>  > The identifier must be encodable in a string format</li>
>  > 
>  > <li>
>  > The identifier must be extensible</li>
>  > 
>  > <li>
>  > The identifier must uniquely identify one object</li>
>  > 
>  > <li>
>  > The identifier must not require additional contextual information for evaluation</li>
>  > </ol>
>  > These requirements result from the need to transmit the identifier in an
>  > XML format to and from web-services.&nbsp; By requiring that it can be
>  > encoded as a string, it becomes possible to transmit identifiers via other
>  > mechanisms as well.&nbsp; Also, as noted in the examples given above, the
>  > identifier must be extensible to allow use with biological objects that
>  > have not yet been defined.
>  > <h4>
>  > <a NAME="Semantic"></a>Semantic Requirements</h4>
>  > For an Id to uniquely specify a biological object in a system, it needs
>  > to include the following pieces of information;
>  > <br>&nbsp;
>  > <ol>
>  > <li>
>  > &nbsp;Authority :&nbsp; The name of the organization that has defined an
>  > entity.</li>
>  > 
>  > <li>
>  > &nbsp;Id Value : an alpha-numeric sequence that uniquely identifies an
>  > object to its authority</li>
>  > 
>  > <li>
>  > &nbsp;Namespace : one or more statements constraining the scope in which
>  > an Id is evaluated</li>
>  > 
>  > <li>
>  > &nbsp;Version&nbsp; : (optional) version number for an Id</li>
>  > </ol>
>  > As an example, the following uniquely identifies a sequence in GenBank,
>  > <p>&nbsp;&nbsp;&nbsp; GenBank, Sequence, Accession J01636,&nbsp; version
>  > 1
>  > <p>With all these pieces of information we can uniquely identify a sequence.&nbsp;
>  > Leaving off the version number we can get pretty close.&nbsp; Leaving out
>  > any of the other bits of information makes it impossible to find the object
>  > without a priori knowledge of the context.
>  > <br>&nbsp;
>  > <h2>
>  > <a NAME="Specification"></a>Specification of the I3C String Identifier</h2>
>  > To take advantage of existing work on unique identifiers,&nbsp; the I3C
>  > Technical Architecture working group has selected the World Wide Web Consortium's
>  > (W3C) definition of a uniform resource name (URN) as the basis for the
>  > I3C String Identifier.&nbsp; For additional background on URNs, please
>  > see Appendix A, "URN Reference", for the definition of a URN and reference
>  > links.
>  > <h4>
>  > <a NAME="Web Centric Id:  URI,"></a><b>Web Centric Id:&nbsp; URI, URN</b></h4>
>  > To summarize the IETF and W3C documents,&nbsp; a URI can be written as
>  > having the following parts;
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme:namespace identifier://authority/path/.../pathN/value?queryterm#fragment
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > scheme and namespace identifier define the semantics of everything that
>  > follows
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > authority defines the organization responsible for defining and managing
>  > the namespace
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > path/.../pathN/ defines a subset of an authority's namespace
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > value is the last element in the path
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > queryterm is information to be interpreted by the resource
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > fragment is additional reference information, interpreted after retrieval,
>  > that identifies a fragment within the scope of the Id
>  > <p>The adoption of the URN format should simplify integration with other
>  > existing standards such as MAGE-ML which permit the use of URN identifiers.
>  > <br>&nbsp;
>  > <h3>
>  > <a NAME="I3C String Identifier"></a>I3C String Identifier Definition</h3>
>  > Given the definition of a URN above, we have defined the following syntax
>  > for an I3C String identifier;
>  > <p><tt>&nbsp;&nbsp;&nbsp; urn:lsid:&lt;authority>:&lt;namespace>:&lt;value>:&lt;version></tt>
>  > <p>The different parts of the identifier are delimited by colons ":".
>  > <p>The elements of the identifier are as follows;
>  > <ul>
>  > <li>
>  > scheme = urn</li>
>  > 
>  > <ul>
>  > <li>
>  > This specifies that the identifier is in URN format</li>
>  > </ul>
>  > 
>  > <li>
>  > namespace identifier = lsid</li>
>  > 
>  > <ul>
>  > <li>
>  > The I3C string identifier namespace identifier is defined as "Life Science
>  > Identifier", or "lsid".</li>
>  > </ul>
>  > 
>  > <li>
>  > authority = &lt;authority></li>
>  > 
>  > <ul>
>  > <li>
>  > This portion uniquely identifies the organization and optionally the organizational
>  > unit that has defined the namespace for the remaining portions of the identifier</li>
>  > </ul>
>  > 
>  > <li>
>  > namespace = &lt;namespace></li>
>  > 
>  > <ul>
>  > <li>
>  > A hierarchical namespace to scope the identifier value.&nbsp; The form and
>  > content of this section are defined and managed by the authority</li>
>  > </ul>
>  > 
>  > <li>
>  > value = &lt;value></li>
>  > 
>  > <ul>
>  > <li>
>  > the unique identifier for an object within the namespace defined by an
>  > authority</li>
>  > </ul>
>  > 
>  > <li>
>  > version = &lt;version></li>
>  > 
>  > <ul>
>  > <li>
>  > optional version information associated with the identifier value</li>
>  > </ul>
>  > </ul>
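The colon-delimited syntax above can be sketched as a small Python parser. This is an illustration under stated assumptions, not an I3C reference implementation: the names `Lsid` and `parse_lsid` are invented here, an empty namespace is accepted only because Appendix B shows one, and treating `urn` and `lsid` as case-insensitive follows RFC 2141 rather than anything stated in this document.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Hypothetical container for the four variable parts of an identifier."""
    authority: str
    namespace: str
    value: str
    version: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:value[:version] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    scheme, nid, authority, namespace, value = parts[:5]
    # Assumption: scheme and namespace identifier compare case-insensitively.
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not an lsid URN: {text!r}")
    if not authority or not value:
        raise ValueError(f"authority and value are required: {text!r}")
    # The version is optional, so a five-field identifier has none.
    version = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, value, version)


print(parse_lsid("urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1"))
```

Because the colon is the delimiter, this sketch implies that none of the parts may contain a colon; hierarchy inside the namespace is carried by the slash instead.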
>  > 
>  > <h4>
>  > <a NAME="Examples"></a>Examples</h4>
>  > For example, the plate identified by Millennium as ID 12345, with
>  > MPI Id "MP PL 12345", becomes
>  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate:12345
>  > <p>Since the authority is free to define any path that it wishes (provided
>  > of course that it manages them),&nbsp; we may want to define the path section
>  > for plates more fully to something like this
>  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
>  > <p>We can now use expanded path information to deal with cases that required
>  > type overloading in the MPI ID.&nbsp; For example
>  > <br>&nbsp;&nbsp;&nbsp; (Accession)&nbsp;&nbsp;&nbsp; GB CL J01636 version
>  > 1
>  > <br>&nbsp;&nbsp;&nbsp; (GI)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  > GB CL 146575
>  > <br>refer to the same object.&nbsp; These can now be encoded as
>  > <p>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/accession:J01636:1
>  > <br>&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:sequence/gi:146575
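The two encodings above can be reproduced with a short Python sketch that joins the parts with colons. The function name `format_lsid` is hypothetical, and rejecting colons inside a part is an assumption that follows from the colon being the delimiter.

```python
def format_lsid(authority, namespace, value, version=None):
    """Join the parts with colons; the version is appended only when given."""
    for part in (authority, namespace, value):
        if ":" in part:
            raise ValueError(f"colon is reserved as the delimiter: {part!r}")
    fields = ["urn", "lsid", authority, namespace, value]
    if version is not None:
        fields.append(str(version))
    return ":".join(fields)


# The namespace path records which kind of GenBank identifier is meant.
accession = format_lsid("genbank.ncbi.nlm.nih.gov", "sequence/accession", "J01636", 1)
gi = format_lsid("genbank.ncbi.nlm.nih.gov", "sequence/gi", "146575")
print(accession)
print(gi)
```

The kind of identifier now lives in the namespace path, so the value field no longer has to be overloaded.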
>  > <br>&nbsp;
>  > <h3>
>  > <a NAME="Appendix A, URN"></a>Appendix A, URN Reference</h3>
>  > 
>  > <p><br>Ref: http://www.w3.org/Addressing/,http://www.ietf.org/rfc/rfc2141.txt,
>  > http://www.ietf.org/rfc/rfc2396.txt
>  > <p>In the context of the web, there is already a definition for global
>  > identifiers,&nbsp; the Uniform Resource Name.&nbsp; From
>  > <br>http://www.ietf.org/rfc/rfc2141.txt
>  > <blockquote>Uniform Resource Names (URNs) are intended to serve as persistent,
>  > <br>location-independent, resource identifiers and are designed to make
>  > <br>it easy to map other namespaces (which share the properties of URNs)
>  > <br>into URN-space. Therefore, the URN syntax provides a means to encode
>  > <br>character data in a form that can be sent in existing protocols,
>  > <br>transcribed on most keyboards, etc.</blockquote>
>  > URIs are the superset of URNs and URLs.&nbsp; URLs are familiar due to
>  > their use on the web. They differ from URNs in that they are scoped to
>  > a particular protocol (e.g. http:*, ftp:* etc).&nbsp; URNs are scoped
>  > simply as identifiers urn:*.
>  > <p>URIs are divided into two parts,
>  > <br>&nbsp;&nbsp;&nbsp; &lt;scheme> : &lt;scheme specific part >
>  > <br>e.g. in http://www.mpi.com/index.html,&nbsp; <b>http</b> is the scheme,&nbsp;
>  > and <b>www.mpi.com/index.html </b>is the scheme specific part that is interpreted
>  > in the context of that scheme.
>  > <br>&nbsp;
>  > <blockquote>The URI syntax does not require that the scheme-specific-part
>  > have&nbsp; any general structure or set of semantics which is common among
>  > all URI.&nbsp; However, a subset of URI do share a common syntax for&nbsp;
>  > representing hierarchical relationships within the namespace.&nbsp; This
>  > "generic URI" syntax consists of a sequence of four main components:
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;scheme>://&lt;authority>&lt;path>?&lt;query>#fragment
>  > <p>each of which, except &lt;scheme>, may be absent from a particular URI.&nbsp;&nbsp;
>  > For example, some URI schemes do not allow an &lt;authority> component,&nbsp;
>  > and others do not use a &lt;query> component.
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp; absoluteURI&nbsp;&nbsp; = scheme ":" ( hier_part
>  > | opaque_part )
>  > <p>&nbsp; URI that are hierarchical in nature use the slash "/" character
>  > for&nbsp; separating hierarchical components.&nbsp; For some file systems,
>  > a "/"&nbsp; character (used to denote the hierarchical structure of a URI)
>  > is the&nbsp; delimiter used to construct a file name hierarchy, and thus
>  > the URI&nbsp; path will look similar to a file pathname.&nbsp; This does
>  > NOT imply that the resource is a file or that the URI maps to an actual
>  > filesystem pathname.
>  > <p>[snip]
>  > <p>The path component contains data, specific to the authority (or the
>  > scheme if there is no authority component), identifying the resource within
>  > the scope of that scheme and authority.
>  > <p>[snip]
>  > <p>When a URI reference is used to perform a retrieval action on the identified
>  > resource, the optional fragment identifier, separated from the URI by a
>  > crosshatch ("#") character, consists of additional reference information
>  > to be interpreted by the user agent after the retrieval action has been
>  > successfully completed.&nbsp; As such, it is not&nbsp; part of a URI, but
>  > is often used in conjunction with a URI.
>  > <p>(http://www.ietf.org/rfc/rfc2396.txt)</blockquote>
>  > To sum up the IETF documents,&nbsp; a URI can be written as having all of
>  > the following parts;
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scheme://authority/path/path2?queryterm=something#fragment
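The generic parts listed above can be seen by splitting a URI with Python's standard `urllib.parse.urlsplit`. The URL below is a made-up instance of the template, chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL that fills in every part of the generic template.
parts = urlsplit("http://www.mpi.com/path/path2?queryterm=something#fragment")
print(parts.scheme)    # http
print(parts.netloc)    # www.mpi.com
print(parts.path)      # /path/path2
print(parts.query)     # queryterm=something
print(parts.fragment)  # fragment

# A URN has no "//authority" section, so a generic URI parser reports
# everything after the scheme as an opaque path.
urn = urlsplit("urn:lsid:informatics.mpi.com:plate:12345")
print(urn.scheme, urn.netloc == "", urn.path)
```

A generic URI parser therefore cannot see the authority, namespace and value inside an identifier of this form; splitting the part after `urn:` on colons is left to code that knows the format.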
>  > <br>&nbsp;
>  > <br>&nbsp;
>  > <h3>
>  > <a NAME="Appendix B, Some example"></a>Appendix B, Some example identifiers</h3>
>  > Here are some examples of identifiers written in this format;
>  > <p>GenBank:&nbsp; the sequence of J01636 could be identified as follows;
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:J01636:1
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/accession:K01483
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:nucleotide/gi:146575
>  > <p>The associated protein could be referred to as follows;
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genbank.ncbi.nlm.nih.gov:protein/locus:AAA24054
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov:protein/accession:AAA24054.1
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:genpept.ncbi.nlm.nih.gov:protein/pid:g146578
>  > <br>&nbsp;
>  > <p>Another example is the following nucleotide from EMBL
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092
>  > <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:embl.ebi.ac.uk:nucleotide:AB056092:1
>  > <p>This includes a reference to a taxonomy term
>  > <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; urn:lsid:taxonomy.ebi.ac.uk::10090
>  > <br>&nbsp;
>  > <br>&nbsp;
>  > <h3>
>  > <a NAME="Appendix C, Additional"></a>Appendix C, Additional Work</h3>
>  > 1. More clearly define what is authority and what is path.&nbsp; E.g. should
>  > GenBank be part of the authority string, or is it part of a path beneath
>  > ncbi.nlm.nih.gov?
>  > <p>2. Since path terms are owned by the authority, get common definitions
>  > for authorities/databases such as GenBank, EMBL etc.&nbsp; This could be
>  > defined by us and presented to the organization in question for ratification.&nbsp;
>  > Entities that do not make IDs publicly available are responsible for themselves
>  > and their customers only but would benefit from a set of guidelines and
>  > examples.
>  > <p>3. Examine use cases in proteomics and other branches of informatics.
>  > <p>4. Create libraries (java, perl) for manipulating IDs in this form.
>  > </body>
>  > </html>
> 
> -- 
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org			                  Cold Spring Harbor, NY
> 
> NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS. 
> PLEASE WRITE FOR DETAILS.
> ========================================================================
>