[DAS] Re: Our identifier doc and proposal

Lincoln Stein lstein@cshl.org
Wed, 28 Nov 2001 13:41:44 -0500


Hi Brian,

I'm happy we're having this discussion.  This brings us back to where
we were on Monday, when I suggested that when a data source refers to
a biological object, it give both the object's identifier and its
object class.  As long as an identifier is globally unique I like it
just fine, and that includes opaque strings.  However, the class
information should (often) go along with it when you send it around.
That way, when an application requests the object underlying the ID,
it knows what to expect.  I prefer to send the class information as a
separate field.
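
To make the "separate field" idea concrete, here is a minimal sketch in
Python.  The names ObjectRef, object_id and object_class are my own
illustration and come from neither the DAS nor the I3C documents, and
the "plate" class label is only an assumed example.  The point is that
the consumer reads the class from its own field and treats the ID as an
opaque string that is compared but never parsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical reference to a biological object."""
    object_id: str     # globally unique; treated as an opaque string
    object_class: str  # class sent alongside the ID as a separate field


def expected_class(ref: ObjectRef) -> str:
    # Read the class from its own field; never split or inspect the ID.
    return ref.object_class


ref = ObjectRef(
    object_id="urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345",
    object_class="plate",  # assumed label, stated explicitly by the source
)
print(expected_class(ref))
```

If the authority later changes the path inside its IDs, nothing in the
consumer breaks, because no code depends on the ID's internal layout.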

I'd remark that the I3C proposal has left you wide open for abuse of
the ID strings, because the examples make the path look like an object
class and it will be treated/abused as such immediately.  Better to
choose more obscure path examples.

Lincoln

Brian King writes:
 > 
 > >    urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
 > > 
 > > I think the top level namespace, e.g. "plate" should be hard-and-fast
 > > data types.  Is this envisioned by the I3C?
 > > 
 > > Lincoln
 > 
 > Lincoln,
 >  
 > I'm one of the authors of the I3C ID proposal.  We've just started
 > discussing the ontology-in-ID issue, so take my answer as only my own point
 > of view. I believe we should not encode information such as object type in
 > the I3C IDs.  An ID should just be a unique string, and nothing more.  Since
 > ID creation is decentralized in the I3C interoperability architecture we
 > need to encode an authority and namespace to ensure uniqueness, but that's
 > the only purpose of that information.  The problem with having information
 > encoded into IDs is that developers immediately start parsing this
 > information to speed queries or map data to physical locations, or worse,
 > create their IDs based on such assumptions.  Such schemes are unreliable and
 > unstable.  They're unreliable because different paths in the ID imply
 > different processing, and since IDs are passed everywhere, the logic for
 > this processing gets scattered all across the system.  They are unstable
 > because the information that seemed relevant in the last system becomes
 > obsolete.  I have been a software contractor on several projects, and often
 > the first task on entering a project is untangling the IDs.  We should avoid
 > turning the ID into a query language in itself.
 >  
 > Regards,
 > Brian King
 > DoubleTwist, Inc

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY

NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS. 
PLEASE WRITE FOR DETAILS.
========================================================================