[DAS] Re: Our identifier doc and proposal

Erich Stahl erich@genome.wi.mit.edu
Wed, 28 Nov 2001 16:05:31 -0500


Lincoln Stein wrote:

> Hi Brian,
>
> I'm happy we're having this discussion.  This brings us back to where
> we were on Monday, when I suggested that when a data source refers to
> a biological object, it give both the object's identifier and its
> object class.  As long as an identifier is globally unique I like it
> just fine, and that includes opaque strings.  However, the class
> information should (often) go along with it when you send it around.
> That way, when an application requests the object underlying the ID,
> it knows what to expect.  I prefer to send the class information as a
> separate field.
>

The pairing of OID and OID_CLASS represents a discrete, fixed minimum
requirement for preserving information content. There are times, however,
when this class definition will be inappropriate or inadequate. The scheme
assumes a canonical class that fits all our contexts of use, but it is
relatively easy to imagine examples where the context of use changes the
class of a given object.

This brings us back around to the need to reference, at some level, one or more
ontologies describing our object in a given context. We could do this either by
adding a third piece of information as a handle to an ontology, or possibly by
using the OID/OID_CLASS pairing as a key to decode the path to an appropriate
ontology access level. A fairly unattractive option is to invert this model and
have a separate object represent each of these contexts. Unfortunately, that
mostly serves to confuse the notion of which object is being addressed, and it
unnecessarily fixes the meta-definition of the object to only those classes
described to date.
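As a minimal sketch of the OID/OID_CLASS pairing with the optional third field (assuming Python; the `ObjectRef` type, its field names, the ontology names, and the example.org identifiers are hypothetical and not part of any DAS or I3C proposal), the class and ontology handle travel beside an opaque ID rather than inside it, so the same object can carry a different class in each context:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical reference record: an opaque ID plus its class as separate fields."""
    oid: str                        # globally unique, opaque identifier; never parsed
    oid_class: str                  # class of the object in this context of use
    ontology: Optional[str] = None  # optional handle naming the ontology that defines oid_class

    def same_object(self, other: "ObjectRef") -> bool:
        # Identity depends only on the OID; class and ontology describe the context.
        return self.oid == other.oid


# The same object, described under two different (hypothetical) ontologies.
as_clone = ObjectRef("urn:lsid:example.org:x7q2:12345", "Clone", ontology="lab-inventory")
as_sample = ObjectRef("urn:lsid:example.org:x7q2:12345", "Sample", ontology="lims")

print(as_clone.same_object(as_sample))          # True
print(as_clone.oid_class, as_sample.oid_class)  # Clone Sample
```

Because identity is decided by the OID alone, adding a new context means adding a new class/ontology pair, not minting a new object.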


>
> I'd remark that the I3C proposal has left you wide open for abuse of
> the ID strings, because the examples make the path look like an object
> class and it will be treated/abused as such immediately.  Better to
> choose more obscure path examples.

I had this same thought when looking at the examples. The path suggests
meaning where none should be implied. A main value of namespaces is that
they do not need to be organized to map to anything in particular. I’d at least
also include ‘obscure’ path examples to emphasize that the naming scheme is not
intended to convey structure.
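To illustrate treating the ID as opaque, here is a sketch assuming the five-part urn:lsid:authority:namespace:object shape from the quoted example; `lsid_key` and the `q93k` namespace are hypothetical. The code checks only the coarse shape and never reads meaning into the namespace path, so a descriptive path and an obscure one are handled identically:

```python
def lsid_key(lsid: str) -> str:
    """Check the coarse shape of an LSID-style URN and return it unchanged.

    The authority and namespace exist only to make the ID unique, so this
    function never splits the namespace on "/" or infers a type from it.
    """
    parts = lsid.split(":")
    # Assumed shape: urn:lsid:<authority>:<namespace>:<object>
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID-style URN: {lsid!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty authority, namespace or object in {lsid!r}")
    # The whole string is the key; no component is interpreted.
    return lsid


registry = {}
registry[lsid_key("urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345")] = "record A"
registry[lsid_key("urn:lsid:informatics.mpi.com:q93k:12345")] = "record B"

# Same object number, different namespaces: two distinct, unrelated objects.
print(len(registry))  # 2
```

Keeping lookups on the whole string means no processing logic depends on what the path appears to say.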

    - Erich

>
> Lincoln
>
> Brian King writes:
>  >
>  > >    urn:lsid:informatics.mpi.com:plate/glycerol/freeze:12345
>  > >
>  > > I think the top level namespace, e.g. "plate" should be hard-and-fast
>  > > data types.  Is this envisioned by the I3C?
>  > >
>  > > Lincoln
>  >
>  > Lincoln,
>  >
>  > I'm one of the authors of the I3C ID proposal.  We've just started
>  > discussing the ontology-in-ID issue, so take my answer as only my own point
>  > of view. I believe we should not encode information such as object type in
>  > the I3C IDs.  An ID should just be a unique string, and nothing more.  Since
>  > ID creation is decentralized in the I3C interoperability architecture we
>  > need to encode an authority and namespace to ensure uniqueness, but that's
>  > the only purpose of that information.  The problem with having information
>  > encoded into IDs is that developers immediately start parsing this
>  > information to speed queries or map data to physical locations, or worse,
>  > create their IDs based on such assumptions.  Such schemes are unreliable and
>  > unstable.  They're unreliable because different paths in the ID imply
>  > different processing, and since IDs are passed everywhere, the logic for
>  > this processing gets scattered all across the system.  They are unstable
>  > because the information that seemed relevant in the last system becomes
>  > obsolete.  I have been a software contractor on several projects, and often
>  > the first task on entering a project is untangling the IDs.  We should avoid
>  > turning the ID into a query language in itself.
>  >
>  > Regards,
>  > Brian King
>  > DoubleTwist, Inc
>
> --
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org                                   Cold Spring Harbor, NY
>
> NOW HIRING BIOINFORMATICS POSTDOCTORAL FELLOWS AND PROGRAMMERS.
> PLEASE WRITE FOR DETAILS.
> ========================================================================
> _______________________________________________
> DAS mailing list
> DAS@biodas.org
> http://biodas.org/mailman/listinfo/das

--
Erich Stahl <erich@genome.wi.mit.edu>
Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1532 / fax +1 617 252 1902