[MOBY-l] Re: bioMoby - name space (fwd)

Fri Feb 13 18:45:03 UTC 2004

Now that I have the context after speaking to Martin, I only have a couple
of things to add. Firstly, Mark's comments to Martin and Boris, as pasted
below, are exactly right.

--------------start Mark's comments -------------------------

My comments after sleeping on the problem overnight are as follows:

The important point that I think is being misunderstood here is that we
are talking about LSID's that *represent namespaces*, NOT LSID's that
represent data within that namespace!!  As such, biomoby.org is
*absolutely* the correct authority for namespace LSID's, since
biomoby.org controls the namespace ontology (as of last week!).

I discuss this point in various places, but in particular it is
discussed in my latest paper (get it from the biomoby.org website).  In
there I have a paragraph clarifying the problem that MOBY has with using
LSID's "verbatim" (the crux being that LSID's are opaque and DO NOT
represent underlying data-types without resolution - where the knowledge
of the underlying data-type is critical to service discovery).  I think
we are confusing the issue by conflating an identifier WITHIN a
namespace - for which Boris' argument is absolutely correct and I don't
think anyone could argue with him - with an identifier that REPRESENTS a
namespace, for which MOBY is the only authority that exists.

urn:lsid:biomoby.org:namespacetype:DragonDB_Allele

vs.

urn:lsid:antirrhinum.org:allele_id:AG113328

I believe that this is the misunderstanding that is leading to this
argument, and that once this is clear, the argument goes away.  If not,
then please read my paragraph about it in the manuscript as I deal with
it more comprehensively there.

If you all could give me the OK I will forward this message back to the
public list and we can finish the conversation there.

Thanks!

Mark

--
Mark Wilkinson <markw at illuminae.com>
--------------end Mark's comments -------------------------
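
To make the distinction between Mark's two example identifiers concrete, here is a minimal Python sketch that splits an LSID into its fields. The names `Lsid` and `parse_lsid` are hypothetical helpers invented for this illustration, not part of any LSID or MOBY library, and the layout assumed is urn:lsid:authority:namespace:object with an optional revision.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Fields of an LSID (illustrative type, not from any LSID library)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(lsid: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its fields."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return Lsid(*parts[2:])


# An identifier that REPRESENTS a namespace: biomoby.org is the authority.
ns = parse_lsid("urn:lsid:biomoby.org:namespacetype:DragonDB_Allele")

# An identifier WITHIN a namespace: the data owner is the authority.
rec = parse_lsid("urn:lsid:antirrhinum.org:allele_id:AG113328")

print(ns.authority, ns.namespace, ns.object_id)
print(rec.authority, rec.namespace, rec.object_id)
```

In the first identifier the object field names the namespace itself, which is why biomoby.org can be its authority; in the second the object field names a record, so the authority is whoever owns that record.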

Secondly, I can shed light on the (I believe unrelated) LSID authority naming
"convention" that Martin was trying to get out of me in his first note. In 
the case where one party needs LSID's for a particular data source that 
they do not own, but which is available through some non-LSID based means 
- perhaps via the web in real-time or perhaps as a download of the latest 
database from that data source, they may if they choose, assign LSID's 
under their own authority for that 3rd party data. The authority string 
naming convention we chose to do this at an I3C hackathon was really 
simple. All one needs to do is take the authority name that one thinks
would be a logical authority name for the data source one is interested
in, and concatenate it, separated by a dot, with one's own authority name
(in this case it was lsid.i3c.org). Here follow some (working and
resolvable) example LSID's
assigned by the I3C during that hackathon:

urn:lsid:ensembl.org.lsid.i3c.org:homosapiens_ref:234325  -  uses the
Ensembl library to contact ensembl.org
urn:lsid:ncbi.nlm.gov.lsid.i3c.org:pubmed:11385576 - proxies the request 
to the NCBI machines using the XML interface
urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:bm872070 - proxies the 
request to the NCBI machines using the XML interface
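
As a rough illustration of the naming convention, the following Python sketch builds a third-party LSID by joining the source's logical authority name to the issuer's own authority name with a dot. The function names `third_party_authority` and `third_party_lsid` are hypothetical and exist only for this example.

```python
def third_party_authority(source_authority: str, own_authority: str) -> str:
    """Join the data source's logical authority name to our own with a dot."""
    return f"{source_authority}.{own_authority}"


def third_party_lsid(source_authority: str, own_authority: str,
                     namespace: str, object_id: str) -> str:
    """Build an LSID issued under our own authority for third-party data."""
    authority = third_party_authority(source_authority, own_authority)
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


# Rebuild one of the I3C hackathon examples.
lsid = third_party_lsid("ncbi.nlm.nih.gov", "lsid.i3c.org", "genbank", "bm872070")
print(lsid)  # urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:bm872070
```
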

We made entries in the I3C DNS for  ncbi.nlm.gov.lsid.i3c.org, 
ebi.ack.uk.lsid.i3c.org, ncbi.nlm.nih.gov.lsid.i3c.org and attached the 
appropriate SRV records to them, as described  in section 8.3 in the OMG's 
LSID proposed specification document ( 
http://www.omg.org/docs/lifesci/03-12-02.pdf ). The resolution process 
followed is to resolve these in exactly the same way as for any other
LSID. There is no magic attempt to break up the authority string into two 
authorities. Consumers of these LSID's can simply look at them to see 
where the data is actually coming from - in this case clearly the I3C is 
the authority and is in the end responsible for the delivery of the 
associated data using whatever means it likes (proxy to the original
source, local copy of the original source etc). It is likely folks will 
see more of these "third party" LSID's in the next months as they provide 
LSID access & integration features to information ahead of the 'OEM' data
provider being ready to supply their own "native" LSID's under their own 
authority string.
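
The point that the authority string is never split can be sketched in Python as follows. The helper `srv_query_name` is hypothetical, and the `_lsid._tcp` service label is an assumption that should be checked against the specification document cited above; the sketch only builds the DNS name that would be looked up and performs no network query.

```python
def srv_query_name(lsid: str) -> str:
    """Return the DNS SRV name to look up for an LSID's authority.

    The whole authority field is used as-is; no attempt is made to
    break it into a "source" part and an "issuer" part.
    """
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2]
    # Assumed service label; verify against the LSID specification.
    return f"_lsid._tcp.{authority}"


# A "third party" LSID and a "native" LSID go through the same steps.
print(srv_query_name("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:bm872070"))
print(srv_query_name("urn:lsid:pdb.org:1aft-qwertyfoo:1"))
```

Because the lookup name ends in lsid.i3c.org for the first identifier, the query lands in the I3C's DNS zone, which is what makes the I3C the responsible authority.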

I hope this is clearer, but if there are further questions, please post or 
email me.

Thanks, Sean

--
Sean Martin
IBM Corp.

Martin Senger <senger at ebi.ac.uk> 
Sent by: moby-l-bounces at portal.open-bio.org
02/13/2004 05:47 AM

To
moby-l at biomoby.org
cc

Subject
[MOBY-l] Re: bioMoby - name space (fwd)

These are some opinions on the name space problem we had discussed in this
list from the main LSID promoter/author, Sean Martin from IBM. I am
expecting another email from him regarding "faking" the authority field - I
will post it when I have it...

Martin

--
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk
European Bioinformatics Institute        Phone: (+44) 1223 494636
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

---------- Forwarded message ----------
Date: Thu, 12 Feb 2004 09:27:17 -0500
From: Sean Martin <sjmm at us.ibm.com>
To: Martin Senger <senger at ebi.ac.uk>
Cc: Boris Steipe <boris.steipe at utoronto.ca>, markw_mobile2 at illuminae.com,
Benjamin H Szekely <bhszekel at us.ibm.com>, Alyssa Wolf <alyssaw at us.ibm.com>,
Jordi A Albornoz <jordi at us.ibm.com>
Subject: Re: bioMoby - name space

Hi Martin,
The way we have been thinking about it here is as follows (this is not to
say it could not be extended as things move forward of course):

Take an LSID  ->  urn:lsid:pdb.org:1aft-qwertyfoo:1

In this case, when I resolve it, I get pointers to the data named by the
LSID as well as any meta-data that may be associated with it from the PDB.
The rule we have followed is that the only authoritative place to get data
(actual bytes named) for that LSID would be the owning authority - in this
case the PDB. However *any* resolver (including the PDB) may also
have pointers to meta-data related to that LSID.  Each organization that
has meta-data is authoritative for the meta-data that it supplies. It is
up to the client that retrieves meta-data to decide what weight (i.e. just
how good is this meta-data?) to give a piece of meta-data depending on
which authorities (the source of that bit of meta-data) provided it to
them.  Naturally the weight for meta-data from the authority that actually
is authoritative for the data would be high - although perhaps not as high
as my own local database which gives me the information that in my own
testing I (for instance) found an error in the official/original
authority's data/meta-data. The client is free to ask any number of
authorities (there is no mechanism yet for the discovery of these third
party resolvers - you just have to know them) if they have meta-data for
any particular LSID. Meta-data from third party authorities is usually not
the same meta-data that the "original naming" authority (in this case the
PDB) lists for that LSID. So biomoby.org is quite at liberty to hold its
own meta-data records for pdb.org or any other authority issued ID's.
Biomoby clients would be programmed to know to query both the actual
authority listed in the LSID and the biomoby resolver any time LSID
resolution is performed.
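
A minimal Python sketch of this client-side behaviour might look like the following. Everything here is illustrative: `resolvers_for`, `rank_metadata`, the `my-lab.example` authority, the trust weights and the metadata strings are all made up for the example, and no real resolver is contacted.

```python
from typing import Dict, List, Sequence, Tuple


def resolvers_for(lsid: str, extra_resolvers: Sequence[str] = ("biomoby.org",)) -> List[str]:
    """List the authorities to ask: the one named in the LSID, then any extras."""
    authority = lsid.split(":")[2]
    resolvers = [authority]
    for resolver in extra_resolvers:
        if resolver not in resolvers:
            resolvers.append(resolver)
    return resolvers


def rank_metadata(records: List[Tuple[str, str]],
                  trust: Dict[str, float]) -> List[Tuple[str, str]]:
    """Order (source_authority, metadata) pairs by the client's own trust weights."""
    return sorted(records, key=lambda rec: trust.get(rec[0], 0.0), reverse=True)


lsid = "urn:lsid:pdb.org:1aft-qwertyfoo:1"
print(resolvers_for(lsid))  # ['pdb.org', 'biomoby.org']

# The weights are the client's decision; a local database may outrank
# even the authority that owns the data.
trust = {"my-lab.example": 1.0, "pdb.org": 0.9, "biomoby.org": 0.6}
records = [
    ("biomoby.org", "MOBY namespace annotation"),
    ("pdb.org", "original structure meta-data"),
    ("my-lab.example", "local note: error found in testing"),
]
for source, metadata in rank_metadata(records, trust):
    print(source, "->", metadata)
```
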

This mechanism allows third parties to attach meta-data/or otherwise
assert facts about data stored in another authority, e.g. annotation.  In
fact in our LSID resolver proxy implementations, I believe you can supply
a list of 3rd party resolvers that you want queried any time an LSID is
resolved. This allows things like checking a bunch of internal databases,
or the databases of "friends & colleagues" or other sources important to
your research, for data related to any LSID you happen to be resolving.

I hope this is reasonably clear. Please ask questions if not.

Kindest regards, Sean

Martin Senger <senger at ebi.ac.uk>
02/12/2004 04:18 AM

To
Sean Martin/Cambridge/IBM at IBMUS
cc
Boris Steipe <boris.steipe at utoronto.ca>, <markw_mobile2 at illuminae.com>
Subject
bioMoby - name space

Sean,
Yesterday there was a discussion on the BioMoby list about the authority
field of the LSID - just in case you are not on that list, here is a very
brief summary - because I have asked there for your expertise:

1) Boris asked:

"What happened to the idea of namespacing like in LSIDs i.e. making a
namespace valid only in the context of its issuing authority and thus
effectively using the (working, sort of) ICANN resolution mechanism to
ensure that namespaces remain unique ?"

2) Mark replied:

"yes, that was the plan... and in effect it still *is* the plan. However,
I cannot assign LSID's arbitrarily to another authority (this is something
they must do on their own), so for the moment all namespace identifiers
are in the "biomoby.org" LSID authority, and thus we have to be careful of
collisions.
Once LSID's become more widespread we will certainly stop using our own
LSID authority prefix and use the "genuine" one, as assigned by the true
naming authority, but until then... we're stuck."

3) I have said:

"Just my 2c:
The authority field is important not only as a way to identify
things uniquely world-wide (if it were only for that, all your arguments
would be completely valid) - but it *may* (the LSID spec does not mandate
that but suggests it) also be used for finding an appropriate resolution
service that can return data identified by this LSID. Therefore, if you
put pubmed.org there, it may never find biomoby.org where it can be
resolved.
I think the solution may be (Sean, are you listening here? Am I right?)
to have *both* in the authority - biomoby.org *and* pubmed.org - so the
resolution software will find first biomoby.org and it knows that the rest
of the authority can be ignored.
"

4) Finally Mark concluded:

"Cool - that's a trick I hadn't known about.  What is the separator
character between the two authority identifiers?"

Now, Sean, your opinion? Many thanks,
Martin

--
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk
European Bioinformatics Institute        Phone: (+44) 1223 494636
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger
