[MOBY-l] Initiating a discussion of LSID and MOBY-Triples

mwilkinson at gene.pbi.nrc.ca
Wed Jul 17 17:25:39 UTC 2002


I want to start a thread of discussion here regarding the similarities &
differences between the LSID and MOBY-Triple.  It seems sensible to ensure
that we can use LSIDs once they appear - if it is going to require drastic
changes, then we should probably make the changes now before we go too
far.  In addition, we (BioMOBY & I3C) seem to be using the same words, e.g.
"Authority", in different ways, so it is important to clarify our
terminology for the sake of the newcomers to the MOBY project who might be
confused.

Here's how I see things:

The MOBY triple has changed somewhat in meaning since the first MOBY-DIC
meeting.  It seems to me that we are, in many ways, approaching an
LSID-like model already.  The triple <type namespace id> is essentially an
"LSID" with instantiation information attached.  For example:

    <Sequence  namespace="Genbank/GI"  id="1437643">
        .....
    </Sequence>

is simply saying "take the piece of data identified by Genbank/GI 1437643
and represent it as a sequence object."  It could equally well be
represented as a VirtualSequence object, or a SequenceWithFeatures object.
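
To make the "triple" idea concrete, here is a rough Python sketch that pulls
the three parts out of an element like the one above.  The function name
parse_triple is made up for illustration and is not part of any MOBY code;
the element body is left empty because only the tag and the two attributes
matter here.

```python
import xml.etree.ElementTree as ET


def parse_triple(xml_text):
    """Return (type, namespace, id) for a single MOBY-style element.

    Hypothetical helper: the tag name is taken as the object type, and the
    'namespace' and 'id' attributes supply the other two parts.
    """
    elem = ET.fromstring(xml_text)
    return elem.tag, elem.get("namespace"), elem.get("id")


# The same data (Genbank/GI 1437643) instantiated as two different types.
as_sequence = '<Sequence namespace="Genbank/GI" id="1437643"></Sequence>'
as_virtual = '<VirtualSequence namespace="Genbank/GI" id="1437643"></VirtualSequence>'

print(parse_triple(as_sequence))  # ('Sequence', 'Genbank/GI', '1437643')
print(parse_triple(as_virtual))   # ('VirtualSequence', 'Genbank/GI', '1437643')
```

Only the first member of the tuple changes between the two calls, which is
the sense in which the type is instantiation information rather than part of
the identifier.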

We have most of the critical elements of the LSID in this triple:
        - Genbank    = I3C's "Authority"
        - GI         = I3C's "namespace"
        - id         = I3C's ID


Mapping our terminology onto theirs:

MOBY term     I3C term
---------     --------
Authority  -> (not included)
namespace  ->  Authority + namespace
id         ->  ID
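
As a rough illustration of that mapping, the Python sketch below splits a
MOBY namespace such as "Genbank/GI" into an authority part and a namespace
part and joins them into a URN.  Two things here are assumptions on my part:
the "urn:lsid:authority:namespace:id" layout (the LSID spec is still in
flux), and the use of the bare word "Genbank" as the authority, which would
more likely be a URI of some sort.  Both function names are hypothetical.

```python
def triple_to_lsid(namespace, ident):
    """Build an LSID-style URN from a MOBY namespace and id.

    Assumes the MOBY namespace has the form 'Authority/namespace' and that
    the URN layout is urn:lsid:<authority>:<namespace>:<id>.
    """
    authority, _, lsid_namespace = namespace.partition("/")
    if not authority or not lsid_namespace:
        raise ValueError("expected 'Authority/namespace', got %r" % namespace)
    return "urn:lsid:%s:%s:%s" % (authority, lsid_namespace, ident)


def lsid_to_triple_parts(lsid):
    """Recover the (MOBY namespace, id) pair from an LSID-style URN."""
    parts = lsid.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not a five-part urn:lsid string: %r" % lsid)
    return parts[2] + "/" + parts[3], parts[4]


lsid = triple_to_lsid("Genbank/GI", "1437643")
print(lsid)                        # urn:lsid:Genbank:GI:1437643
print(lsid_to_triple_parts(lsid))  # ('Genbank/GI', '1437643')
```

The object type (Sequence, VirtualSequence, ...) does not appear anywhere in
the URN; it stays outside as instantiation information.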

In BioMOBY, the "Authority" is not part of the triple.  It is passed in the
MOBY envelope, and represents the "Authority" who is presenting you with
the enveloped data; it has no connection to the data itself.  In LSIDs, as
I understand it, the "Authority" is the group who controls the namespace.
I believe that the I3C intends the Authority to be a URI of some sort
(Brian, is that correct?), whereas we are currently moving towards using the
GO "Abbreviations for cross-referenced databases" CV for our namespace
element.  The representation of the namespace does not alter the
functionality of MOBY, so in this regard moving towards the LSID
proposal would not hurt us.  It *might* cause problems, in that we are
dependent on a CV of MOBY Namespaces in order to do the service lookups at
MOBY Central - "who can give me information starting from a Genbank/GI?",
where the Service has registered that it understands the Genbank/GI
namespace.  I don't think it is the intention of the I3C to catalogue their
namespaces, so that becomes a MOBY-specific issue, but since we need the CV
in any case, I don't see that it gives us any more or less work than we
have now.
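
To show why the lookup depends on a CV, here is a toy Python sketch of a
namespace-keyed registry.  It is not how MOBY Central is implemented; the
class, its methods, the service names and the "Example/ID" namespace are all
invented for illustration.

```python
from collections import defaultdict


class NamespaceRegistry:
    """Toy registry: services register the namespaces they understand."""

    def __init__(self, controlled_vocabulary):
        # Only namespaces in the CV may be registered or queried reliably.
        self._cv = set(controlled_vocabulary)
        self._services = defaultdict(list)

    def register(self, service_name, namespace):
        if namespace not in self._cv:
            raise ValueError("namespace %r is not in the CV" % namespace)
        self._services[namespace].append(service_name)

    def find_services(self, namespace):
        """Answer: who can give me information starting from this namespace?"""
        return list(self._services.get(namespace, []))


registry = NamespaceRegistry({"Genbank/GI", "Example/ID"})
registry.register("getSequence", "Genbank/GI")   # hypothetical service
registry.register("getFeatures", "Genbank/GI")   # hypothetical service

print(registry.find_services("Genbank/GI"))  # ['getSequence', 'getFeatures']
print(registry.find_services("Example/ID"))  # []
```

The lookup is an exact string match on the namespace, so it works the same
whether the key is a CV abbreviation or an LSID-style authority plus
namespace, as long as everyone registers and queries with the same strings.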

The thing that we are missing is the optional security field, but security
is something we haven't really spent much time discussing anyway.  Perhaps
adopting LSIDs would make this issue go away effortlessly?

So... how far apart are we, really?  The MOBY Triple breaks up the LSID
into separate attributes... no big deal - we could modify this with little
effort if it is strictly necessary to represent LSID data in the form of a
URN.  We include instantiation information, but that is not *part* of the
data, so that doesn't interfere with LSID.  We don't currently deal with
the problem of synonymous names for data, but it isn't clear to me how
LSIDs do this either, so...??  And then there is security, which we don't
handle at all so far.

Overall, it seems to me that we are more or less LSID-ready as we stand.
However, given that the LSID specification is still in flux, and there are
few (any?) who are actually using the LSID at the moment, there is no
pressing need to jump up and change our Triple structure.  There are more
important MOBY issues to deal with at this stage - I figure we can make
these small transitions as the need arises.  The most important thing is
that we seem to be close enough already that the transition will not be
painful.

What do you all think?

M



--
--------------------------------
"Speed is subsittute fo accurancy."
________________________________

Dr. Mark Wilkinson, RA Bioinformatics
National Research Council, Plant Biotechnology Institute
110 Gymnasium Place, Saskatoon, SK, Canada

phone : (306) 975 5279
pager : (306) 934 2322
mobile: markw_mobile at illuminae.com




