[MOBY-l] Re: Re:[MOBY] Constructing MOBY objects

Mark Wilkinson markw at illuminae.com
Thu Jul 17 14:23:05 UTC 2003


Hi Beatrice,

> Keeping in mind that objects are to be kept generic, may I apply a 
> similar rule to object namespaces too, where possible? If so, may I 
> register instead a general Locus and Author namespace, for example, 
> without the prefix 'AGI_'.

What worries me about this is how it affects service discovery.  If you
register an Arabidopsis-specific service as accepting queries in the
"Locus" namespace, and I register an Antirrhinum service accepting
queries in the "Locus" namespace, then we end up sending meaningless
queries to each other's services.  

Although locus names and allele names *look* like plain text, they don't
really function that way; they function like ID numbers.
As such, they should be put into a specific namespace, since they are
only guaranteed a unique meaning within that namespace. There is
absolutely no connection between COT1 in yeast and COT1 in Arabidopsis
(unless yeast has suddenly developed trichomes overnight ;-) ), so a
generic "Locus" namespace would be meaningless.
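To make the discovery problem concrete, here is a minimal Python sketch
of a toy registry that matches services to queries purely by namespace
name. It is not the MOBY Central API: the Registry class and the service
names (ArabidopsisLocusInfo, AntirrhinumLocusInfo) are hypothetical, and
"DragonDB_Locus" is an assumed name for an Antirrhinum-specific namespace.

```python
# Hypothetical toy registry: services are discovered solely by the
# namespace they declare. Not the real MOBY Central interface.
from collections import defaultdict


class Registry:
    def __init__(self) -> None:
        # namespace name -> list of service names accepting IDs in it
        self._by_namespace: dict[str, list[str]] = defaultdict(list)

    def register(self, service: str, namespace: str) -> None:
        self._by_namespace[namespace].append(service)

    def discover(self, namespace: str) -> list[str]:
        # .get() avoids creating empty entries for unknown namespaces
        return list(self._by_namespace.get(namespace, []))


# Generic namespace: both services claim "Locus", so a client holding the
# Arabidopsis ID COT1 is also routed to the Antirrhinum service.
generic = Registry()
generic.register("ArabidopsisLocusInfo", "Locus")
generic.register("AntirrhinumLocusInfo", "Locus")

# Database-specific namespaces: the ID is only sent where it means something.
specific = Registry()
specific.register("ArabidopsisLocusInfo", "AGI_Locus")
specific.register("AntirrhinumLocusInfo", "DragonDB_Locus")

print(generic.discover("Locus"))      # ['ArabidopsisLocusInfo', 'AntirrhinumLocusInfo']
print(specific.discover("AGI_Locus")) # ['ArabidopsisLocusInfo']
```

With the generic namespace, discovery cannot tell the two services apart;
with database-prefixed namespaces it can.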

Now, a similar situation exists with author names. You represent my
authorship as "Wilkinson, MD", but in DragonDB I am listed as "Wilkinson
MD" (no comma); thus these names are also acting as indexes and are only
guaranteed to be unique/meaningful within your database.  On the other
hand, there is a nearly universally accepted format for representing
author names: we could use the PubMed format and create a generic author
namespace defined as having IDs that follow that format... but that
suffers from non-uniqueness as well.  PubMed does not know whether the
Wilkinson MD who works in the Dept of Nuclear Medicine, Sydney, AU is
the same Wilkinson MD who worked on flower development at the Max Planck
Institute a few years ago...  I think we have to throw up our hands in
this case because there is little we can do about it.  Moreover, the
benefits we gain from being able to query PubMed outweigh the loss of
precision we suffer by using a generic, non-unique, PubMed-formatted
author namespace.  I guess until we start identifying authors by their
passport numbers we are stuck with this problem :-)  ...but let's not
call it "Author", because PubMed may have other namespaces; let's
prefix it appropriately to get "PubMed_Author".

So... in summary: let's keep things like gene names specific to a
database for the moment (e.g. AGI_Locus, or perhaps better
AGI_GeneSymbol, AGI_GeneName, AGI_LocusID), and let's register a
"PubMed_Author" namespace that is guaranteed to follow PubMed format
but is not guaranteed to be unique. You can also register an AGI_Author
namespace if you want to use your own format for author names.
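The "guaranteed format, not guaranteed uniqueness" distinction can be
sketched as a simple validity check for IDs in a "PubMed_Author"
namespace. This assumes, as a simplification, that a PubMed-style name is
one or more capitalized surname words followed by a space and one to four
uppercase initials (e.g. "Wilkinson MD"); real PubMed names are more
varied (lowercase particles, suffixes), and both function names are
hypothetical.

```python
# Hypothetical validator for a "PubMed_Author" namespace ID.
# Simplified assumption: Surname(s) + space + 1-4 uppercase initials.
import re

_PUBMED_AUTHOR = re.compile(r"[A-Z][A-Za-z'\-]+(?: [A-Z][A-Za-z'\-]+)* [A-Z]{1,4}")


def is_pubmed_author_id(name: str) -> bool:
    # The whole string must match the assumed format.
    return _PUBMED_AUTHOR.fullmatch(name) is not None


def to_pubmed_author_id(name: str) -> str:
    # Assumes "Surname, Initials" input, as in "Wilkinson, MD".
    return re.sub(r",\s*", " ", name).strip()


print(is_pubmed_author_id("Wilkinson MD"))   # True
print(is_pubmed_author_id("Wilkinson, MD"))  # False
print(to_pubmed_author_id("Wilkinson, MD"))  # Wilkinson MD

# A valid format says nothing about identity: the Sydney and Max Planck
# Wilkinson MDs map to the same ID, so the namespace is well-formed but
# not unique.
```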

As you said, this means an exponential increase in namespaces, but... I
don't see an alternative.  Whatever interoperability we lose (if any) by
having a lot of namespaces, we can hopefully make up for through frequent
use of CrossReferences in our response objects.

Cheers!

Mark

-- 

Mark Wilkinson <markw at illuminae.com>
Illuminae
