[MOBY-l] Re: discussion of GO abbs's and Re: [MOBY] Constructing MOBY objects

Midori Harris midori at ebi.ac.uk
Wed Jul 9 10:52:01 UTC 2003


Hi,

Replying only to the bits where I could actually be much help....

On 8 Jul 2003, Mark Wilkinson wrote:

> On Thu, 2003-07-03 at 05:37, Beatrice Schildknecht wrote:
> 
> > As each object has a unique stock number and may not have an EMBL Acc, I 
> > would like to register the namespace, Stock No. (or something 
> > similar...).
> 
> I believe you are going ahead with this already, and have already
> registered this namespace with GO... right?

We don't have an entry in GO.xrf_abbs for NASC stock numbers. I'll happily
add one if someone could send me the bits to fill in any relevant
fields... see go/doc/GO.xrf_abbs_spec for the list, and ask me any
questions that come up.


> This is something that has come up as an issue in my mind over the past
> few days.  We need to quickly set a convention for IDs or things will
> become chaotic.  We have decided to use the GO abbreviations as our
> namespaces, but the more I look at that document the more concern I
> have.  For example, in the Gene Ontology abbreviations document they
> discuss "namespaces" (in the MOBY meaning of the word), and imply that
> these namespaces are prefixes for an ID number.  So an NCBI taxon id is
> written:
> 
> 	taxon:123
> 
> **BUT**, from my interpretation of the document, it isn't consistent
> from one identifier to the next (GO people, please comment on this if I
> am misunderstanding).  For example, an E.coli genetic stock center gene
> name, abbreviated "ECOGENE_G" is designated as "ECOGENE_G:deoC", but a
> Compugen GO Gene Accession, abbreviated "CGEN" does not use the prefix,
> and is written "PrID131022" (moreover, in the GO database itself, even
> the PrID part of the identifier is apparently stripped off, and you get
> just the integer portion of the id).  Am I misreading the GO.xrf_abbs
> document, or is it accidentally inconsistent, or is it purposely
> inconsistent?  Midori?

The example IDs in GO.xrf_abbs should all be written as ABBR:ID; the CGEN
entry is just a mistake of the sort that often creeps into hand-edited
files (and I'm about to fix it now that you've pointed it out). That's
simply the convention for entries in that file. I've been operating under
the impression that it would be perfectly reasonable to interpret

  abbreviation: taxon
  example: taxon:7227

as equivalent to <Object namespace='taxon' id='7227'>. But I can't
guarantee that I'm right and won't be shouted down ;)
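
To make that reading concrete, here is a minimal Python sketch of splitting
an ABBR:ID string at its first colon and writing it out in the
namespace/id form. The function names split_xref and to_object_tag are
made up for illustration and are not part of MOBY or GO; it also assumes
the abbreviation itself never contains a colon.

```python
from xml.sax.saxutils import quoteattr


def split_xref(xref):
    """Split an 'ABBR:ID' string at the first colon into (abbr, id)."""
    abbr, sep, ident = xref.partition(":")
    if not sep or not abbr or not ident:
        raise ValueError(f"not in ABBR:ID form: {xref!r}")
    return abbr, ident


def to_object_tag(xref):
    """Render an 'ABBR:ID' string as an opening Object tag (hypothetical helper)."""
    namespace, ident = split_xref(xref)
    # quoteattr escapes the value and wraps it in quotes (double by default).
    return f"<Object namespace={quoteattr(namespace)} id={quoteattr(ident)}>"


print(to_object_tag("taxon:7227"))
```

Splitting only at the first colon means any later colons stay in the id
part, which seems the safer default for identifiers nobody has vetted.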

Chris (or someone else at BDGP) will have to comment on what happens to
the IDs in the GO database.

A tangential note: some of the entries in GO.xrf_abbs don't really
correspond to namespaces, but are other useful abbreviations that we've
stored in the file because it's a convenient place (and that was the
original purpose of the earliest incarnation of the file).

Midori

> 
> So anyway... what should we do in MOBY?  
> 
> ?	<Object namespace='taxon' id='taxon:123'> 
> ?	<Object namespace='taxon' id='123'> 
> 
> ...or do we make it flexible, where the client/server must check for
> themselves if the id portion is prefixed?  Up to now, we have not used
> prefixes because they would be redundant, but it might make us more
> compatible with other systems if we do.  Comments anyone?  I quite
> favour the removal of the prefix, but it makes no functional difference,
> so... I'm easy :-)
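
If the "flexible" option were chosen, the check each client/server would
have to make could be as small as the Python sketch below. The name
normalize_id is hypothetical, and the sketch assumes the prefix is
exactly the namespace followed by a colon, compared case-sensitively;
neither assumption comes from the MOBY or GO documents.

```python
def normalize_id(namespace, raw_id):
    """Return the id without a redundant 'namespace:' prefix, if one is present."""
    prefix = namespace + ":"
    if raw_id.startswith(prefix):
        return raw_id[len(prefix):]
    return raw_id


# Both spellings of the same taxon id reduce to the bare form.
print(normalize_id("taxon", "taxon:123"))
print(normalize_id("taxon", "123"))
```

An id that carries some other namespace's prefix is left untouched here;
whether that should be an error is a policy question the sketch does not
try to answer.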
> 
> 
> > This is open for discussion, (whether a separate NASC and ABRC code 
> > namespace, or integrate them somehow?).
> 
> Since separate namespaces are the reality (regardless of the underlying
> data being identical), we should make separate namespaces.  A similar
> phenomenon happens between Genbank and EMBL - identical records with
> different identifiers in each namespace.  But since a service may only
> recognize one or the other namespace, we must keep them separate and
> "join" them as cross-references.



