[MOBY-l] Hmmm... ID objects or SpecificID objects??

Jason E. Stewart jason at openinformatics.com
Fri May 31 19:17:27 UTC 2002


"Mark Wilkinson" <mwilkinson at gene.pbi.nrc.ca> writes:

> Many services (eg. sequence retrieval services) will simply take an
> ID number as input.  The problem is that ID numbers may be of many
> types...  GenbankGI, GenbankAcc, EMBLID, TIGR_Gene_ID, and so on and
> so on and so on.  In principle, we could define an object like this:
> 
>         <ID  namespace="GenbankGI" id="1223647"/>

Let me be uncharacteristically soft-spoken about this:

  No!  No!  No!  No!  No!  No!  No!  No!

This is not an ID object, this is a Sequence object that only has
its ID attribute set. This is really important. So it would be:

        <Seq  namespace="GenbankGI" id="1223647"/>

The Object Hierarchy has as its root an object called 'Object' (or it
could be called 'MOBY' or 'ID' just as well). That object provides the
two attributes "namespace" and "id" that all other objects
inherit. 'Seq' is the simplest object in the sequence hierarchy: it
provides no new attributes over those that it gets from 'Object', but
it is clearly identifiable as a sequence because of its datatype.
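
To make the hierarchy concrete, here is a rough Python sketch of what I
mean. The class name 'MobyObject' is made up for illustration (the root
could be called anything), and having 'ComplexSeq' extend 'Seq' with a
'residues' attribute is my assumption about how the richer type would
be defined, not something that has been agreed on:

```python
from dataclasses import dataclass


@dataclass
class MobyObject:
    """Hypothetical root object: supplies the two inherited attributes."""
    namespace: str
    id: str


@dataclass
class Seq(MobyObject):
    """Simplest sequence type: no new attributes, only a distinct datatype."""


@dataclass
class ComplexSeq(Seq):
    """Assumed extension of Seq that also carries the residues themselves."""
    residues: str = ""


# Same namespace/id pair as the <Seq .../> element above.
ref = Seq(namespace="GenbankGI", id="1223647")

# The datatype alone says "this is a sequence", even with no sequence data.
print(type(ref).__name__, isinstance(ref, MobyObject))
```

The point is only that 'Seq' differs from the root by its type, not by
its attributes.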

> But since services register only the type of *object* that they deal
> with, not the namespace that they accept, most services that claim to
> accept ID numbers will not necessarily handle *all* types of ID's.
> 
> The alternative is to have separate objects for each type of ID:
> 
>         <GenbankGI  namespace="GenbankGI"  id="1223647"/>
> 
> But this seems like a nightmare scenario...or?

Yes, it does seem nightmarish. I think that we should consider that
services operate on the data type and the namespace - otherwise we
will have a combinatorial avalanche of data types.

They really are two separate issues: what the datatype is, and who
named the data. That way a service could be agnostic about who named
the data if it wanted to be. So if two naming authorities named the
same sequence by two different identifiers:

        <ComplexSeq  namespace="GenbankGI" id="1223647">
          AACGTCCAAA...
        </ComplexSeq>

        <ComplexSeq  namespace="EMBL" id="7463221">
          AACGTCCAAA...
        </ComplexSeq>

Some applications could accept either because the actual nucleotide
sequence is present (that would mean defining the service on the
'ComplexSeq' type and not the simpler 'Seq' type, which only has the
identifier).
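
As a sketch of how a registry could treat datatype and namespace as two
separate keys, consider something like the following. 'ServiceSpec',
'accepts', and the two service names are all hypothetical, and the
exact string match on the datatype ignores inheritance to keep the
example short:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical registration record for one service."""
    name: str
    datatype: str
    namespace: Optional[str] = None  # None means "agnostic about who named it"

    def accepts(self, datatype: str, namespace: str) -> bool:
        # The datatype must match; the namespace matters only if one was registered.
        if datatype != self.datatype:
            return False
        return self.namespace is None or self.namespace == namespace


# A retrieval service that only understands GenBank GI numbers.
fetch_by_gi = ServiceSpec("fetch_by_gi", "Seq", "GenbankGI")

# An analysis service that needs the residues but not the naming authority.
analyse_any = ServiceSpec("analyse_any", "ComplexSeq")

for ns in ("GenbankGI", "EMBL"):
    print(ns, fetch_by_gi.accepts("Seq", ns), analyse_any.accepts("ComplexSeq", ns))
```

With this split, adding a new naming authority adds one namespace
string rather than a new datatype for every kind of object.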

jas.


