[MOBY-l] Status of MOBY triples

Wed Feb 26 00:59:59 UTC 2003

Hi,

I'm new to all this, and have what is probably a basic question, but,
well, I can't find the answer anywhere.

As I look at BioMOBY messages, I'm somewhat confused as to the
"ontological status" of the triples.

So in the triples below, let me suppose that "163483" is an actual
Genbank GI number (I looked in Genbank and found accession M80838,
B.taurus prepreproelastase).

So we could take the first example below to be a "reference" (like a
URN) to the thing which we know as "B.taurus prepreproelastase".

It doesn't contain any actual information itself -- just a pointer.

But the -second- example below, I don't know what to make of -- it
contains both a reference and a sequence.

Does the "moby:id" field give a "type" to the sequence, or otherwise
constrain it?  Does the fact that this namespace/id resolves to
something in Genbank actually mean that the information contained in
the message is -guaranteed- to be something which was extracted by
some (somehow) faithful extraction mechanism from the original Genbank
record?

================================================================
<moby:Object  moby:namespace="GenBank/GI" moby:id="163483"/>

Sequence: note that Sequence inherits from (IS-A) Object
<moby:Sequence  moby:namespace="GenBank/GI" moby:id="163483">
    <moby:INT moby:namespace="primitive" moby:id="" moby:tagName="Length">375</moby:INT>
    <moby:STRING moby:namespace="primitive" moby:id="" moby:tagName="SequenceString">
        ATTGCGCATGCGAGCTAGTAGCATGCGATGAGGTCGATGCATCT
    </moby:STRING>
</moby:Sequence>
================================================================
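
To make the distinction concrete, here is a minimal Python sketch that
parses the two messages above and separates the "name" (namespace/id)
from whatever payload travels with it.  The namespace URI bound to the
"moby" prefix is a placeholder I made up so that the fragments parse on
their own, the first example is written as an empty element for the
same reason, and describe() is a hypothetical helper, not part of any
BioMOBY library.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI for the "moby" prefix; the original
# fragments do not declare one.
MOBY_NS = "http://example.org/moby"


def q(local_name):
    """Return the ElementTree-qualified form of a moby: name."""
    return f"{{{MOBY_NS}}}{local_name}"


REFERENCE_ONLY = (
    f'<moby:Object xmlns:moby="{MOBY_NS}" '
    'moby:namespace="GenBank/GI" moby:id="163483"/>'
)

WITH_PAYLOAD = f"""
<moby:Sequence xmlns:moby="{MOBY_NS}"
               moby:namespace="GenBank/GI" moby:id="163483">
    <moby:INT moby:namespace="primitive" moby:id="" moby:tagName="Length">375</moby:INT>
    <moby:STRING moby:namespace="primitive" moby:id="" moby:tagName="SequenceString">
        ATTGCGCATGCGAGCTAGTAGCATGCGATGAGGTCGATGCATCT
    </moby:STRING>
</moby:Sequence>
"""


def describe(xml_text):
    """Split a message into its type, its name, and its payload (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    payload = {
        child.get(q("tagName")): (child.text or "").strip()
        for child in root
    }
    return {
        "type": root.tag.split("}")[1],  # e.g. "Object" or "Sequence"
        "name": (root.get(q("namespace")), root.get(q("id"))),
        "payload": payload,  # empty for a pure reference
    }


if __name__ == "__main__":
    for message in (REFERENCE_ONLY, WITH_PAYLOAD):
        print(describe(message))
```

Both messages come back with the same name, ("GenBank/GI", "163483");
only the second carries a payload.  Nothing in the XML itself says
whether that payload was faithfully extracted from the named record,
which is exactly the question.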

In sum, what I'm asking is, what is the "status" of the information in
that second message?  Is it guaranteed to be derived from the
Genbank record?  And if not, what purpose does the namespace/id in the
start-tag serve?

Also, let us suppose that the body is guaranteed to be extracted from
Genbank.  Clearly, BioMOBY services will be called with data which is
in NO public repository.  How will such data be given types, and what
namespace/id will be used for each data-packet?  The same question can
be asked about the _outputs/results_ of BioMOBY service calls -- these
also can quite reasonably not be found in public databases.

In short, what I'm asking is, perhaps there is a notion of "typing"
for BioMOBY objects sent in requests and responses, which is
-different- from the notion of "naming" of publicly accessible
records.

Then, that notion of typing could be extended to a notion of perhaps
"faithful slicing of publicly accessible data" -- i.e. to provide the
reference to some publicly accessible object, but also some part (the
desired part) of that object in the request.

And of course, the notion of -naming- would be applicable to both
publicly accessible objects, and objects which were only accessible
from my laboratory workstation, perhaps just sequenced 5 minutes ago,
and being run thru BLAST for the first time.
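
As a rough illustration of what I mean by keeping "typing" and
"naming" separate, here is a small Python sketch.  Everything in it
(the Name and MobyObject classes, the faithful_slice flag, the
"lab.example.org/runs" namespace) is hypothetical and only meant to
show the shape of the idea, not how BioMOBY actually works.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Name:
    """Naming: where a record lives (public or private)."""
    namespace: str
    id: str


@dataclass
class MobyObject:
    """Hypothetical model that keeps type, name, and payload apart."""
    object_type: str                      # typing: what kind of thing this is
    name: Optional[Name] = None           # naming: optional pointer to a record
    payload: Dict[str, str] = field(default_factory=dict)
    faithful_slice: bool = False          # sender's claim: payload was extracted from `name`

    def is_pure_reference(self) -> bool:
        """True when the object is only a pointer and carries no data."""
        return self.name is not None and not self.payload


# 1. A pure reference to a public record.
reference = MobyObject("Object", Name("GenBank/GI", "163483"))

# 2. A "faithful slice": the reference plus the desired part of the record.
slice_of_record = MobyObject(
    "Sequence",
    Name("GenBank/GI", "163483"),
    {"SequenceString": "ATTGCGCATGCGAGCTAGTAGCATGCGATGAGGTCGATGCATCT"},
    faithful_slice=True,
)

# 3. Data sequenced five minutes ago: typed, and named only in a
#    private (made-up) namespace.
fresh_from_bench = MobyObject(
    "Sequence",
    Name("lab.example.org/runs", "run-0001"),
    {"SequenceString": "ATGC"},
)

if __name__ == "__main__":
    for obj in (reference, slice_of_record, fresh_from_bench):
        print(obj.object_type, obj.name, obj.is_pure_reference(), obj.faithful_slice)
```

In this sketch the type says what the data is, the name says where (if
anywhere) it can be looked up, and the faithful_slice flag is the only
place that claims a relationship between the two.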

Thanks,
--chet--