[Hackathon] [MOBY-l] ideas from the CRIB

Mark Wilkinson markw at illuminae.com
Fri Feb 28 16:16:34 UTC 2003


On Thu, 2003-02-27 at 13:25, Heiko Schoof wrote:

> Mark is now trying to remind me to do things in posts to lists where I 
> am not even subscribed, maybe it's time to resurface.

Oops!  I forgot which lists you were on and which you were not.  This
message is appropriately being sent to the main list.  The -dev list was
supposed to be for weekly meeting announcements and other administrative
stuff, but has already become a second, non-redundant discussion list
:-/   I have my share of blame for that too...


> But at that point I really think we need to tackle the problem of 
> identity, equality, or relatedness.

I agree with this, though I would have defined them somewhat differently
(showing already what a can of worms this might become if we try to
implement it...)


> But then I could get very happy and excited by 
> writing stuff like:
> 
> SynonymousTo:EMBLSequenceAccession
> ReferencedBy:PubMedCitation
> TranslatesInto:ProteinSequence
> InheritsFrom:BasicCodingSequence
> AssignedTo:GOTerm
> Contains:SequenceMotif

I tend to agree with you - the CRIB is, in my mind, one of the two
Achilles heels of MOBY (the other being the wishy-washy human-readable
service descriptions) in that it is difficult to interpret what it is
you have been given.  I had derived an almost identical solution to the
one you present above, but never implemented it for several reasons, the
primary one being that every time I started creating a relationship type
ontology I got mired in confusion... the number of relationship types,
in principle, increases exponentially with the number of object types! 
ACK!  

I just dug up my own work on the problem from ~6 months ago... Perhaps a
pared-down version of this kind of idea can be built quickly; I would,
for example, advocate something similar to your "three pillars" of
relatedness, perhaps including categories of:


- "IdenticalTo", "RelatedTo", and "FamiliarWith" - distinguish 
between synonyms (e.g. GenBank and EMBL records on the same sequence),
related datatypes (e.g. homologues in other species), and 'families' of
data (e.g. a transcript is FamiliarWith its primary sequence and is
FamiliarWith its individual exons).

- "PrimaryReference" and "SecondaryReference" - distinguish between
references that speak directly to the issue of the data in-hand, versus
those that speak to related issues (e.g. I have a liver cancer gene that
is a tyrosine kinase.  PrimaryReference would contain references that
talk about the gene I have in hand, SecondaryReference would contain
references to other papers on tyrosine kinases and/or other liver cancer
genes)

- AnnotatedWith - to assign annotations of any type to any type of data.

That's about as far as I got with my thinking about this issue before I
abandoned it.  I think the 6 relationship types above are sufficiently
generic that they can be used in contexts other than Sequence objects
e.g. IdenticalTo could be used to describe identical organisms in two
different Taxon nomenclatures, or an enzyme name and its EC identifier. 
There may be a few other relationship types needed, but I don't think
there needs to be more than a dozen if we keep them sufficiently
generic.
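
To make the six relationship types above a bit more concrete, here is a
minimal sketch in Python.  It is not MOBY code and not a proposal for the
spec: the class names, the "Relationship:Namespace:Id" entry format, and
the placeholder identifiers are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    """The six generic relationship types listed above."""
    IDENTICAL_TO = "IdenticalTo"
    RELATED_TO = "RelatedTo"
    FAMILIAR_WITH = "FamiliarWith"
    PRIMARY_REFERENCE = "PrimaryReference"
    SECONDARY_REFERENCE = "SecondaryReference"
    ANNOTATED_WITH = "AnnotatedWith"


@dataclass(frozen=True)
class CrossReference:
    """One CRIB entry: a relationship type plus the thing it points at."""
    relationship: Relationship
    namespace: str   # hypothetical namespace label, e.g. "EMBL" or "GO"
    identifier: str  # placeholder ID, not a real accession


def parse_entry(text):
    """Parse 'Relationship:Namespace:Id' (format assumed for illustration).

    An unknown relationship name raises ValueError, so the vocabulary
    stays restricted to the small generic set.
    """
    rel, namespace, identifier = text.split(":", 2)
    return CrossReference(Relationship(rel), namespace, identifier)


def select(crib, relationship):
    """Return only the CRIB entries carrying the given relationship type."""
    return [entry for entry in crib if entry.relationship is relationship]


# All identifiers below are invented placeholders.
crib = [parse_entry(s) for s in (
    "IdenticalTo:EMBL:X00001",
    "PrimaryReference:PubMed:0000001",
    "AnnotatedWith:GO:0000000",
)]
synonyms = select(crib, Relationship.IDENTICAL_TO)
```

The point of the sketch is only that a client receiving a CRIB could
filter on a small, fixed vocabulary (here, just the synonyms) instead of
having to guess what each cross-reference means.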

Again, I don't know if this is sufficiently well thought-out at this
point to include it in the upcoming spec, but if there is a groundswell
of enthusiasm and agreement on something akin to what Heiko and I are
describing, or if we feel that it is critical to have *something* in
place in this regard, we can probably slip it in at the last minute... 
similarly, if there are any strong objections to such an idea, say so
now, before it is too late!


> What's up with the Brisbane/ISMB plan? I may be there, will I meet you 
> guys?

I'm not certain that I will be there...  I'm primarily using my travel
money for more directly MOBY-related meetings.

Let us know what you (all) think, 

Cheers Heiko!  Thanks especially for all of the leg-work you have been
doing promoting MOBY in Europe - I am getting a lot of feedback from 
European labs who have heard about the project through you :-)

Mark


-- 
=======================================================================
                                    |--==\
Mark Wilkinson, Ph.D.                \==-|       
Bioinformatics Consultant             \=/        0010010010100101110010
Illuminae Media                       /-\        
727 6th Ave. N.                      /-==|       0010100100111101010010
Saskatoon, SK, Canada               |==-/        
S7K 2S8                              \=/         0100100100010010010101
+1 (306) 373 3841                     /\         
markw at illuminae.com                  /=-\        1101001010100101010101
                                    |--==\
=======================================================================


