[MOBY-l] Moby class hierarchy

Sun May 5 08:08:23 UTC 2002

"Lukas Mueller" <mueller at acoma.stanford.edu> writes:

> On the trip home I was thinking about a class hierarchy for moby
> objects until the ibook batteries ran out. This could maybe be a
> basis for future discussion?

Hey Lukas,

Thanks for looking at this. 

> Since we have a class hierarchy, we don't need to specify the class
> in the quadruple anymore, so we're back to a triple. In the square
> brackets are the slots, attributes or whatever you want to call it, so
> that every object only has minimal information. The root element is
> Moby, and its children are indented etc. The Moby objects can be
> grouped ad hoc in Moby_lists. Sorry for the weird format but I'm not
> too familiar with XDL or whatever the right acronym is...

Format's not critical here, so pretty much anything would work.

I'm nervous about getting too detailed with the data types (or the
service types). I think that we should really look at how we want to
specify the *services* themselves. I think we learned a good lesson
scoping out the one service we finished:

RetrieveAnnotationsFromGOId {
  $GO_object
} returns @sequence_object;

MOBY_object {
  $identifier, 
  $namespace,  
}

GO_object { inherits from MOBY_object
  # no new attributes
}

sequence_object { inherits from MOBY_object
  # no new attributes
}

Neither of these objects, <sequence> or <GO>, needed any new attributes
beyond the two provided by <MOBY>. That way, when requesting a
RetrieveAnnotationsFromGOId service, the client knows to send only
the $identifier and $namespace for any GO objects. If we had
fully specified all possible attributes for GO objects in the base
class, the client wouldn't know which were optional and which were
required, and it would send everything up the wire.

By having a minimal base class we make efficient use of our bandwidth,
but the big question is whether this will lead to an explosion of
classes. 
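
As a rough illustration of the minimal-base-class idea, here is a small
Python sketch. Python is used only for illustration; the class names, the
to_wire helper, and the example GO identifier are assumptions and not part
of any agreed MOBY interface.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MobyObject:
    """Minimal base class: only the two attributes every object carries."""
    identifier: str
    namespace: str


class GOObject(MobyObject):
    """Inherits from MobyObject; no new attributes."""


class SequenceObject(MobyObject):
    """Inherits from MobyObject; no new attributes."""


def to_wire(obj: MobyObject) -> dict:
    """Hypothetical helper: the fields a client would send for one object."""
    return {"class": type(obj).__name__, **asdict(obj)}


# Example values are placeholders chosen for illustration.
go_term = GOObject(identifier="GO:0008150", namespace="GO")
print(to_wire(go_term))
```

Because the subclasses add nothing, the client has no optional fields to
guess about: the message is always the class name plus the
identifier/namespace pair.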

> Moby [triple]
> 	VirtualSequence [length]	# does not contain the
> 	complete sequence, only length
> 
> 		Sequence [sequence]

I actually think we want to have <sequence> as the base class, and not
give it *any* attributes, so that we can just pass around the minimal
identifier/namespace pairs without needing a length attribute.

> 		DNA []
> 			Chromosome [chromosomeNumber]
> 			Assembly
> 				BAC etc.
> 			SequenceFragment 	[seq]
> 			Polymorphism	[type {snp, indel, deletion,
> 			insertion, reversion} start, end,
> 			polymorphicSeq]
> 
> 			Locus [GeneModel_List]
> 				
> 			GeneModel [Sequence, orientation, Chromosome,
> 			start, end, SequenceFeatureAnnotation_List]
> 
> 				
> 			Repeat []
> 			Domain []
> 			Transposon []

Last year, I got a useful piece of advice from Scott Markle when
working on the MAGE model. He suggested that we not create a new class
or subclass unless it had new behavior (methods) or attributes. In our
case methods would mean services that would treat the different data
types differently. 

My advice is to start listing services we want to see, and looking at
what data types we'd need for input parameters, and then this
hierarchy will fall out.
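
One way to picture "the hierarchy will fall out" is to write the service
signatures down as data and collect the types they mention. The sketch
below does this in Python; the SERVICES table and the required_types
function are hypothetical, and the service names are simply those used
elsewhere in this message.

```python
# Hypothetical service table: name -> (input types, return type).
SERVICES = {
    "RetrieveAnnotationsFromGOId": (["GO_object"], "sequence_object"),
    "RetrieveExperimentSets": ([], "MAGE_object"),
    "SubmitMAGEArrayDesign": (["MAGE_object"], "MOBY_return_code"),
}


def required_types(services):
    """Collect every data type that appears in some service signature."""
    needed = set()
    for inputs, returned in services.values():
        needed.update(inputs)
        needed.add(returned)
    return sorted(needed)


print(required_types(SERVICES))
```

Any type that never shows up in a signature is a candidate for leaving
out of the hierarchy until a service actually needs it.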

Here's a stab at the MAGE related stuff:

# this retrieves all public experiment sets from the server
# it only returns the minimum high-level info about each set
# so it is not a resource hog
RetrieveExperimentSets {
  # no arguments
} returns $MAGE_object

# this is the opaque MAGE object, it needs to be this way
# because a single MAGE-ML file can represent many different things:
# an ArrayDesign, an ExperimentSet, or a BioMaterial
# in the future it might be wise for MAGE to have different files
# for each type, but right now we don't
MAGE_object {
  # no attributes beyond those in <MOBY>
}

# this is a 'complete' MAGE object with an XML payload
MAGE_complete_object {
  $mage_ml_string # the XML payload for a MAGE-ML file
}
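
The split between the opaque object and the 'complete' object can be
sketched in Python as follows. This is only an illustration: the class
names and the strip_payload helper are invented here, and having the
complete object inherit from the opaque one is an assumption, since the
pseudocode above does not say which class it extends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobyObject:
    identifier: str
    namespace: str


@dataclass(frozen=True)
class MageObject(MobyObject):
    """Opaque reference: nothing beyond identifier and namespace."""


@dataclass(frozen=True)
class MageCompleteObject(MageObject):
    """Same reference plus the MAGE-ML document as a string."""
    mage_ml_string: str


def strip_payload(obj: MageCompleteObject) -> MageObject:
    """Drop the XML payload, keeping only the lightweight reference."""
    return MageObject(obj.identifier, obj.namespace)


# Placeholder values for illustration only.
full = MageCompleteObject("exp-set-1", "example-namespace", "<MAGE-ML/>")
print(strip_payload(full))
```

Services that only need to name an experiment set can then take the
lightweight form, and only submission-style services need the payload.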

# This retrieves information about the experiment sets. We don't need a
# <MAGE_complete> object because the identifier and namespace
# attributes of the simple <MAGE> objects are all we need
RetrieveExperimentSetsByIdentifier {
  @MAGE_objects
} returns $MAGE_object

# this returns a list of the taxa for which there are public experiment
# sets available
RetrieveExperimentSetSpecies {
  # no arguments
} returns $MAGE_object

# retrieves the BioAssays for the indicated experiment sets
RetrieveBioAssaysByExperimentSetId {
  @MAGE_objects
} returns $MAGE_object

# retrieves the actual numerical expression data for the indicated
# BioAssays
RetrieveDataByBioAssayId {
  @MAGE_objects
} returns $MAGE_object

# submits an ArrayDesign to a public DB
SubmitMAGEArrayDesign {
  $MAGE_object
} returns $MOBY_return_code

# submits an ExperimentSet to a public DB
SubmitMAGEExperimentSet {
  $MAGE_object
} returns $MOBY_return_code

I really have no idea what a <MOBY_return_code> object looks like.

Any feedback?
jas.