[Dynamite] SingleModel

Ewan Birney birney@ebi.ac.uk
Mon, 6 Mar 2000 03:58:29 +0000 (GMT)


> Probably should have separate modules. I don't quite understand this
> module thing yet.

It really means "what code can I validly replace without having to
recompile the other code". It also means "which things can I put
into a separate CORBA server". Or - which things can I put into a library.

I suspect we'll have - 

	module for sequences/ and related things

	module for viterbi algorithms, alignments and other stuff
		(uses sequence module)

	module for training
		(uses sequence module)
		NB: yet again the bugbear of the alignment
		datastructure/non datastructure will come up, which argues
		for putting this inside the viterbi module...

	module for database searching 


> 
> In fact.. now I think about it -- it seems fascistic to require that
> things have to be in the same module to share internal representation. Not
> to mention unworkable - surely any particular implementation of the
> Dynamite IDL can do all the internal code-sharing it wants, e.g. by just
> #including "super_generic_DP.h" for example?
> 

Not sure what you mean by this point...

> I'm not really sure what's going on here, whether we're talking about a
> coding style or a set of strict rules or what...
> 

The only thing is for us to be aware of what compile-time dependencies
there are in the code. It is not a "rule" issue, more an awareness issue...

> > 
> > module SingleModel { // Single means emits only one sequence
> > 
> >   interface State;
> >   interface Transition;
> > 
> >   typedef sequence<float> ProbabilityEmission;
> 
> == Alphabet::WeightVector.
> No need to duplicate this.

ok.

> 
> > 
> >   interface Transition {
> >     State from;
> >     State to;
> >     float transition_probability;
> >     ProbabilityEmission emission; // emission on the transitions.
> 
> As I said before I think parameters should be in a separate object.
> If people don't like this then we could consider having a kind of memo
> object that represents a parameterised model.

How do we put the parameters elsewhere? Can you show me the IDL?
How do we keep the parameters in sync with the model? (without mad
gymnastics)

> 
> Possibly have a "boolean Transition::is_null" field for null transitions?

Does null mean the transition does not emit anything?

> 
> What about "fanned" transitions (i.e. if it's an "A" then go to state 1,
> if it's a "G" then go to state 3, if it's a "C" go to state 4 etc)?
> These can be inefficiently implemented just by having A times as many
> Transitions (where A is the alphabet size) -- shall we just leave it at
> this for now? (I think we probably should.)

Let's leave this...
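
For the record, the "A times as many Transitions" expansion could look
roughly like the sketch below. It is illustrative Python, not Dynamite
code: expand_fanned and its arguments are hypothetical names, and "from"
and "to" are renamed to source and target because "from" is a Python
keyword. Each expanded transition keeps the shared transition probability
and an emission vector masked down to its one symbol, so the total
probability of every path is unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: int         # "from" in the IDL
    target: int         # "to" in the IDL
    probability: float  # transition_probability in the IDL
    emission: tuple     # one weight per alphabet symbol


def expand_fanned(source, targets_by_symbol, alphabet, probability, emission):
    """Replace one fanned transition by one plain Transition per symbol."""
    expanded = []
    for index, symbol in enumerate(alphabet):
        if symbol not in targets_by_symbol:
            continue  # this symbol has no branch in the fan
        # Keep only this symbol's emission weight; zero out the rest.
        masked = tuple(w if i == index else 0.0 for i, w in enumerate(emission))
        expanded.append(
            Transition(source, targets_by_symbol[symbol], probability, masked)
        )
    return expanded


# "A" goes to state 1, "G" to state 3, "C" to state 4, as in the example above.
fan = expand_fanned(0, {"A": 1, "G": 3, "C": 4}, "ACGT", 0.5,
                    (0.25, 0.25, 0.25, 0.25))
```

The cost is exactly the inefficiency mentioned above: one Transition
object per symbol that has a branch.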

> 
> >   };
> > 
> >   typedef sequence<Transition> TransitionList;
> > 
> >   interface State {
> >     TransitionList all_Transitions();
> >   };
> 
> I think this method belongs in the model, not in an individual State.
> i.e. Model should have the following methods:
> 
> 	sequence<Transition> outgoing_transitions (in State s);
> 	sequence<Transition> all_transitions();
> 
> >   typedef sequence<State> StateList;
> 
> I also think "SingleModel::State" should be a typedef to int, not an
> interface of its own. States are lightweight things that are usually
> treated as ints anyway.

I don't understand why we don't make this an object. We want to
add/remove these things, don't we? When we do the DP I imagine the first
thing we do is make a little internal datastructure to drive the generic
DP off. I don't think trying to make the objects look like what we
drive the low-level algorithmic code off will help us...
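
A minimal sketch of that split, assuming nothing about the eventual
Dynamite implementation: State and Transition stay real objects that can
be added and removed, and a hypothetical compile_model step flattens them
into int-indexed tables just before the DP runs. The class layout and the
(from, to, probability) tuple format are assumptions made for
illustration.

```python
class State:
    def __init__(self, name):
        self.name = name
        self.transitions = []  # outgoing Transition objects


class Transition:
    def __init__(self, source, target, probability):
        self.source = source
        self.target = target
        self.probability = probability
        source.transitions.append(self)  # register with the "from" state


def compile_model(states):
    """Flatten the object graph into int-indexed tables for the DP code."""
    index = {state: i for i, state in enumerate(states)}
    names = [state.name for state in states]
    edges = [
        (index[t.source], index[t.target], t.probability)
        for state in states
        for t in state.transitions
    ]
    return names, edges


begin, match, end = State("begin"), State("match"), State("end")
Transition(begin, match, 1.0)
Transition(match, match, 0.9)
Transition(match, end, 0.1)
names, edges = compile_model([begin, match, end])
```

Here the DP code only ever sees ints (the positions in names), while the
public interface keeps State as an object.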

> 
> This breaks down if we need to add too much information to a State.
> However, with the parameters elsewhere, all we need is a name:
> 
> 	string state_name (in State s);
> 
> Incidentally (personal gripe) I dislike typedefs like the above one
> ("typedef sequence<State> StateList"). A sequence of States *IS* a "State
> List" by *definition*, there is no need to typedef it; it looks like we
> are generic-programming novices if we do. A typedef is acceptable (IMHO)
> when it denotes a specialised *kind* of list, e.g.
> 
> 	typedef sequence<State> AlignmentPath;
> 	typedef sequence<float> ProbabilityEmission;

force of habit: Most (all?) IDL compilers bitch about using sequence<X>
outside of a typedef. Annoying eh?

> 
> I guess if we're using sequence<IncrediblyLongAndHardToTypeClassName> a
> lot, then we might want to typedef it, but we should probably just choose
> a shorter name for the class in the first place.
> 
> > 
> >   interface Model {
> >     StateList all_States();
> >   };
> 
> To summarise the above edits:
> 
> 	interface Model {
> 	  sequence<State>      all_states();
> 	  sequence<Transition> outgoing_transitions (in State s);
> 	  sequence<Transition> all_transitions();
> 	  string               state_name (in State s);
> 	};
> 
> > 
> >   //
> >   // Have not done alignment yet
> >   //
> >   
> >   interface AlignmentFactory {
> >     attribute Model model;
> >     // also here a function pointer for compile-time function for this model
> >     Alignment make_alignment(in Seq seq);
> 
> I am not convinced that AlignmentFactory is a useful generalisation -- I
> feel that ViterbiAlgorithm makes more sense.

That is sensible.... 
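
To make the ViterbiAlgorithm idea concrete, here is a rough Python sketch
for a single-sequence model with emissions on the transitions, as in the
Transition interface above. Everything about its shape is an assumption:
states are plain ints, a transition is a (from, to, probability, emission)
tuple with the emission given as a dict from symbol to weight, every
transition emits exactly one symbol (no null transitions), and the path
runs from state 0 to the last state. A bad symbol or an unreachable end
state is reported with an exception.

```python
import math


def viterbi(transitions, n_states, sequence, start=0, end=None):
    """Return (state path, log probability) of the best path for sequence."""
    end = n_states - 1 if end is None else end
    neg_inf = float("-inf")
    length = len(sequence)
    # best[i][s]: best log probability of being in state s after i symbols.
    best = [[neg_inf] * n_states for _ in range(length + 1)]
    back = [[None] * n_states for _ in range(length + 1)]
    best[0][start] = 0.0

    for i, symbol in enumerate(sequence, start=1):
        for source, target, probability, emission in transitions:
            weight = probability * emission.get(symbol, 0.0)
            if best[i - 1][source] == neg_inf or weight <= 0.0:
                continue  # unreachable source, or symbol cannot be emitted
            score = best[i - 1][source] + math.log(weight)
            if score > best[i][target]:
                best[i][target] = score
                back[i][target] = source

    if best[length][end] == neg_inf:
        raise ValueError("sequence cannot be emitted by this model")

    # Follow the back pointers from the end state to recover the path.
    path = [end]
    for i in range(length, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[length][end]


toy_transitions = [
    (0, 1, 1.0, {"A": 0.9, "C": 0.1}),
    (1, 1, 0.5, {"A": 0.5, "C": 0.5}),
    (1, 2, 0.5, {"A": 0.1, "C": 0.9}),
]
path, score = viterbi(toy_transitions, 3, "AAC")
```

Handling null transitions would need the states visited in a suitable
order within each column, which this sketch leaves out.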


> 
> Ian
> 
> > 
> >     // can throw exceptions/errors of bad alphabet, other things...
> >   };
> > 
> > 
> > };
> > 
> 
> 
> -- 
> Ian Holmes  ....  Howard Hughes Medical Institute  ....  ihh@fruitfly.org
> 