[Dynamite] SingleModel

Ewan Birney birney@ebi.ac.uk
Mon, 6 Mar 2000 04:48:38 +0000 (GMT)


> 
> 	interface SingleTransitionParameters {
> 	  float                  transition_probability;
> 	  Alphabet::WeightVector emission_probability;

Grrrrr. <minor> Can't we call this Alphabet::ProbabilityVector? What's
wrong with sensible names?
 
> 	}
> 
> 	interface SingleModelParameters {
> 	  SingleTransitionParameters           get_parameters (in Transition t);
> 	  sequence<SingleTransitionParameters> all_parameters();
> 	// possibly also:
> 	//  sequence<SingleTransitionParameters> outgoing_parameters (in State s);
> 	}

Ok. I think I see this. Basically I like this.
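For anyone more at home outside CORBA, the two interfaces above might be mirrored in Python roughly like this. It is only a sketch under assumptions: a Transition is modelled as a hypothetical (from_state, to_state) pair of names, the WeightVector/ProbabilityVector as a plain list of floats, and the "possibly also" outgoing_parameters() as a simple filter over all transitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for the IDL Transition: (from_state, to_state) names.
Transition = Tuple[str, str]


@dataclass
class SingleTransitionParameters:
    transition_probability: float
    # Stand-in for Alphabet::WeightVector (a.k.a. ProbabilityVector):
    # one emission probability per alphabet symbol.
    emission_probability: List[float]


class SingleModelParameters:
    def __init__(self, params: Dict[Transition, SingleTransitionParameters]):
        # Dict insertion order fixes the order returned by all_parameters().
        self._params = dict(params)

    def get_parameters(self, t: Transition) -> SingleTransitionParameters:
        return self._params[t]

    def all_parameters(self) -> List[SingleTransitionParameters]:
        return list(self._params.values())

    def outgoing_parameters(self, s: str) -> List[SingleTransitionParameters]:
        # The optional method: parameters of every transition leaving state s.
        return [p for (src, _), p in self._params.items() if src == s]
```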

> 
> The easiest way to keep the parameters & the model in sync is to stipulate
> that the sequence<SingleTransitionParameters> returned by
> SingleModelParameters::all_parameters() is indexed by the same index as
> the sequence<Transition> returned by the Model::all_transitions() method.
> Ditto the outgoing_parameters() method -- if you see what I mean.
> 
> I regard this as perfectly valid coding practice as long as it's WELL
> documented. We could even incorporate a sanity check, by having a
> "Transition* my_transition" field in SingleTransitionParameters.
> 

I'm more worried about growing/shrinking the model, since inserting or
deleting a transition shifts all the later indices. Perhaps that is a
later thing to think about.
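Here is a rough sketch of the parallel-array convention and of why resizing the model is the worry. Assumptions: transitions are hypothetical (from, to) name pairs, the `my_transition` back-reference follows Ian's suggested sanity-check field, and `check_in_sync` is a hypothetical helper, not anything agreed for Dynamite.

```python
from dataclasses import dataclass
from typing import List, Tuple

Transition = Tuple[str, str]  # hypothetical (from_state, to_state)


@dataclass
class SingleTransitionParameters:
    transition_probability: float
    my_transition: Transition  # back-reference used only for sanity checking


def check_in_sync(transitions: List[Transition],
                  params: List[SingleTransitionParameters]) -> bool:
    """Hypothetical helper: True iff params[i] belongs to transitions[i] for all i."""
    return len(transitions) == len(params) and all(
        p.my_transition == t for t, p in zip(transitions, params))


# Stand-ins for Model::all_transitions() and
# SingleModelParameters::all_parameters(), aligned purely by index.
transitions = [("B", "M1"), ("M1", "M2"), ("M2", "E")]
params = [SingleTransitionParameters(1.0, t) for t in transitions]
assert check_in_sync(transitions, params)

# Growing the model: inserting a transition in the middle shifts every later
# index, so the parameter array is misaligned until it is rebuilt.
transitions.insert(1, ("M1", "I1"))
```

After the insertion, `check_in_sync(transitions, params)` is False: the back-reference catches the drift, but nothing in the parallel-array scheme itself prevents it.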

> The next option for keeping the model & parameters in sync is to use the
> get_parameters(Transition) method in SingleModelParameters. If we don't
> use a ParameterisedModelMemento pattern (as I suggested above), then we
> care quite a lot about how fast these lookups are. We would have several
> implementation options:


I don't see this ParameterisedModelMemento pattern... is it the
parallel-array scheme you suggested above?

> 
> 	(0) Linear array of Transitions, searchable by brute force in
>             time O(M) where M is the model size
> 	(1) Sorted array of Transitions, searchable by binary chop in time
>             O(log M)
> 	(2) Large M*M array: O(1) lookup, but O(M^2) memory
> 	(3) Some kind of hashing on Transitions: expected O(1) lookup
> 
> This may seem like a lot of effort (maybe this is what you meant by mad
> gymnastics -- I've got used to the STL doing all these algorithms for
> you!).
> 

By gymnastics I meant not this but the parallel-array stuff. OK as long
as it is documented, but it sort of gives me the heebie-jeebies.
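Ian's four lookup options (0) to (3) can be compared side by side. This is a sketch under assumptions: states are numbered 0..n-1, a transition is a hypothetical (from, to) integer pair, and each lookup returns the index into the parameter array (None when the transition is absent). For option (2) the table is indexed by state, so its size is the square of the number of states.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Transition = Tuple[int, int]  # hypothetical (from_state, to_state), states 0..n-1


def linear_lookup(ts: List[Transition], t: Transition) -> Optional[int]:
    # (0) brute-force scan: O(M) in the number of transitions
    for i, u in enumerate(ts):
        if u == t:
            return i
    return None


def sorted_lookup(sorted_ts: List[Transition], t: Transition) -> Optional[int]:
    # (1) binary chop: O(log M). The index refers to sorted_ts, so the
    # parameters would have to be stored in the same sorted order.
    i = bisect.bisect_left(sorted_ts, t)
    return i if i < len(sorted_ts) and sorted_ts[i] == t else None


def build_matrix(ts: List[Transition], n_states: int) -> List[List[Optional[int]]]:
    # (2) dense n_states x n_states table: O(1) lookup, O(n_states^2) memory
    table: List[List[Optional[int]]] = [[None] * n_states for _ in range(n_states)]
    for i, (a, b) in enumerate(ts):
        table[a][b] = i
    return table


def build_hash(ts: List[Transition]) -> Dict[Transition, int]:
    # (3) hashing on transitions: expected O(1) lookup
    return {t: i for i, t in enumerate(ts)}
```

Usage: `build_hash(ts)[(a, b)]` or `build_matrix(ts, n)[a][b]` gives the parameter index directly; the sorted variant trades a little lookup time for no extra memory beyond the sorted array.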

> However, I really think the parameters are a separate thing from the
> model. I want to convince you.... Think about training, think about
> parameterising HMMs & GeneWise; even think about models as simple as
> Smith-Waterman, where the number of parameters is less than the number of
> transitions... think about Fisher kernels (where we will also want to have
> a parameter-like datastructure)... this is a somewhat intuitive feeling
> on my part, so I don't want to hammer it in without a consensus.
> 

I have the same gut feeling. So I think this is good...
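To make the Smith-Waterman point concrete: in an affine-gap model several transitions share one gap-open or gap-extend value, so the parameters naturally live in a smaller table that transitions point into. This is an illustrative sketch, not a proposed Dynamite design; `TiedParameters`, the state names and the score values are all hypothetical.

```python
from typing import Dict, Tuple

Transition = Tuple[str, str]  # hypothetical (from_state, to_state)


class TiedParameters:
    """Hypothetical parameter object kept apart from the model: each
    transition names a shared parameter instead of owning its own value."""

    def __init__(self, values: Dict[str, float], tying: Dict[Transition, str]):
        self.values = values  # the free parameters (what training would update)
        self.tying = tying    # transition -> name of the parameter it uses

    def score(self, t: Transition) -> float:
        return self.values[self.tying[t]]


# Hypothetical affine-gap scores: five transitions, only three parameters.
params = TiedParameters(
    values={"gap_open": -12.0, "gap_extend": -2.0, "match_continue": 0.0},
    tying={
        ("M", "M"): "match_continue",
        ("M", "X"): "gap_open",
        ("M", "Y"): "gap_open",
        ("X", "X"): "gap_extend",
        ("Y", "Y"): "gap_extend",
    },
)
```

Changing `params.values["gap_open"]` once changes the score of both M->X and M->Y, which is awkward to express if each transition object carries its own number.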


> OK, the low-level-algorithmic rationale may have been wrong; but the point
> is I don't think states have any useful internal data (except their name),
> and the only thing you need to be able to DO with them is to test whether
> State s1 == State s2.
> 

If the parameters are somewhere else, then yes.

But won't we want to inherit from them for things like drawable models
(position, colour, etc.)? I don't see what we lose by making them
objects...
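The two positions can coexist: a state can be a lightweight object whose only essential behaviour is Ian's identity test, while a subclass carries drawing data that plays no part in that test. A sketch in Python; `State`, `DrawableState` and its fields are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    """Hypothetical state: its only essential data is its name, and equality
    and hashing use the name alone, which is all the algorithms need."""
    name: str


@dataclass(frozen=True)
class DrawableState(State):
    # Hypothetical presentation data for drawable models. compare=False keeps
    # these fields out of __eq__ and __hash__, so drawing never affects identity.
    x: float = field(default=0.0, compare=False)
    y: float = field(default=0.0, compare=False)
    colour: str = field(default="black", compare=False)
```

One caveat of this particular sketch: dataclass equality also requires the same class, so `State("M1")` and `DrawableState("M1")` compare unequal; a real design would have to pick one rule.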

> > force of habit: Most (all?) IDL compilers bitch about using sequence<X>
> > outside of a typedef. Annoying eh?
> 
> yes indeed... oh well.
> Perhaps call them XSequence to be consistent?
> 

Name clash

	Sequence -> biological polymer
	Sequence -> sequence in IDL

We can't call them XSequence. It will confuse the fuck out of Joe
Bioinformatics Guy. It has to be List or... Vector or something...


> Ian
>