[Biojava-dev] Comments about OrderNDistributions

Matthew Pocock matthew_pocock at yahoo.co.uk
Tue Mar 4 17:23:11 EST 2003


Hi Francois,

You can have a distribution over codons of the form
P(ctg) by just using a normal probability distribution
over DNA x DNA x DNA - use the normal distribution
factory and it will just work.
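
As a rough illustration of what a joint distribution over
DNA x DNA x DNA holds, here is a minimal Python sketch. It does not use
the BioJava API: the names codon_joint_distribution and
conditional_from_joint are made up for this example, and a plain dict
keyed by 3-letter strings stands in for the distribution. It also shows
that the conditional P(z|xy) can be recovered from the joint by
dividing by the marginal over the last position.

```python
from collections import Counter
from itertools import product

DNA = "acgt"


def codon_joint_distribution(codons, pseudocount=0.0):
    """Estimate the joint P(xyz) over all 64 codons from observed codons.

    `codons` is a list of 3-letter lowercase strings. This is a
    hypothetical stand-in for a distribution over DNA x DNA x DNA,
    not BioJava code.
    """
    counts = Counter(codons)
    total = len(codons) + pseudocount * 64
    if total == 0:
        raise ValueError("need at least one codon or a positive pseudocount")
    return {
        "".join(c): (counts["".join(c)] + pseudocount) / total
        for c in product(DNA, repeat=3)
    }


def conditional_from_joint(joint, prefix):
    """Derive P(z | prefix) for a 2-letter prefix from the joint P(xyz)."""
    marginal = sum(joint[prefix + z] for z in DNA)
    if marginal == 0:
        raise ValueError("prefix %r was never observed" % prefix)
    return {z: joint[prefix + z] / marginal for z in DNA}


joint = codon_joint_distribution(["ctg", "ctg", "cta", "aaa"])
cond = conditional_from_joint(joint, "ct")
```

With these four example codons, joint["ctg"] is 0.5 and the 64 weights
sum to 1, while cond gives the probability of each third base given
the prefix "ct".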

If you use this distribution in an HMM, you must make
sure that you always look at non-overlapping codons,
or the probabilities won't add up and your model won't
be valid (and possibly even won't train).
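
To make the non-overlapping requirement concrete, here is a small
Python sketch (again not BioJava code; the function names and the
frame parameter are assumptions for illustration). It contrasts
splitting a sequence into consecutive codons with sliding a window one
base at a time. With overlapping windows most bases are scored up to
three times, which is why the joint probabilities no longer multiply
into a valid probability for the whole sequence.

```python
def non_overlapping_codons(seq, frame=0):
    """Split seq into consecutive, non-overlapping triplets.

    Reading starts at offset `frame`; trailing bases that do not fill
    a complete codon are dropped.
    """
    usable_end = len(seq) - (len(seq) - frame) % 3
    return [seq[i:i + 3] for i in range(frame, usable_end, 3)]


def overlapping_triplets(seq):
    """Slide a window of width 3 along seq, one base at a time."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


seq = "ctgctgaaa"
codons = non_overlapping_codons(seq)   # each base used exactly once
windows = overlapping_triplets(seq)    # interior bases used three times

# Number of base positions scored under each scheme.
bases_scored_codons = 3 * len(codons)
bases_scored_windows = 3 * len(windows)
```

For this 9-base sequence the non-overlapping split scores 9 base
positions, one per base, while the sliding window scores 21.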

Matthew

--- Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> After going through the code for the OrderNDistributions, there are a
> couple of comments and questions that I would have:
> 
> Is there any reason why conditional probabilities are used there
> instead of joint probabilities?
> 
> Right now, OrderNDistribution.getWeight(cgt) (or any codon) gives
> P(t|cg), while getting P(cgt) would be a lot more useful. It's quite
> easy to go from the joint to the conditional probabilities, while
> getting the opposite information is pretty troublesome.
> 
> To get P(cgt), one would need to get P(t|cg) * sum of P(g|nc) * sum of
> P(c|nn), where sum of P(g|nc) = P(g|ac) + P(g|cc) + P(g|gc) + P(g|tc).
> 
> I don't really see why not store it as joint probabilities and not
> have to worry about the conditioning and conditioned alphabets there.
> 
> Also, I'm not positive about this, but it seems that some information
> would be lost (or at least quite difficult to recover) about the first
> few characters of the distribution. For example, with AACCCGGG, there
> are no A's that would appear anywhere in a 3rd order (which would
> really be a 2nd order Markov chain) distribution. Two ways of getting
> around it would be to carry all of the distributions of lower order to
> make sure that you have the data around, but that's a bit cumbersome,
> or to have SymbolListViews.orderNSymbolList(AACCCGGG, 3) give out
> NNAACCCGGG in this case, and have the OrderNDistributions take that
> into account.
> 
> What do people think about this?
> 
> Francois Pepin
