[Bioperl-l] IUPAC support for DNA alignment
Alexie Papanicolaou
apapanicolaou at ice.mpg.de
Wed Jul 2 20:21:05 UTC 2008
>>
>> Full score version:
>> 1) T - U = +3 (I assume U is the same as T for alignment purpose,
>> right?)
>
> Right.
>
Yeah... unless you have wobble pairing (U-G)
<http://en.wikipedia.org/wiki/Wobble_base_pair> :-) Let's keep it
simple though...
>>
>> 2) A - W = +3
>> 3) A - D = +3
>> 4) A - N = +3
>> 5) A - X = -1 (not so sure about this one)
>>
>> Probabilistic score version:
>> 1) T - U = +3
>> 2) A - W = +3/2-1/2 = +1
>> 3) A - D = +3/3-1*2/3 = +1/3
>> 4) A - N = +3/4-1*3/4 = 0
>> 5) A - X = -1
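
The probabilistic scores quoted above can be reproduced with a few lines of code. This is only a sketch, not anything from BioPerl: it assumes match = +3 and mismatch = -1 as in the examples, treats U as T, scores X as a flat mismatch, and assumes every base covered by an ambiguity code is equally likely. The names `IUPAC` and `expected_score` are made up for illustration.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
# U is treated as T for alignment purposes, as agreed in the thread.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MATCH, MISMATCH = 3, -1  # values taken from the examples in the thread


def expected_score(a, b):
    """Average the match/mismatch score over every base pair the two
    codes allow, assuming all constituent bases are equally likely."""
    if "X" in (a, b):
        # Assumption: X is scored as a flat mismatch, as in item 5.
        return Fraction(MISMATCH)
    pairs = [(x, y) for x in IUPAC[a] for y in IUPAC[b]]
    total = sum(MATCH if x == y else MISMATCH for x, y in pairs)
    return Fraction(total, len(pairs))


for other in "UWDNX":
    left = "T" if other == "U" else "A"
    print(left, "-", other, "=", expected_score(left, other))
```

Using `Fraction` keeps values such as 1/3 exact instead of rounding them to floats.
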
>
> Note that there are also M, R, V, and H, and their complements (which
> by definition would not match your example of 'A').
>
Oh, I assumed Yee Man was just giving us a trimmed-down example. Hilmar
is quite right.
> Note also that the above implicitly assumes 50% GC content or equal
> likelihood of the code-constituent bases, which in reality for most
> coding sequences is not true. Also, if you have a known polymorphism at
> the site, for 3-letter ambiguities not all 3 may be equally likely.
> For example, if you have letter D for an [A/G] SNP, one may not want to
> give 1/3 of the weight to possibility T.
> I would at least allow for the possibility to assign expected base
> frequencies and weight the ambiguous possibilities by those.
> -hilmar
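
Hilmar's suggestion of weighting the ambiguous possibilities by expected base frequencies might look something like the sketch below. It is an illustration only, not BioPerl code: `weighted_score` and its parameters are hypothetical names, the +3/-1 scores are taken from the examples earlier in the thread, and the two positions are assumed to be independent.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weighted_score(a, b, freqs, match=3, mismatch=-1):
    """Expected score of code `a` against code `b`, weighting each
    constituent base by its frequency in `freqs` (a dict over A, C, G, T).
    Frequencies are renormalised within each ambiguity code."""
    def distribution(code):
        bases = IUPAC[code]
        total = sum(freqs[x] for x in bases)
        if total == 0:
            raise ValueError(f"no allowed base of {code!r} has non-zero frequency")
        return {x: freqs[x] / total for x in bases}

    dist_a, dist_b = distribution(a), distribution(b)
    return sum(
        pa * pb * (match if x == y else mismatch)
        for x, pa in dist_a.items()
        for y, pb in dist_b.items()
    )


# Uniform frequencies reproduce the unweighted A - D score of 1/3.
uniform = {base: Fraction(1, 4) for base in "ACGT"}
print(weighted_score("A", "D", uniform))

# A known [A/G] SNP written as D: give T no weight at this site.
snp_site = {"A": Fraction(1, 2), "C": Fraction(0), "G": Fraction(1, 2), "T": Fraction(0)}
print(weighted_score("A", "D", snp_site))
```

With the SNP frequencies the A - D score rises from 1/3 to 1, because the T possibility no longer dilutes it.
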
Ehm, wouldn't we now be walking in the twilight of modeling it?
That might be a bit more work for Yee Man. Perhaps Yee Man could instead
document how the user can provide their own substitution matrix?
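
To make the idea concrete, a user-supplied substitution matrix could be as simple as a table of pair scores with a default for anything not listed. The sketch below is purely hypothetical and does not reflect any existing BioPerl interface; `make_scorer`, `matrix` and `default` are invented names.

```python
def make_scorer(matrix, default=-1):
    """Build a score(a, b) function from a user-supplied table.

    `matrix` maps (code, code) tuples to scores. Lookups are symmetric
    and case-insensitive; pairs not in the table get `default`."""
    table = {}
    for (a, b), value in matrix.items():
        # frozenset makes ("A", "W") and ("W", "A") the same key.
        table[frozenset((a.upper(), b.upper()))] = value

    def score(a, b):
        return table.get(frozenset((a.upper(), b.upper())), default)

    return score


# Example: the user decides what ambiguous matches are worth.
user_matrix = {
    ("A", "A"): 3, ("C", "C"): 3, ("G", "G"): 3, ("T", "T"): 3,
    ("A", "W"): 1, ("A", "N"): 0,
}
score = make_scorer(user_matrix)
print(score("w", "A"), score("A", "N"), score("A", "C"))
```

That way the modeling decisions (uniform, frequency-weighted, or anything else) stay with the user, and the aligner only has to look the score up.
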
--
"You can't find a hermit to teach you herming, because of course that rather spoils the whole thing."
-- (Terry Pratchett, Small Gods)
Alexie Papanicolaou
Department of Entomology,
Max Planck Institute for Chemical Ecology,
Hans-Knoell-Strasse 8,
D-07745 Jena, Germany.