[Bioperl-l] IUPAC support for DNA alignment

Alexie Papanicolaou apapanicolaou at ice.mpg.de
Wed Jul 2 20:21:05 UTC 2008


>>
>> Full score version:
>> 1) T - U = +3 (I assume U is the same as T for alignment purpose, 
>> right?)
>
> Right.
>
Yeah... unless you have wobble pairing (UG) 
<http://en.wikipedia.org/wiki/Wobble_base_pair> :-)  Let's keep it 
simple though...
>>
>> 2) A - W = +3
>> 3) A - D = +3
>> 4) A - N = +3
>> 5) A - X = -1 (not so sure about this one)
>>
>> Probabilistic score version:
>> 1) T - U = +3
>> 2) A - W = +3/2-1/2 = +1
>> 3) A - D = +3/3-1*2/3 = +1/3
>> 4) A - N = +3/4-1*3/4 = 0
>> 5) A - X = -1
>
> Note that there are also M, R, V, and H, and their complements (which 
> by definition would not match your example of 'A').
>
Oh, I assumed Yee Man was just giving us a trimmed-down example. Hilmar 
is quite right.
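
For reference, both score versions quoted above can be extended to every
IUPAC code mechanically. A minimal sketch, in Python rather than Perl and
purely as illustration: the names IUPAC, full_score and prob_score are made
up here and are not BioPerl API, and X is assumed to score -1 against
everything (the case Yee Man was unsure about).

```python
from fractions import Fraction

# IUPAC nucleotide codes mapped to the bases they can stand for.
# U is treated as T; X is not in the table and is handled separately.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MATCH, MISMATCH = 3, -1  # the +3 / -1 values used in the examples above


def full_score(x, y):
    """Full score version: +3 if the two codes share any base, else -1."""
    if "X" in (x, y):
        return MISMATCH  # assumption: X never matches
    return MATCH if set(IUPAC[x]) & set(IUPAC[y]) else MISMATCH


def prob_score(x, y):
    """Probabilistic version: expected score, all constituent bases equally likely."""
    if "X" in (x, y):
        return Fraction(MISMATCH)  # assumption: X never matches
    xs, ys = IUPAC[x], IUPAC[y]
    total = sum(MATCH if a == b else MISMATCH for a in xs for b in ys)
    return Fraction(total, len(xs) * len(ys))
```

With this, prob_score("A", "W") gives 1, prob_score("A", "D") gives 1/3 and
prob_score("A", "N") gives 0, matching the numbers quoted above.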
> Note also that the above implicitly assumes 50% GC content or equal 
> likelihood of the code-constituent bases, which in reality for most 
> coding sequences is not true. Also, if you have a known polymorphism at 
> the site, for 3-letter ambiguities not all 3 may be equally likely. 
> For example, if you have letter D for a [A/G] SNP, one may not want to 
> give 1/3 of weight to possibility T.
> I would at least allow for the possibility to assign expected base 
> frequencies and weight the ambiguous possibilities by those.
>     -hilmar
Ehm, wouldn't we now be walking in the twilight of modeling it?
That might be a bit more work for Yee Man. Perhaps Yee Man can 
document how the user can provide their own substitution matrix?
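
To make the two options concrete, here is a rough sketch of what weighting
by expected base frequencies, plus a user-supplied matrix, might look like.
It is in Python for illustration only; weighted_score, build_matrix and the
frequency values are hypothetical and not part of any existing BioPerl
interface. It assumes every base has a non-zero frequency.

```python
from itertools import product

# IUPAC nucleotide codes mapped to their constituent bases (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weighted_score(x, y, freqs, match=3, mismatch=-1):
    """Expected score when each code's bases are weighted by freqs."""
    def dist(code):
        # Renormalise the base frequencies over the bases this code allows.
        bases = IUPAC[code]
        total = sum(freqs[b] for b in bases)
        return {b: freqs[b] / total for b in bases}

    px, py = dist(x), dist(y)
    return sum(px[a] * py[b] * (match if a == b else mismatch)
               for a in px for b in py)


def build_matrix(freqs, overrides=None):
    """Full substitution matrix as a dict keyed by (code, code) pairs."""
    matrix = {(x, y): weighted_score(x, y, freqs)
              for x, y in product(IUPAC, repeat=2)}
    # User-supplied entries simply replace the computed ones.
    matrix.update(overrides or {})
    return matrix


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_poor = {"A": 0.4, "C": 0.1, "G": 0.4, "T": 0.1}  # made-up example values
```

With uniform frequencies, weighted_score("A", "D", uniform) reduces to the
+1/3 of the probabilistic version; with the made-up at_poor frequencies the
T possibility in D carries less weight and the A - D score rises. A user who
wants full control can pass their own entries via overrides.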


-- 
"You can't find a hermit to teach you herming, because of course that rather spoils the whole thing."

    -- (Terry Pratchett, Small Gods)

Alexie Papanicolaou
Department of Entomology,
Max Planck Institute for Chemical Ecology,
Hans-Knoell-Strasse 8,
D-07745 Jena, Germany.




