[Biopython] PWM using gapped alignments
Chris Gowen
gowencm at vcu.edu
Thu Jul 28 16:28:40 UTC 2011
Hello all,
We are trying to perform PWM calculations using the Motif.pwm() function,
and many of our alignments have gaps, which raise a KeyError when the
function tries the key '-'. I am fairly inexperienced with this analysis
technique, but from looking at the source, it seems the error itself may be
avoided by adding a line before line 97 to skip that letter in the
calculation. Would this mess up the calculation for the PWM scores? Has
anyone dealt with this problem in a more clever way?
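To make the idea concrete, here is a minimal standalone sketch of what
skipping the gap letter would amount to. It does not use Bio.Motif at all;
the function name column_frequencies, its parameters, and the uniform
default background are assumptions made purely for illustration.

```python
def column_frequencies(instances, alphabet="ACGT", background=None,
                       beta=1.0, gap="-"):
    """Per-column letter frequencies for aligned sequences, ignoring gaps.

    Hypothetical helper, not part of Biopython. Assumes all sequences have
    the same length and that every column has at least one non-gap letter
    or that beta > 0 (otherwise the column total would be zero).
    """
    if background is None:
        # Assumption: uniform background over the alphabet.
        background = {letter: 1.0 / len(alphabet) for letter in alphabet}

    pwm = []
    for i in range(len(instances[0])):
        # Start each column from the pseudocounts (beta * background).
        counts = {letter: beta * background[letter] for letter in alphabet}
        for seq in instances:
            letter = seq[i]
            if letter == gap:
                continue  # skip gaps instead of raising KeyError
            counts[letter] += 1
        # Normalise over the letters actually counted in this column.
        total = sum(counts.values())
        pwm.append({letter: counts[letter] / total for letter in alphabet})
    return pwm


example = column_frequencies(["AC-T", "A-GT", "ACGT"])
```

One consequence of skipping gaps this way is that a heavily gapped column is
normalised over fewer observations, so the pseudocounts carry relatively
more weight there than in a fully populated column.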
Thanks for any advice you can offer.
Best,
Chris Gowen
 82      def pwm(self,laplace=True):
 83          """
 84          returns the PWM computed for the set of instances
 85
 86          if laplace=True (default), pseudocounts equal to self.background multiplied by self.beta are added to all positions.
 87          """
 88
 89          if self._pwm_is_current:
 90              return self._pwm
 91          #we need to compute new pwm
 92          self._pwm = []
 93          for i in xrange(self.length):
 94              dict = {}
 95              #filling the dict with 0's
 96              for letter in self.alphabet.letters:
 97                  if laplace:
 98                      dict[letter]=self.beta*self.background[letter]
 99                  else:
100                      dict[letter]=0.0
101              if self.has_counts:
102                  #taking the raw counts
103                  for letter in self.alphabet.letters:
104                      dict[letter]+=self.counts[letter][i]
105              elif self.has_instances:
106                  #counting the occurences of letters in instances
107                  for seq in self.instances:
108                      #dict[seq[i]]=dict[seq[i]]+1
109                      dict[seq[i]]+=1
110              self._pwm.append(FreqTable.FreqTable(dict,FreqTable.COUNT,self.alphabet))
111          self._pwm_is_current=1
112          return self._pwm