[Biopython-dev] [Biopython] Bio.motifs raising Exceptions using pypy

Peter Cock p.j.a.cock at googlemail.com
Fri Jul 12 12:57:08 UTC 2013


On Fri, Jul 12, 2013 at 11:48 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> OK - this also breaks under Jython, and even under CPython if
> we disable the C extension. Here self only has the keys A, C,
> G and T, not N, hence the KeyError. This is something the C
> code just ignores. There is also an inconsistency with mixed case.
>
> New unit test:
> https://github.com/biopython/biopython/commit/e13c97ae3535b58d8ec3da3fc565e97db1fa75a3
>
> Fix for the mixed case difference:
> https://github.com/biopython/biopython/commit/0cab00c66a1fd15072d020cfc17edbdfb37484a5
>
> The KeyError from bad characters can be handled like this:
>
> $ git diff
> diff --git a/Bio/motifs/matrix.py b/Bio/motifs/matrix.py
> index bce1d4f..e6446b5 100644
> --- a/Bio/motifs/matrix.py
> +++ b/Bio/motifs/matrix.py
> @@ -364,7 +364,11 @@ class PositionSpecificScoringMatrix(GenericPositionMatrix):
>                  score = 0.0
>                  for position in xrange(m):
>                      letter = sequence[i+position]
> -                    score += self[letter][position]
> +                    try:
> +                        score += self[letter][position]
> +                    except KeyError:
> +                        #The C code ignores unexpected letters like N
> +                        pass
>                  scores.append(score)
>          else:
>              # get the log-odds matrix into a proper shape
>
> However, that leaves a numerical difference in the output:
>
> ...
>
> The same error occurs on Jython, and on Python if I disable
> the C extension. This needs a little more investigation... I
> don't immediately follow when the C code sets the value
> to nan.

Rereading the C code after lunch, I realised how the 'ok' sentinel
value is being used: a bad letter results in NaN as the score for
that window, rather than the letter simply being skipped.

Fixed,
https://github.com/biopython/biopython/commit/00043d28bdf5408519cb4832d6a8e822d10f6653
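
For anyone following along, here is a rough standalone sketch of the
pure-Python behaviour described above. It is not the commit verbatim:
the function name calculate_scores and the plain dict log_odds are
hypothetical stand-ins for the PSSM object, and it is written for
Python 3 (range rather than xrange). The point is only that an
unexpected letter turns the whole window's score into NaN instead of
being skipped.

```python
import math


def calculate_scores(log_odds, sequence):
    """Score every window of `sequence` against a log-odds matrix.

    `log_odds` is a hypothetical stand-in for the PSSM: a dict mapping
    each letter to a list of per-position scores.
    """
    m = len(next(iter(log_odds.values())))  # motif length
    n = len(sequence)
    scores = []
    for i in range(n - m + 1):
        score = 0.0
        for position in range(m):
            letter = sequence[i + position]
            try:
                score += log_odds[letter][position]
            except KeyError:
                # Unexpected letter (such as N): the window scores NaN
                score = float("nan")
                break
        scores.append(score)
    return scores


# Toy two-column matrix, values chosen only for illustration
log_odds = {
    "A": [1.0, -1.0],
    "C": [-1.0, 1.0],
    "G": [0.0, 0.0],
    "T": [0.5, 0.5],
}
scores = calculate_scores(log_odds, "ACNT")
# The window "AC" gets a normal score; "CN" and "NT" both come out as NaN
```

Since NaN never compares equal to itself, a unit test comparing the
two code paths needs math.isnan (or similar) rather than a plain ==.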

Peter


