[Biopython-dev] Biopython tutorial: Substitution Matrices

Peter Cock p.j.a.cock at googlemail.com
Thu Sep 27 13:55:21 UTC 2012


On Fri, Sep 21, 2012 at 5:57 PM, Hui Ting Grace Yeo <yhtgrace at gmail.com> wrote:
> Hey everyone,
>
> I'm working on this bug here https://redmine.open-bio.org/issues/3340
> and I've updated the example in the tutorial (on substitution matrices,
> 17.4.2) using Bio.AlignIO on github here
> https://github.com/yhtgrace/biopython/tree/clustalw-alignIO-replace.
> I'm able to reproduce the dictionary replace_info, but when I go on to
> finish the example, I get the following log odds matrix:
>
> D   2
> E  -1   1
> H  -5  -4   3
> K -10  -5  -4   1
> R  -4  -8  -4  -2   2
>    D   E   H   K   R
>
> which is different from the one given in the tutorial. I'm wondering if I've
> missed something.

Hi Grace,

Using the current code and the example as it is, I also observe
the same result as you. According to GitHub's "blame" feature,
the current text dates back 4 years:

https://github.com/biopython/biopython/commit/bed3ab39d8a635f1e74be99e6730a48d2460f8b7

However, that was just a reformatting of an older example which
Brad wrote 11 years ago while converting the example from DNA
to protein:

https://github.com/biopython/biopython/commit/21df476c66b279824c51e6abd3f4ae549d003813

The example file itself, protein.aln, has not changed since it was committed:

https://github.com/biopython/biopython/commit/ccbe2d72014eafb064994bc3782ca5529d0b0448

See also Doc/examples/make_subsmat.py

So, since the example hasn't been changed in 11 years, this
suggests either Brad committed the wrong output (and no-one
noticed), or something changed in the calculation during that
time.
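
For anyone following along, here is a minimal sketch of the general
idea behind a log odds matrix built from replacement counts. It is
not the Bio.SubsMat code: the function name log_odds, the way the
expected frequencies are derived from the counts, and the choice of
base 2 are all assumptions made for illustration, so its numbers
should not be expected to match either matrix discussed here.

```python
import math
from collections import defaultdict


def log_odds(pair_counts, logbase=2.0):
    """Toy log odds scores from pair counts of the form {(a, b): n}.

    Hypothetical helper for illustration only, not part of Biopython.
    """
    # Treat (a, b) and (b, a) as the same unordered pair.
    merged = defaultdict(float)
    for (a, b), n in pair_counts.items():
        merged[tuple(sorted((a, b)))] += n
    total = sum(merged.values())

    # Each pair contributes two residue observations.
    residue_counts = defaultdict(float)
    for (a, b), n in merged.items():
        residue_counts[a] += n
        residue_counts[b] += n
    residue_freq = {r: c / (2 * total) for r, c in residue_counts.items()}

    scores = {}
    for (a, b), n in merged.items():
        if n == 0:
            continue  # log of zero is undefined; skip unobserved pairs
        observed = n / total
        # A mixed pair can arise in two orders, hence the factor of 2.
        expected = residue_freq[a] * residue_freq[b] * (1 if a == b else 2)
        scores[(a, b)] = math.log(observed / expected, logbase)
    return scores


counts = {("D", "D"): 6, ("D", "E"): 2, ("E", "E"): 2}
for pair, score in sorted(log_odds(counts).items()):
    print(pair, round(score, 2))
```

A positive score means the pair is seen more often than chance
would predict, a negative score less often. Any change to how the
expected frequencies are estimated, or to the rounding, would shift
the whole matrix.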

(Nowadays we try to use doctests for the examples in the
API and in the Tutorial where possible, so that code changes
which affect our examples are detected automatically.)
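
To show what that looks like in practice, here is a small sketch of
a doctest. The helper count_pairs is hypothetical and is not part of
Biopython; the point is only that the expected output sits in the
docstring and is compared with the real output when the tests run.

```python
def count_pairs(column):
    """Count unordered residue pairs in one alignment column.

    Hypothetical helper for illustration only.

    >>> sorted(count_pairs("DDE").items())
    [(('D', 'D'), 1), (('D', 'E'), 2)]
    """
    counts = {}
    for i in range(len(column)):
        for j in range(i + 1, len(column)):
            # Sort so that (D, E) and (E, D) share one key.
            key = tuple(sorted((column[i], column[j])))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Saved to a file, this can be checked with "python -m doctest" on
that file; if a later code change altered the counts, the docstring
example would fail rather than silently go stale.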

The most likely candidate would be a change in the file
Bio/SubsMat/__init__.py

https://github.com/biopython/biopython/commits/master/Bio/SubsMat/__init__.py

A little detective work might be needed to explain this... sadly
trying to use Biopython from back then is complicated by its
reliance on Martel/mxTextTools.

Maybe Brad or Michiel has some insight?

--

In the meantime, I have applied your changes to the
example to use AlignIO,

https://github.com/biopython/biopython/commit/19f9317fe0e346f6c3f197d027076d9a1265def7
https://github.com/biopython/biopython/commit/5949f54dadb6d4ac8400e11d2afa33db549afba5

This will now get tested via test_Tutorial.py automatically
(except for the final line about printing the odds matrix):

https://github.com/biopython/biopython/commit/15dd6ba17eb092d0d7df674ac45617d99256d098

Thank you,

Peter


