[Biopython-dev] pypaml

Fri Jun 10 15:59:44 UTC 2011

On Fri, Jun 10, 2011 at 6:53 AM, Brandon Invergo <b.invergo at gmail.com> wrote:

> There is one other problem. As you may recall, I decided to
> reimplement the Chi2 program from the PAML package to provide a
> convenient means to do likelihood ratio testing without having to load
> another library (SciPy, rpy2). The original was written in C but had
> limited command-line options, so I couldn't just write an interface to
> it. Re-writing the code in Python seemed to work fine, as far as
> getting the correct results/output. However, I later found that doing
> tests with large degrees of freedom (one codeml model comparison
> requires 41 df) takes an exorbitant amount of time compared to the C
> code. So, I see three options: dig into the code to try to find ways
> to optimize it, look into something like Weave for compiling the C
> code into a Python module, or just remove Chi2 for now and wait for
> the PAML author to release a version that takes command-line arguments
> (which he claims is coming in the next version). Any thoughts on this matter?
>

If you've already ported the code to pure Python or Python+NumPy/SciPy, do
you think it would make sense to provide this function under
Bio.Phylo._utils instead of in your PAML module? Then users would be able to
do a likelihood ratio test on trees without having the PAML binaries
installed.
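A rough illustration of the "optimize it" route: the chi-square upper-tail probability that a likelihood ratio test needs is the regularized upper incomplete gamma function Q(df/2, x/2). The standard series/continued-fraction evaluation (the Numerical Recipes approach) needs only a modest number of loop iterations even at 41 df. This is a sketch using only the standard math module, not the algorithm PAML's Chi2 uses, and the names chi2_sf, lrt_pvalue and _upper_gamma_q are hypothetical, not an existing Bio.Phylo._utils API.

```python
import math


def _upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x), for a > 0 and x > 0 (sketch)."""
    eps, tiny, max_iter = 1e-15, 1e-300, 1000
    # Shared prefactor x**a * exp(-x) / Gamma(a), computed in log space.
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion converges fast here; it gives P(a, x) = 1 - Q(a, x).
        ap, term = a, 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz's method) converges fast for x >= a + 1.
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x, df):
    """Hypothetical helper: P(X >= x) for X ~ chi-square with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return _upper_gamma_q(df / 2.0, x / 2.0)


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Hypothetical helper: p-value of 2 * (lnL_alt - lnL_null) against chi-square(df)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # A negative statistic (alternative fits worse than the null) yields p = 1.
    return chi2_sf(stat, df)
```

For example, lrt_pvalue(lnl_null, lnl_alt, 41) would compare two codeml fits that differ by 41 free parameters. With 2 df the tail reduces to exp(-stat/2), which makes a convenient sanity check against a library such as SciPy.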

The pure-Python version would still be handy for smaller degrees of freedom,
and if someone happens to be using PyPy it would probably be wicked fast.
The best option for speeding it up is probably NumPy rather than SciPy, since other parts of
Biopython already use NumPy as an optional dependency.

(Right now, Bio.Phylo runs on Python 3, Jython, and PyPy, so adding and
supporting a hand-written C extension on all of these platforms is probably
not worth the trouble.)

Thanks,
Eric