[Biopython-dev] pypaml

Brandon Invergo b.invergo at gmail.com
Tue Feb 22 16:40:01 UTC 2011


Hi everyone,
I've been toiling away on the PAML API and I think it's finally ready
for review. If anyone's willing to give my code a review, here's my
branch:
https://github.com/brandoninvergo/biopython/tree/paml-branch
(the API is in Bio/Phylo/PAML, as suggested before, and the tests are
in Tests, with their supporting files in Tests/PAML)
I'll also post a message to the Biopython user list to see if anyone
would be willing to give it a test drive.

Some notes:
- I've implemented Codeml, Baseml/Basemlg and Yn00. I have not yet
done anything with Mcmctree because I am completely ignorant about
what information to extract from the output files. The other two
programs in the package, Evolver and Chi2, do not accept command-line
options and are instead operated through a rudimentary interactive
command-line interface, so they aren't really suited to scripting.

- Chi2 is useful, though, because it provides a chi^2 CDF, which you
can use to perform likelihood ratio tests, an important part of using
the PAML programs. Since Python doesn't have a chi^2 cumulative
distribution function in its standard library, I ported the original
C code rather than writing a function that simply calls the original
program. I did this with the permission of Ziheng Yang, the original
author; this is mentioned in the code's comments, but he required no
other licensing/copyright verbiage to be included. This was no easy
task, considering the C code was littered with goto statements. Anyway,
this will save the user from having to install/import an outside
package to do the tests (I personally had been using Rpy2 to call the R
function pchisq(), which was complete overkill). Let me know if this is
OK or if it causes some kind of conflict.

- The output of the programs varies widely with the combinatorics of
the parameters and possibly between versions. I tried to include all
possible output files in the Tests/PAML directory and I wrote test
cases to check that they're properly parsed (with the testing of
future versions in mind). As a result, the Tests/PAML folder has a lot
more in it than the usual test folders, but I felt there was no other
option. I tried to keep it organized.
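To illustrate what the chi^2 CDF mentioned above is used for, here is a minimal pure-Python sketch of a likelihood ratio test. It is not the ported PAML Chi2 code: it swaps in the standard series / continued-fraction evaluation of the regularized incomplete gamma function (the textbook approach), using the identity that the chi^2 CDF with k degrees of freedom at x equals P(k/2, x/2). The names chi2_cdf, chi2_sf and likelihood_ratio_test are hypothetical, chosen only for this example.

```python
import math

_EPS = 1e-15
_TINY = 1e-300


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (used for x < a + 1)."""
    term = total = 1.0 / a
    n = a
    for _ in range(1000):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * _EPS:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_contfrac(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction (used for x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / _TINY
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < _TINY:
            d = _TINY
        c = b + an / c
        if abs(c) < _TINY:
            c = _TINY
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_cdf(x: float, df: int) -> float:
    """Hypothetical helper: P(X <= x) for X ~ chi^2 with df degrees of freedom."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    if y < a + 1.0:
        return _lower_gamma_series(a, y)
    return 1.0 - _upper_gamma_contfrac(a, y)


def chi2_sf(x: float, df: int) -> float:
    """Hypothetical helper: upper tail P(X > x), computed directly to avoid 1 - CDF cancellation."""
    if x <= 0.0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    if y < a + 1.0:
        return 1.0 - _lower_gamma_series(a, y)
    return _upper_gamma_contfrac(a, y)


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for nested models differing by df free parameters."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Usage with made-up log-likelihoods (e.g. from two nested codeml runs):
stat, p = likelihood_ratio_test(-1234.56, -1230.12, df=2)
```

The upper tail is what an LRT needs, so the sketch exposes it directly rather than computing 1 - CDF, which loses precision for very small p-values.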

I think those are the main points for now. I assume there's more work
to be done before I should submit a pull request, so for now I'll
simply ask for your comments, if you have the time.

Cheers,
Brandon Invergo


On Sun, Jan 16, 2011 at 4:09 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sun, Jan 16, 2011 at 2:19 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> Hi everyone,
>> A quick question about style: since the name "codeml" is based on a
>> program which is always spelled either in all caps or in all
>> lower-case, what would be the best way to write the class name
>> regarding capitalization? Stick with the usual camel-case convention,
>> "Codeml", anyway?
>
> I'd go with Codeml for a class name (or something like
> CodemlResult or whatever). Neither CODEML nor codeml
> seem good class names in Python.
>
>> Things are progressing nicely. I've already taken care of a lot of the
>> minor tasks and improvements...
>
> Sounds good :)
>
> Peter
>


