[Biopython-dev] pypaml

Brad Chapman chapmanb at 50mail.com
Fri Jan 14 15:40:35 UTC 2011


Brandon;
It's great you are looking to contribute your CODEML wrappers to
Biopython. It looks like really useful functionality. Peter tackled
most of the high level details so I'll chime in with a few more
detailed suggestions.

> I use the subprocess library of python to call the command line tool.
> PAML programs work by calling the tool with a control file as its
> argument. The control file specifies all of the run arguments,
> including the data files, output files, and other variables.
> Basically, pypaml works by dynamically building a control file via
> properties for the data files and a dictionary for the other
> variables, running the command line tool with that control file as its
> parameter, and then grabbing the output file, parsing it and storing
> the results in a dictionary object.
> 
> The run() function, line 217, does this:
> http://code.google.com/p/pypaml/source/browse/trunk/src/pypaml/codeml.py
> with the actual subprocess call happening at 239/241 (verbose/silent).

The functionality here looks great. My stylistic suggestion would be
to separate the code for running the commandline from that used to
parse the output file. Ideally these would be two separate classes
that could live under the Bio.Phylo namespace:

https://github.com/biopython/biopython/tree/master/Bio/Phylo

For the commandline code, it would be nice to have a
Bio.Phylo.Applications that is organized similar to
Bio.Align.Applications:

https://github.com/biopython/biopython/tree/master/Bio/Align/Applications

This will give you some flexibility as you want to expand out to
support other programs, and provide a framework for additional
phylogenetic commandline utilities.

Eric might have some suggestions about the best module name to use
for the parsing code as he has been managing the Phylo namespace.

Separating parsing from commandline generation can also let you move
the _results dictionary from being a class member to a return value for
a parse function. This is a bit more straightforward workflow
instead of having the side-effect of assigning an internal class
attribute.

Thanks again for contributing,
Brad




More information about the Biopython-dev mailing list