[Biopython-dev] pypaml

Fri Jan 14 15:56:06 UTC 2011

Brad,

Thanks for your comments! This is something I hinted at in my original
post about structural rearrangements; what I meant is that I'm largely
unfamiliar with the structure/organization of the Biopython source and
how one works within it. I have admittedly not used Biopython much yet
in my own research so I don't have much experience with it. Your tips
on where pypaml would fit in the namespace are very helpful.

Already from thinking about integrating with the project I had thought
about separating the parsing engine, especially once I have to start
doing version checking. The code may grow quickly for that and it does
make more organizational sense to move it.

Your comments also indirectly made me realize that eventually it would
be nice to be able to run the software on Bio.Align objects and
Bio.Phylo objects as inputs (they would have to be written to
temporary text files so that PAML could read them). Well, that'll come
in time, but it's a thought.

It looks like I have a lot of Biopython-related studying to do for
homework! Luckily, these things excite me...

Cheers,
Brandon

On Fri, Jan 14, 2011 at 4:40 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Brandon;
> It's great you are looking to contribute your CODEML wrappers to
> Biopython. It looks like really useful functionality. Peter tackled
> most of the high level details so I'll chime in with a few more
> detailed suggestions.
>
>> I use the subprocess library of python to call the command line tool.
>> PAML programs work by calling the tool with a control file as its
>> argument. The control file specifies all of the run arguments,
>> including the data files, output files, and other variables.
>> Basically, pypaml works by dynamically building a control file via
>> properties for the data files and a dictionary for the other
>> variables, running the command line tool with that control file as its
>> parameter, and then grabbing the output file, parsing it and storing
>> the results in a dictionary object.
>>
>> The run() function, line 217, does this:
>> http://code.google.com/p/pypaml/source/browse/trunk/src/pypaml/codeml.py
>> with the actual subprocess call happening at 239/241 (verbose/silent).
>
> The functionality here looks great. My stylistic suggestion would be
> to separate the code for running the commandline from that used to
> parse the output file. Ideally these would be two separate classes
> that could live under the Bio.Phylo namespace:
>
> https://github.com/biopython/biopython/tree/master/Bio/Phylo
>
> For the commandline code, it would be nice to have a
> Bio.Phylo.Applications that is organized similar to
> Bio.Align.Applications:
>
> https://github.com/biopython/biopython/tree/master/Bio/Align/Applications
>
> This will give you some flexibility as you want to expand out to
> support other programs, and provide a framework for additional
> phylogenetic commandline utilities.
>
> Eric might have some suggestions about the best module name to use
> for the parsing code as he has been managing the Phylo namespace.
>
> Separating parsing from commandline generation can also let you move
> the _results dictionary from being a class member to a return value for
> a parse function. This is a bit more straightforward workflow
> instead of having the side-effect of assigning an internal class
> attribute.
>
> Thanks again for contributing,
> Brad
>
>