[Biopython-dev] pypaml

Sat Jan 15 11:56:23 UTC 2011

On Sat, Jan 15, 2011 at 5:35 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Hi Brandon,
>
> Thanks for volunteering! I think this will be a nice addition to Biopython
> and particularly Bio.Phylo.
>
> Some thoughts on organization:
>
> On Fri, Jan 14, 2011 at 10:40 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
>>
>> The functionality here looks great. My stylistic suggestion would be
>> to separate the code for running the commandline from that used to
>> parse the output file. Ideally these would be two separate classes
>> that could live under the Bio.Phylo namespace:
>>
>> https://github.com/biopython/biopython/tree/master/Bio/Phylo
>>
>
> I agree.

That sounds good. This will be a big change for anyone already
using the standalone pypaml, but some changes are unavoidable.

>> For the commandline code, it would be nice to have a
>> Bio.Phylo.Applications that is organized similar to
>> Bio.Align.Applications:
>>
>> https://github.com/biopython/biopython/tree/master/Bio/Align/Applications
>>
>> This will give you some flexibility as you want to expand out to
>> support other programs, and provide a framework for additional
>> phylogenetic commandline utilities.
>>
>
> Since it sounds like you might eventually write wrappers for other programs
> in the PAML suite, a layout like this might work:
>
> Bio/Phylo/Applications/_codeml.py
>  -- just the wrapper for running the command-line program, perhaps based on
> the Bio.Application classes. The API for calling the wrapper goes through
> __init__.py; the user doesn't import this module directly. (See
> Bio.Align.Applications)
>

Roughly how many applications are there in PAML? What Brad and
Eric have outlined would work fine, but we could opt for something
a little different, like the namespace Bio.Phylo.Applications for
general tools (there are some tree building tools I could write
wrappers for - using the same setup as Bio.Align.Applications),
and have namespace Bio.Phylo.Applications.PAML for the PAML
wrappers. Another reason to separate them is they won't be
using the simple Bio.Application framework (due to the way
PAML options must be specified via input files).
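
To illustrate the input-file point, here is a rough sketch of what such a
wrapper has to do: write the options to a control file, then build a command
that names only that file. The function names are hypothetical, this is not
existing Biopython or pypaml code, and it assumes codeml accepts the control
file path as its single argument.

```python
import os
import tempfile


def write_control_file(options, ctl_path):
    """Write each option as a 'name = value' line to ctl_path."""
    with open(ctl_path, "w") as handle:
        for name, value in options.items():
            handle.write("%s = %s\n" % (name, value))


def build_command(ctl_path, executable="codeml"):
    """Return the argument list; the options themselves live in the file.

    Assumption: the program takes the control file path as its only argument.
    """
    return [executable, ctl_path]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    options = {
        "seqfile": "alignment.phy",
        "treefile": "tree.nwk",
        "outfile": "results.out",
    }
    ctl_path = os.path.join(tempfile.mkdtemp(), "codeml.ctl")
    write_control_file(options, ctl_path)
    print(build_command(ctl_path))
```

The command line carries no options at all, which is why the usual
one-switch-per-parameter style of Bio.Application does not map onto it
directly.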

>
> Bio/Phylo/PAML/codeml.py
>  -- all the code for parsing the output of the command-line program, and
> working with that dictionary/class. Any other modules this depends on would
> also go here, as would the other code for working with the input/output of
> other PAML programs.
>
>
>> Separating parsing from commandline generation can also let you move
>> the _results dictionary from being a class member to a return value for
>> a parse function. This is a more straightforward workflow
>> instead of having the side-effect of assigning an internal class
>> attribute.
>>
>
> Yes. Also, the user might have saved the output from a codeml run
> previously (maybe from a shell script/pipeline), and want to parse it
> without re-running codeml through a Python wrapper. Right? (Sorry
> if I misunderstood your code.)
>
> I look forward to seeing your branch on GitHub. Please let us know
> if you have any problems along the way.
>
> All the best,
> Eric
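
A minimal sketch of the parser half of that split, where the results come
back as a return value instead of being stored on the wrapper object. The
names parse_results and read are hypothetical, and the regular expressions
cover only a few line shapes assumed for illustration; a real codeml parser
would need to handle far more of the output.

```python
import re

# Assumed line shapes, for illustration only:
#   lnL(ntime: 19  np: 22):  -1187.812345      +0.000000
#   kappa (ts/tv) =  4.54006
#   omega (dN/dS) =  0.80663
_LNL_RE = re.compile(r"lnL\(.*?\):\s*(-?\d+\.\d+)")
_PARAM_RE = re.compile(
    r"^(kappa|omega)\s*\([^)]*\)\s*=\s*(-?\d+\.\d+)", re.MULTILINE
)


def parse_results(text):
    """Return a dict of values found in text; no state is kept anywhere."""
    results = {}
    match = _LNL_RE.search(text)
    if match:
        results["lnL"] = float(match.group(1))
    for name, value in _PARAM_RE.findall(text):
        results[name] = float(value)
    return results


def read(results_path):
    """Parse a previously saved output file without re-running the program."""
    with open(results_path) as handle:
        return parse_results(handle.read())
```

Because read() only needs a file path, output saved earlier from a shell
script or pipeline can be parsed without going through the command-line
wrapper at all.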

Thanks for your comments Brad and Eric :)

Peter