[Biopython-dev] Code review request for phyloxml branch

Eric Talevich eric.talevich at gmail.com
Fri Jan 8 18:22:11 UTC 2010


On Fri, Jan 8, 2010 at 12:00 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> On Fri, Jan 8, 2010 at 4:26 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > I am not an expert in this area, but the code looks very well done and well
> > organized. Thanks, Eric!
> >
> > I have one suggestion though:
> > In the current layout, there's a Bio.Tree and a Bio.TreeIO module. I'd rather
> > have everything under Bio.Tree. This makes it easier to understand what each
> > Bio.* module is about, and also agrees with the structure of the other modules
> > in Biopython. The only exception is Bio.Seq, for which there is a closely related
> > Bio.SeqIO and Bio.SeqRecord. (In my opinion, that is more for historical reasons;
> > I'd rather have a single Bio.Seq there too).
>
> There is also Bio.AlignIO, which again might have been handled via Bio.Align
> with hindsight. One reason for this choice of naming (SeqIO and AlignIO) was
> following the lead from BioPerl.

Yep, BioPerl has a TreeIO module, too. BioRuby and BioJava do
something completely different.

I had the impression that pairing modules Foo & FooIO was an emerging
convention for organizing very general data types being fed by a
variety of file formats, while a single module Foo indicated support
for a particular program or source, like Entrez. But I think it would
be even cleaner if each Foo simply had a Foo.IO (or foo.io) sub-module
organizing the I/O for multiple file formats where applicable.

The TreeIO.* namespace is not crowded -- just read, write, parse,
convert. If that directory is moved under Bio.Tree and renamed to IO
or io, then Bio.Tree would still seem reasonably intuitive if
__init__.py contained:

from io import *
from utils import *

Then "from Bio import Tree" would be enough for most uses.

> I think there are some good points about making
> the code for the common object (tree, SeqRecord, Alignment) clearly separate
> from the code for parsing or writing it (although separate top level modules is
> perhaps overkill). However, I agree, this isn't universal in Biopython (e.g.
> Bio.Motif handles a range of motif file formats but there is no Bio.MotifIO).

PDB does its own thing, too -- and some consolidation there might be nice.

> So I'm somewhat on the fence about the Bio.TreeIO name. However, one thing
> I don't like is that "Tree" could mean a class or a module (also a problem with
> other Biopython bits like "Seq", "SeqRecord", "Nexus"). Current Python
> convention (PEP8) is to use lower case for the module ("tree") and title case
> for the class ("Tree"), something most of Biopython does not follow (and
> which we can't change without a lot of upheaval).

I could rename the modules inside Bio.Tree (or whatever we call it) to
follow the PEP8 convention:

Bio/Tree/
Bio/Tree/basetree.py
Bio/Tree/io.py
Bio/Tree/utils.py ...

The Biopython convention seems to be that directory names are title
case, file names are mostly title case if user-facing and lower case
otherwise, and C extensions are lower case. Most of the time there
won't be any need to import the sub-modules under Tree directly, so
the inconsistency shouldn't be too jarring.

> perhaps something different like Bio.Phylo instead?

Sure, that sounds promising.


Thanks!
Eric



More information about the Biopython-dev mailing list