[Biopython-dev] Dependency policy; was PEP8 lower case module names?

Eric Talevich eric.talevich at gmail.com
Sun Nov 4 19:47:53 UTC 2012


On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Retitling thread
>
> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> > Hi,
> >
> >
> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> >>
> >> I already feel that we need to install too many packages to get going
> >> with Python in bioinformatics (Python itself, NumPy, Matplotlib and its
> >> dependencies, Pysam, Cython (needed to compile Pysam), ez_setup, perhaps
> >> SciPy, Biopython). I find this hard to explain to people new to
> >> bioinformatics or new to Python. So I would prefer to keep one
> >> distribution.
> >>
> >> We can be more lenient in terms of dependencies, especially those that
> >> don't occur at compile time.
> >>
> >
> > One of the things that I have always found lacking in Biopython is a
> > clear, consistent policy on dependencies:
>
> It would be good to have something written down, just as we
> did with the deprecation policy.
>

Should we start a page for this on the wiki?



> > Depending on the mood of the day, adding a library dependency could
> > be considered either good or bad. As an example, we ended up with a
> > dependency on ReportLab, but not on SciPy.
>
> The ReportLab dependency is a 'run time only' dependency and
> has been in Biopython for a very long time. You'd have to remind
> me if there was any compile time issue with SciPy, but my
> recollection is that we were loath to add a dependency on SciPy
> (which is quite a complex library to install if not using a package)
> for just one or two functions. However, you were planning something
> more substantial in the PopGen code which would justify it (using
> lots of statistics).
>
> > Whatever the policy, I think that it should be consistent across the
> > board, and preferably simple for both users and developers.
> >
> > A few ideas on policy:
> >
> > 1. I totally agree with the idea of being as lenient as possible with
> > dependencies (as you say, especially with those that do not occur at
> > compile time).
> > 2. Biopython belongs to a certain software ecosystem. I think it would
> > make sense to treat adding dependencies on well-established Python
> > libraries as natural.
> > 3. (1+2) If a developer wants to add a dependency on a package, that
> > should not be a major problem (as long as the package is long maintained,
> > well known and stable). Users should only have to deal with the
> > dependency if they need the functionality that depends on that package.
> >
> > Python being a dynamic language, a remote part of Biopython depending on
> > something more exotic need not be a burden on users/developers (most of
> > whom will never see or install it in any case). Again, by "exotic" I mean
> > well-known libraries with a track record of years of stability.
>
> That all sounds reasonable. It is compile time dependencies that I am
> most wary of.
>
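Tiago's point that users need only deal with a dependency when they use the feature relying on it can be illustrated with a deferred import. The sketch below is a hypothetical pattern, not Biopython's actual code: the helper `require`, the exception class `MissingOptionalDependencyError` and the feature `draw_tree_graph` are invented names (Biopython does define its own `MissingPythonDependencyError` for a similar purpose). The optional package is imported only when the feature is called, so a missing package produces a clear message instead of breaking `import Bio`.

```python
import importlib


class MissingOptionalDependencyError(ImportError):
    """Hypothetical: raised when a feature needs a package that is not installed."""


def require(module_name, feature):
    """Import module_name on demand, or explain which feature needs it.

    The import happens only when a caller actually uses the feature, so
    users who never touch it never need the package installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise MissingOptionalDependencyError(
            "%s requires the optional package %r; please install it."
            % (feature, module_name)
        ) from err


def draw_tree_graph(edges):
    """Hypothetical feature backed by the optional networkx package."""
    nx = require("networkx", "draw_tree_graph")  # imported only here
    graph = nx.Graph()
    graph.add_edges_from(edges)
    return graph
```

Because `MissingOptionalDependencyError` subclasses `ImportError`, existing code that already catches `ImportError` keeps working.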

Pure-Python dependencies seem less scary -- a package like PLY should work
on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the
dependencies that are most tempting are the ones with essential C
extensions (numpy, scipy, matplotlib).


> However, from an end user perspective, it would be annoying to install
> Biopython, try a script from a colleague, and only then find that 101
> optional run time dependencies are also needed.
>
> For Linux distributions like Debian, packages have a 'Recommends' field
> for this kind of soft dependency. Where do we stand with declaring
> dependencies in setup.py, so that this is less painful when using a
> package manager like pip?
>
> In fact, how many 'soft' dependencies like this do we already have?
> Just from a quick look at the README file, many are not mentioned
> under the current 'System Requirements' text (e.g. NetworkX).
>

I just used "git grep import Bio/" to find out. The only egregious
undocumented dependencies are the ones I added in Phylo for graphics:
networkx and matplotlib/pylab.

Other *possible* dependencies are sqlite3 in the case of Jython
(Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k).

Should we add these to the "install_recommends" list in setup.py?
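Standard setuptools has no `install_recommends` keyword; the nearest equivalent is `extras_require`, which declares optional groups that pip installs only on request (e.g. `pip install "biopython[phylo-graphics]"`), combined with PEP 508 environment markers for requirements that apply only on certain interpreters. The fragment below is a hypothetical sketch: the extra names are invented, and it assumes a pip/setuptools recent enough to understand environment markers.

```python
# Hypothetical setup.py fragment; the extra names are illustrative and are
# not Biopython's actual configuration.

# Conditional hard requirement via a PEP 508 environment marker:
# the ordereddict backport is only needed before Python 2.7.
INSTALL_REQUIRES = [
    'ordereddict; python_version < "2.7"',
]

# Soft ("recommended") dependencies grouped by feature; pip installs these
# only when the extra is requested, e.g. pip install "biopython[graphics]".
EXTRAS_REQUIRE = {
    "phylo-graphics": ["networkx", "matplotlib"],
    "graphics": ["reportlab"],
}

# Convenience extra pulling in every optional dependency listed above.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for deps in EXTRAS_REQUIRE.values() for pkg in deps}
)

# In a real setup.py these would be passed through, e.g.:
#   setuptools.setup(..., install_requires=INSTALL_REQUIRES,
#                    extras_require=EXTRAS_REQUIRE)
```

A plain `pip install biopython` would then stay minimal, while the README could point users at the relevant extra for each optional feature.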



> > Tiago
> > PS - Another issue that it would be interesting to see cleared up is the
> > policy on compile time (linkage) dependencies. Are new ones encouraged?
>
> Currently discouraged. They make installation much more painful, and
> have tended to be left untested, e.g. mmCIF was for many years disabled
> by default because no one could work out how to detect its requirements
> at compile time.
>
> > What about Java/Jython based?
>
> I'm not so keen on something providing Java/Jython only functionality.
> However, something where we could require library X under Jython
> while using library Y under C Python makes sense. Database access
> would be a perfect example - things like Python's sqlite3 don't yet exist
> under Jython.
>
> Peter
>
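Peter's "library X under Jython, library Y under CPython" idea is usually expressed as an import fallback. In the sketch below, `sqlite_backend` and the Jython-side module `jython_sqlite_wrapper` are hypothetical placeholders (no such package is implied to exist); only `sqlite3` and `platform.python_implementation()` are real standard-library APIs. The function prefers `sqlite3` and tries the alternative binding only when running under Jython.

```python
import platform


def sqlite_backend():
    """Return (name, module) for whichever SQLite binding is available.

    Prefers the standard-library sqlite3 module (CPython, PyPy). Under
    Jython, where sqlite3 has been missing, it falls back to a
    hypothetical JDBC-based wrapper named "jython_sqlite_wrapper".
    """
    try:
        import sqlite3
        return "sqlite3", sqlite3
    except ImportError:
        pass
    if platform.python_implementation() == "Jython":
        try:
            import jython_sqlite_wrapper  # hypothetical module, illustration only
            return "jython_sqlite_wrapper", jython_sqlite_wrapper
        except ImportError:
            pass
    raise ImportError(
        "No SQLite binding available on %s" % platform.python_implementation()
    )
```

Callers would then use only the DB-API calls both bindings share (`connect`, `execute`, `fetchone`), keeping the interpreter-specific choice in one place.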



