[Biopython-dev] Proposition for Biopython 2

Patrick Kunzmann padix.kleber at gmail.com
Wed Nov 22 13:55:50 UTC 2017


Hello Shyam,

thank you for your interest.

1. I made a benchmark for structure superimposition and global pairwise 
sequence alignments. In Biopython both functions are already accelerated 
using C-extensions. In Biotite the superimposition is accelerated via 
NumPy and the alignment via a Cython module. In both cases the timing 
was started after the creation of the respective internal representation 
(Bio.PDB.Structure, Bio.Seq.Seq and biotite.structure.AtomArray, 
biotite.sequence.ProteinSequence). For the superimposition, the 
structure of lysozyme (1aki) was superimposed on itself (1000 repeats). 
For the alignment, two 1000-residue polyalanine sequences were used (100 
repeats). The benchmark script is attached to this mail. This was the 
output:

1000x Biopython coordinate superimposition: 5.644912210998882 s
1000x Biotite coordinate superimposition: 0.37393549399894255 s
100x Biopython pairwise global alignment: 6.318634318999102 s
100x Biotite pairwise global alignment: 0.5369201499997871 s

In both cases Biotite was roughly one order of magnitude faster than the 
respective implementation in Biopython (about 15x for the superimposition 
and about 12x for the alignment).
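
For illustration, the timing approach described above can be sketched as 
follows. This is not the attached benchmark.py and it calls neither 
Biopython nor Biotite: nw_score() and time_repeats() are hypothetical 
helpers, and the pure-Python alignment is only a stand-in workload with 
shortened sequences. The sketch shows two things: the inputs are built 
before the timer starts, and the speedup factors follow directly from the 
timings reported above.

```python
import time


def nw_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Toy global alignment score (Needleman-Wunsch, linear gap penalty).

    Hypothetical stand-in workload; not the Biopython or Biotite code.
    """
    # Row 0 of the score matrix: aligning a prefix of seq2 against gaps only.
    prev = [j * gap for j in range(len(seq2) + 1)]
    for i in range(1, len(seq1) + 1):
        curr = [i * gap]
        for j in range(1, len(seq2) + 1):
            same = seq1[i - 1] == seq2[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def time_repeats(func, repeats):
    """Return the wall-clock time in seconds for `repeats` calls of `func`."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return time.perf_counter() - start


# Setup happens before the timer starts, so it is not part of the result.
seq1 = "A" * 100  # shortened polyalanine sequences to keep the toy fast
seq2 = "A" * 100

repeats = 10
elapsed = time_repeats(lambda: nw_score(seq1, seq2), repeats)
print(f"{repeats}x toy pairwise global alignment: {elapsed} s")

# Speedup factors computed from the timings reported in this mail.
reported = {
    "superimposition": (5.644912210998882, 0.37393549399894255),
    "alignment": (6.318634318999102, 0.5369201499997871),
}
speedups = {name: bio / tite for name, (bio, tite) in reported.items()}
for name, factor in speedups.items():
    print(f"{name}: Biotite about {factor:.1f}x faster")
```

The absolute time of the toy alignment says nothing about either 
library; only the structure of the measurement is meant to carry over.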

2. Yes, porting already written Biopython code would be extremely 
difficult. This was one of my reasons for eventually putting the code in 
a separate project.

Best regards,

Patrick



On 21.11.2017 21:36, Shyam Saladi wrote:
> Hi Patrick,
>
> This looks pretty cool. I wonder two things:
>
> 1. Do you happen to have timing benchmarks comparing similar functions 
> in Biotite and Biopython? I'm curious about what functionality is 
> actually faster/sped up by numpy/cython code.
>
> 2. From a cursory look through the source and from reading your docs, 
> biotite seems like a significantly different module compared to the 
> current version of BioPython. If this sort of organization was used 
> for a Biopython 2, it seems like it would be very difficult to port 
> already-written, currently developed code to the new system. That 
> said, for those parts that are faster, as a user, I think it might be 
> useful to merge them into Biopython if it's possible to do so in 
> an API-compatible way. I wonder what others think about this...
>
> Thanks,
> Shyam
>
> On Wed, Nov 15, 2017 at 12:40 AM, Patrick Kunzmann 
> <padix.kleber at gmail.com> wrote:
>
>     Dear Biopython community,
>
>     I decided to put the proposed Biopython 2 code base into a
>     separate project for the time being. The main reason is a
>     concern about clarity that has bothered me lately: although
>     distinguishing Biopython and the proposed Biopython 2 would be
>     easy on GitHub (different repos) and relatively easy on PyPI
>     (version identifier), I think confusion could arise in other
>     contexts (e.g. on the mailing list or StackOverflow). Therefore,
>     the project is continued under the name 'Biotite'. The repository
>     was moved into a separate GitHub organisation and can be found at
>     https://github.com/biotite-dev/biotite . The project is still
>     licensed under BSD 3-clause, so a project merge remains
>     possible at a later point.
>
>     Best regards,
>     Patrick
>
>
>     On 02.11.2017 11:42, Patrick Kunzmann wrote:
>
>         Dear Biopython community,
>
>         Here I present a proposal for a potential Biopython 2.x
>         code base. But first things first:
>
>         A few months ago I proposed an endeavor to rewrite Biopython
>         in order to bring it up to modern scientific Python standards
>         (http://lists.open-bio.org/pipermail/biopython-dev/2017-June/021740.html).
>         Arguably, the consensus was that this is something that should
>         be done, but those changes would require an almost complete
>         rewrite and hardly anyone has time for it. Therefore, I took
>         the initiative some time later and created an experimental
>         repository for creating actual Biopython 2 code
>         (https://github.com/padix-key/biopython2experimental).
>         Unfortunately it seems that the announcement mail for that did
>         not reach the mailing list, but went missing in the depths of
>         the web. Anyway, the repository is now in a presentable state.
>         The corresponding HTML documentation (including tutorial, API
>         reference and install instructions) can be found at
>         https://github.com/padix-key/biopython2/files/1437242/doc.zip
>         . So far it is not possible to install the package from PyPI,
>         since it is not the official Biopython 2 package. Instead you
>         have to install it directly from the repo if you want to test
>         the package.
>
>         The package contains basic types and operations for working
>         with structure and sequence data, offers biological database
>         interaction with RCSB and NCBI Entrez and provides seamless
>         interfaces to external software. Although the package aims to
>         cover a similar area of application as Biopython 1.x, it is a
>         complete rewrite.
>
>         The package is still in early development. I tried to
>         incorporate the ideas you and I brought up in the Biopython 2
>         discussion, and everything is still subject to change in
>         discussion with you. I already have some questions for discussion:
>
>         1. Should this package still be dual licensed? Since the BSD
>         3-Clause and the Biopython license are quite similar, I would
>         suggest licensing Biopython 2 only under BSD 3-Clause for
>         clarity. But I do not have a strong opinion on that.
>
>         2. In our previous discussion some of you proposed putting
>         only core functionality into Biopython 2 and leaving
>         specialized code installable as plugins. This package does not
>         contain a mechanism for plugin packages, yet. I would rather
>         suggest a 'recommended packages' approach: Code that is based
>         on Biopython 2 and tackles a general biological problem would
>         be linked in a 'Recommended packages' section of the Biopython
>         2 documentation. In my opinion, direct plugins in the
>         Biopython 2 package would require some confusing namespace
>         wizardry. Recommended packages would achieve almost the same,
>         with the slight difference that the user writes 'import
>         recommendedpackage' rather than 'import biopython.someplugin'.
>
>         If this package is accepted by the community, I would like to
>         hand over repository ownership to the 'Biopython' organisation
>         on GitHub and I would like to continue and supervise its
>         development as part of the GitHub 'Biopython' organisation.
>
>         Best regards,
>         Patrick Kunzmann
>
>
>     _______________________________________________
>     Biopython-dev mailing list
>     Biopython-dev at mailman.open-bio.org
>     http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark.py
Type: text/x-python
Size: 1579 bytes
Desc: not available
URL: <http://mailman.open-bio.org/pipermail/biopython-dev/attachments/20171122/54bc9aff/attachment-0001.py>

