[Biopython-dev] Proposal for GSoC 2017

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 2 12:35:58 UTC 2017

On Thu, Mar 2, 2017 at 11:48 AM, Sourav Singh <ssouravsingh12 at gmail.com> wrote:
> Hello Everyone,
> I am looking to propose a project for GSoC 2017 under BioPython.


> I have written my project proposal below. If anyone would be interested in
> mentoring me on the project, it would be great.
> Project Title- Add support for LLVM/ CUDA kernels to BioPython using Numba.

Sadly even if I had time, this is not an area I could mentor for GSoC.

> About Project-
> Currently Biopython has support for PyPy compiler, but the support for PyPy
> is not proper since Biopython depends on NumPy for certain functionalities,
> and NumPy has been ported to PyPy compiler.

I don't quite understand this sentence.

The PyPy team have got a lot of NumPy working nicely under PyPy, and
we do need to review how much of the Biopython code which uses NumPy
purely from Python (not from C) will now work there, e.g. Bio.PDB.

Code like Bio.Cluster uses NumPy at the C level, which remains a bigger
hurdle for use under PyPy.

> The aim of this project is to add support for LLVM compiler and if needed,
> support for GPUs through Numba.
> Approach-
> I am currently trying to undertake some pilot tests on kNN module of
> Biopython and benchmark the results accordingly. The project would involve
> adding support for LLVM using Numba for certain specific modules in
> Biopython which can benefit highly with the speedup. If needed, Support for
> CUDA kernels can also be added to Biopython.
> Knowledge required-
> 1) Programming skills in Python
> 2) Knowledge of BioPython internals.
> 3) Knowledge of LLVM workings
> 4) Knowledge of CUDA.
> Difficulty-
> Medium to Hard depending on the kind of module being worked on.
> Regards,
> Sourav

Other than the relatively small Bio/kNN.py code, which other bits of
Biopython are you thinking about? The kNN module is problematic
in that it doesn't really have a current maintainer, who would be a
natural candidate for mentoring work in this area.
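
To make the discussion concrete, a pilot test of the kind described in the
proposal might look roughly like the sketch below. This is only an
illustration under stated assumptions: it is not the Bio/kNN.py code, the
names euclidean_sq_py, euclidean_sq_jit and nearest_k are hypothetical, and
Numba is treated as an optional dependency (if it is missing, the plain
Python function is used unchanged).

```python
import numpy as np

try:
    # Numba is assumed to be optional; njit compiles the function via LLVM.
    from numba import njit
except ImportError:
    def njit(func):
        # Fallback when Numba is not installed: return the function as-is.
        return func


def euclidean_sq_py(x, y):
    """Squared Euclidean distance between two 1-D float arrays."""
    total = 0.0
    for i in range(x.shape[0]):
        d = x[i] - y[i]
        total += d * d
    return total


# Hypothetical compiled variant of the same function, for benchmarking.
euclidean_sq_jit = njit(euclidean_sq_py)


def nearest_k(points, query, k, dist):
    """Return the indices of the k rows of points closest to query."""
    distances = np.array([dist(p, query) for p in points])
    return np.argsort(distances)[:k]


points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.0, 1.0]])
query = np.array([0.0, 0.0])

plain = nearest_k(points, query, 2, euclidean_sq_py)
compiled = nearest_k(points, query, 2, euclidean_sq_jit)
```

Any timing comparison between the two variants should exclude the first
call to the compiled function, since that call includes the compilation
itself.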

Since it seems you are focusing on numerical analysis here, you might
find a more satisfying project with SciPy or scikit-learn - or indeed with
PyPy themselves?


