[BioLib-dev] Google Summer of Code project

Pjotr Prins pjotr.public14 at thebird.nl
Mon Mar 28 06:51:47 UTC 2011


Hi Colin,

Thank you for taking an interest in BioLib. Binding the JVM to R,
Python, Ruby, Perl, etc. is important for bioinformatics, but its
value extends beyond that. The JVM ecosystem keeps growing, and much
of it is well designed. We can only benefit.

On Sun, Mar 27, 2011 at 08:36:45PM -0600, Colin M. Diesh wrote:
> Hello,
> I'm interested in the BioLib project for using JNI to include
> Java libraries in BioLib.
> One part of this project is to create a proof-of-concept
> mapping, for example using the BioJava library. As I
> understand it, the GSoC project would provide "... the
> infrastructure to map existing C/C++ (_and Java_) libraries to 
> Perl/Ruby/Python, with R and JAVA planned for". 
>  
> I browsed
> http://thebird.nl/biolib/Adding_BioLib_BAM_SAM_Support.html to
> see how BioLib is used to map libraries. I have a lot of
> experience with C, but no direct experience with JNI or SWIG,
> so I wanted to get some feedback about this project for adding
> Java support. I am also interested in mapping other libraries
> (such as EMBOSS) in case there are serious obstacles to this project.

That is cool. The EMBOSS bindings are also important, and if you
choose that route, we can work with the EMBOSS team.

First, regarding the JVM options: I don't think we need fully
automatic bindings. Even partial bindings (methods and variables)
would be useful. A mapper (the person writing the bindings) can write
an adapter for the rest, as long as it is not a big job.

One project has succeeded in binding Python to the JVM:

  http://jpype.sourceforge.net/

and another binds R to the JVM:

  http://www.rforge.net/rJava/

Both appear to work and are worth studying. The challenge for BioLib
is to generalize these ideas into something that works for more
languages, which makes the mapping effort more worthwhile. BioLib
also helps with deployment: it provides the infrastructure to make
that possible.

One way to generalize is to feed SWIG the binding definitions somehow,
either by generating a .h file from JNI or by generating a SWIG
definition file (XML). Once that works for one language, it will work
for the others. To obtain the definitions, Java reflection might be
used; the alternative is parsing Java source, the way SWIG parses .h
files. Java parsers already exist. I don't know the best route at
this point, but I am convinced it is doable. Note that reflection is
attractive because it can query classes compiled from *any* JVM
language; with a parser, other JVM languages would require decompiling
their .class files.
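
A minimal sketch of the reflection route, in Java: it inspects a class
and prints a C-style prototype for each public method, roughly the kind
of definitions one might hand on to SWIG. Everything specific here is an
assumption for illustration: the class names ListBindings and Sequence
(a stand-in for, say, a BioJava class), the helpers cType and describe,
the crude type mapping and the output format are all hypothetical, not
BioLib's design and not SWIG's actual interface syntax. Only the standard
java.lang.reflect API is used.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ListBindings {

    // Hypothetical stand-in for a library class we want to bind.
    public static class Sequence {
        public int length() { return 0; }
        public String residues() { return ""; }
        public static double gcContent(String s) { return 0.0; }
        void internalHelper() { } // package-private: skipped by describe()
    }

    // Hypothetical, crude Java-to-C type mapping. Anything that is not a
    // handled primitive or String is treated as an opaque JVM reference.
    public static String cType(Class<?> t) {
        if (t == void.class) return "void";
        if (t == int.class || t == boolean.class) return "int";
        if (t == long.class) return "long long";
        if (t == double.class) return "double";
        if (t == float.class) return "float";
        if (t == String.class) return "const char*";
        return "jobject";
    }

    // Emit one C-style prototype per public, non-synthetic method of cls.
    // getDeclaredMethods() returns methods in no particular order, so we
    // sort by name to get stable output.
    public static String describe(Class<?> cls) {
        Method[] methods = cls.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        StringBuilder out = new StringBuilder();
        for (Method m : methods) {
            int mod = m.getModifiers();
            if (!Modifier.isPublic(mod) || m.isSynthetic()) continue;
            List<String> params = new ArrayList<>();
            // Instance methods need the receiver object passed in explicitly.
            if (!Modifier.isStatic(mod)) params.add("jobject self");
            Class<?>[] types = m.getParameterTypes();
            for (int i = 0; i < types.length; i++) {
                params.add(cType(types[i]) + " arg" + i);
            }
            out.append(cType(m.getReturnType()))
               .append(' ')
               .append(cls.getSimpleName()).append('_').append(m.getName())
               .append('(').append(String.join(", ", params)).append(");\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(Sequence.class));
    }
}
```

Run as ListBindings.java, this prints three prototypes (for gcContent,
length and residues); the package-private internalHelper is skipped. A
real generator would also have to handle overloaded methods, arrays,
generics and fields.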

But maybe this is a bit of a stretch. Accessing the JVM is important,
but the work is mostly low-level hacking.

With EMBOSS, the challenge lies more at the bioinformatics level. The
libraries do not provide a clean external-facing API, so each piece of
functionality will have to be discussed and given a C API, which is
then mapped with SWIG. The EMBOSS team is interested in such an
exercise and has promised to help. The API could be added to EMBOSS
itself. Certainly a worthwhile exercise.

In both cases I am happy to help you.

Pj.
