[Biojava-l] GSoC Application Discussion and Help - Porting BLAST to Java

Dhruv Sharma sharma.dhrv at gmail.com
Sat Mar 31 20:46:06 UTC 2012


Hi,

I am Dhruv Sharma, a senior undergraduate student pursuing a B.E. (Hons.) in
Computer Science at BITS Pilani, India.

I am very interested in 'porting the BLAST algorithm to Java' as a GSoC
2012 project. I am proficient in Java and C, which I primarily work with, and
I worked in C++ before migrating to Java. However, I am new to GSoC and have
not used version control before.

My most recent project was a Java web application that posts FASTA
sequences to the remote CS-BLAST web service
<http://toolkit.tuebingen.mpg.de/cs_blast/>, parses the results,
automatically filters them by release date from RCSB PDB
<http://www.rcsb.org/pdb/home/home.do>, and downloads the corresponding PDB
files.

Since the project aims to convert the legacy C/C++ code to Java, here are my
observations on the approaches already suggested on the BioJava page:

1)  Using C++-to-Java converters for a complete conversion. I tried
converting the ncbi-blast-2.2.26 source code with a few freely available
converters, but all of them either crashed or failed to convert it, even
after I resolved some header-file dependency issues that came up. Most
failures occurred at calls into non-standard C++ libraries.

2)  Using JNI as an alternative. JNI programming would be tedious and
would still require understanding the purpose of the underlying C++ code,
so it has little advantage over rewriting the equivalent code in Java. Its
main advantage appears where there is no efficient Java alternative to the
C++ code. However, the result would still be platform dependent, since the
native libraries must be built for each target platform.

As I understand the problem, a hybrid approach could be taken: code
converters for the simpler files, manual coding for the tricky areas, and
JNI for C++ code that depends on non-standard libraries. However, I am still
not clear about my exact course of action.
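To make the hybrid idea a little more concrete, here is a minimal sketch of
how one stage could sit behind a Java interface with both a pure-Java port
and a JNI-backed implementation. It assumes a simplified, exact-match word
seeding step (real protein BLAST also uses scored neighbourhood words), and
the names WordFinder, JavaWordFinder, NativeWordFinder, HybridBlastDemo and
the native library "blastcore" are all hypothetical; the C side of the JNI
method is not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One BLAST stage behind an interface (hypothetical name, for illustration only). */
interface WordFinder {
    /** Returns subject positions whose word of length wordSize also occurs in the query. */
    List<Integer> findWordHits(String query, String subject, int wordSize);
}

/** Pure-Java port of a simplified, exact-match word seeding step. */
final class JavaWordFinder implements WordFinder {
    @Override
    public List<Integer> findWordHits(String query, String subject, int wordSize) {
        if (wordSize <= 0) {
            throw new IllegalArgumentException("wordSize must be positive");
        }
        // Index every word of length wordSize that occurs in the query.
        Set<String> queryWords = new HashSet<>();
        for (int i = 0; i + wordSize <= query.length(); i++) {
            queryWords.add(query.substring(i, i + wordSize));
        }
        // Scan the subject and record each position whose word is in the index.
        List<Integer> hits = new ArrayList<>();
        for (int j = 0; j + wordSize <= subject.length(); j++) {
            if (queryWords.contains(subject.substring(j, j + wordSize))) {
                hits.add(j);
            }
        }
        return hits;
    }
}

/** JNI-backed variant; its body would be C code in the hypothetical "blastcore" library. */
final class NativeWordFinder implements WordFinder {
    @Override
    public native List<Integer> findWordHits(String query, String subject, int wordSize);
}

public class HybridBlastDemo {
    /** Prefer the native library when it loads; otherwise fall back to the pure-Java port. */
    static WordFinder choose() {
        try {
            System.loadLibrary("blastcore"); // hypothetical library name
            return new NativeWordFinder();
        } catch (UnsatisfiedLinkError e) {
            // Library missing for this platform: use the portable Java code instead.
            return new JavaWordFinder();
        }
    }

    public static void main(String[] args) {
        WordFinder finder = choose();
        System.out.println(finder.getClass().getSimpleName() + ": "
                + finder.findWordHits("ACGTAC", "TTACGTT", 3));
    }
}
```

Falling back to the Java port when the native library cannot be loaded is
one way to limit the platform dependence of the JNI parts, while both
implementations can be tested against the same interface.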

Could you please tell me whether my analysis of the problem is correct?
Please also comment on the feasibility of the suggested approach and share
any suggestions, as they would help me improve the application draft that I
will be sharing for review soon.

Since BLAST is a collection of programs, and given the amount of code to be
ported, could we focus on a selected set of critical programs for the GSoC
project?


Thanks.

-- 
Dhruv Sharma
Student
B.E. (Hons.) Computer Science
BITS Pilani
India
