[Biojava-l] Job / Task Scheduler for Biojava (Webservice)

Ralf Sigmund ralf.sigmund at ipk-gatersleben.de
Tue Nov 11 14:37:22 EST 2003

I have been investigating solutions to accurately describe and execute
bioinformatics analysis tasks.

I am interested in an analysis platform that supports protocol-based
planning, execution and result reporting of multistep processes.
		Example: find syntenic regions by comparing ESTs and marker
		positions from a species lacking a genomic map with the
		completed genome map of another species.
		This task will be achieved by a sequence of subtasks.
		Each subtask will have several degrees of freedom:
			- filtering / choice of input data
			- the sanitization of data
			- the choice of the algorithm
			- setting of multiple parameters and thresholds
		A framework could support tasks like this in several ways:
		--allow unambiguous definition of the steps in a storable
		  form which could be exchanged with other scientists,
		  allowing them to reproduce the experiment

		--record valuable execution parameters such as the dataset
		  versions actually used and the start and end times

		--allow for annotation / documentation of intermediate steps
		  and the presentation of these results in a repository, in
		  order to facilitate their reuse in additional in silico
		  experiments (possibly done by different

		--allow concurrent execution by several experimenters while
		  optimizing the utilization of computing resources
		--allow scheduled re-runs of experiments after
		  source-database updates

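To make the requirements above a little more concrete, here is a minimal Java sketch of a storable step definition and a run record. It is only an illustration under stated assumptions: the classes Step and RunRecord, the method execute, and all step, algorithm and dataset names are hypothetical and are not taken from Biojava, BioPipe or OmniGene.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these classes exist in Biojava or BioPipe. */
public class ProtocolSketch {

    /** One subtask: its input filter, its algorithm and its parameters. */
    static final class Step {
        final String name;
        final String inputFilter;
        final String algorithm;
        final Map<String, String> parameters = new LinkedHashMap<>();

        Step(String name, String inputFilter, String algorithm) {
            this.name = name;
            this.inputFilter = inputFilter;
            this.algorithm = algorithm;
        }

        /** Adds one parameter and returns this step so calls can be chained. */
        Step param(String key, String value) {
            parameters.put(key, value);
            return this;
        }

        /** Plain-text form that could be stored and exchanged. */
        String describe() {
            return name + " | input=" + inputFilter
                    + " | algorithm=" + algorithm + " | params=" + parameters;
        }
    }

    /** What was actually used when a step ran. */
    static final class RunRecord {
        final Step step;
        final String datasetVersion;
        final Instant start;
        final Instant end;

        RunRecord(Step step, String datasetVersion, Instant start, Instant end) {
            this.step = step;
            this.datasetVersion = datasetVersion;
            this.start = start;
            this.end = end;
        }
    }

    /** Walks the steps in order and logs one record per step. */
    static List<RunRecord> execute(List<Step> protocol, String datasetVersion) {
        List<RunRecord> log = new ArrayList<>();
        for (Step step : protocol) {
            Instant start = Instant.now();
            // A real framework would invoke the analysis tool here.
            Instant end = Instant.now();
            log.add(new RunRecord(step, datasetVersion, start, end));
        }
        return log;
    }

    public static void main(String[] args) {
        List<Step> protocol = new ArrayList<>();
        // All names and values below are placeholders for illustration.
        protocol.add(new Step("compare-ests", "ests-with-markers", "blastn")
                .param("evalue", "1e-10"));
        protocol.add(new Step("find-synteny", "hits-above-threshold", "collinearity-scan")
                .param("minMarkers", "3"));
        for (RunRecord r : execute(protocol, "genome-map-v1")) {
            System.out.println(r.step.describe() + " | dataset=" + r.datasetVersion
                    + " | start=" + r.start + " | end=" + r.end);
        }
    }
}
```

The point of the sketch is only that the step definition (what should be done) and the run record (what was actually done, with which dataset version and when) are kept as separate objects, so that the first can be exchanged and the second archived.
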
After L. Stein's commentary in Nature, "Creating a Bioinformatics
Nation", and after reading the available material on the OmniGene project,
one might have guessed that Java would be an ideal platform for a new
generation of middleware that integrates data and tasks.

However, the OmniGene effort has been transferred into the non-public
corporate space, and even before that there was no widespread adoption of
the platform (judging by the SourceForge traffic and the lack of citations).

Recently I discovered the BioPipe project and its accompanying publication
in Genome Research.
The project is mature, tightly integrated with Bioperl and almost completely
fulfills the requirements stated above.
However, BioPipe is based on Perl, and now I wonder whether Java would not be
a more advantageous platform for a system of this kind.

I will try to list the advantages of Java and Perl for this application below
and hope for your comments:

(1) Compared to Perl, Java has more advanced support for object orientation,
which allows for more transparent and modular architectures. Development
tools like Eclipse/Omondo-UML increase this advantage further.
(2) Component transaction monitors like the application server JBoss
(J2EE, EJB) are an ideal platform for managing multiple-user /
multiple-task scenarios. J2EE technology is used successfully in many
similar applications in other industries. Advanced client applications could
really benefit from the object remoting provided by the J2EE platform.
(3) Based on my limited knowledge, the Java platform appears to have a much
tighter (more failsafe?) incorporation of XML (XML Schema to class binding
with JAXB) and web service technologies (SOAP; Apache Tomcat/Axis).
(4) There are several workflow design and management tools, even with
graphical editors. Integrating these J2EE-based projects could bring big
advantages here.

I see two major disadvantages for Java:
(1) Bioinformatics tools are typically command-line tools. Perl on a Unix
platform is the best way to invoke such tools from a program. Java's
platform independence appears to be the source of its weakness here.
(2) The Bioperl project has a far bigger code base and more contributors
than any Java bioinformatics effort such as Biojava or OmniGene.
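
For comparison with the Perl approach in point (1), this is roughly what invoking a command-line tool from Java looks like. It is a minimal sketch, assuming a JDK that provides java.lang.ProcessBuilder (Java 5 or later) and a Unix-like system; the class ToolRunner and its Result type are hypothetical, and "echo" merely stands in for a real analysis tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for running an external command-line tool. */
public class ToolRunner {

    /** Exit code and output lines of one tool invocation. */
    static final class Result {
        final int exitCode;
        final List<String> lines;

        Result(int exitCode, List<String> lines) {
            this.exitCode = exitCode;
            this.lines = lines;
        }
    }

    /** Starts the command, collects its output and waits for it to finish. */
    static Result run(String... command) {
        ProcessBuilder builder = new ProcessBuilder(Arrays.asList(command));
        builder.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = builder.start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return new Result(process.waitFor(), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for tool", e);
        }
    }

    public static void main(String[] args) {
        // "echo" is a stand-in; a real pipeline would name an analysis tool here.
        Result result = run("echo", "hello");
        System.out.println("exit code " + result.exitCode + ", output " + result.lines);
    }
}
```

The sketch reads the output completely before calling waitFor(), because a child process can block once its output buffer fills up. Details of this kind are what Perl's backticks hide from the programmer.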

I wonder if Java will ever become a significant technology for public / open
source bioinformatics projects?
It seems that the head start Perl-based projects now have outweighs
any advantages Java technology offers.
Thanks for your comments on these ideas...


Dr. Ralf Sigmund
Institut für Pflanzengenetik
und Kulturpflanzenforschung (IPK)
Corrensstraße 3
D-06466 Gatersleben
Tel:   +49/(0)39482/5-659
Fax:   +49/(0)39482/5-595
mailto:ralf.sigmund at ipk-gatersleben.de
