[Bioperl-l] RE: bioperl pipeline picking momentum

Fri, 12 Apr 2002 17:45:30 +0800

Elia,

Very comprehensive plan. We will work together.

Larry

-----Original Message-----
From: Elia Stupka [mailto:elia@fugu-sg.org]
Sent: Friday, April 12, 2002 5:36 PM
To: Bioperl
Cc: Ensembl dev list; Prasanna R Kolatkar; Lai Loong Fong; Larry Ang
Subject: bioperl pipeline picking momentum

Dear all,

just thought I would let you know that the effort to create a
bioperl-pipeline based on extending and improving the already very capable
ensembl-pipeline is well underway.

Jer Ming Chia and Shawn Hoon have spent two useful weeks at Hinxton
discussing with ensemblers the specs of the new pipeline, and have started
coding it all up.

Moreover, over here in Singapore a few institutes have taken an interest in
it, so we are likely to see strong interest as well as a broader set of
coders working on the project.

Some of the aims of the new pipeline as compared to the previous one are
(in order of ease of achievement):

1) Making the system very flexible in terms of where the input data comes
from and where the output results are stored. This used to be all in
one mysql db; now it should be able to come from anywhere, provided
adaptors are in place to communicate with the resource.
[already underway]

2) Making the system less LSF dependent. As a first step we are starting to
play with PBS on both an alpha cluster and an Itanium cluster, and will code
the modules needed to make it interact with PBS. PBS is free, so if
we can make it work stably it opens the pipeline up to a much wider
set of people, even those with small multiprocessor systems.
[will start next week]

3) Making the pipeline GRID aware. This means making the pipeline code talk
to GLOBUS so that a local pipeline can use resources (data/cpu) available
elsewhere seamlessly, or almost ;)
[will start on this in a few weeks]

4) Taking advantage of the GRID-awareness to start reasoning in terms of
allocating analysis runs according to where they are best suited or where
there are more resources. In other words, run cpu-intensive jobs on SMP
systems, small-but-many jobs on MPP systems, and of course allocate "a la
LSF" according to resources available.
[wishful thinking?]
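
To make aims 2 and 4 a bit more concrete, here is a minimal sketch of how
a batch-system abstraction and a resource-based allocation rule might fit
together. It is written in Python purely for illustration (the pipeline
itself is Perl), and every class, field and function name in it is
hypothetical: none of them come from ensembl-pipeline or bioperl-pipeline.
The "SMP"/"MPP" labels and the free_cpus count are assumed stand-ins for
whatever resource information the real system would have. The only
external facts relied on are that LSF submits with bsub and PBS with qsub,
both of which take -q to name a queue.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class BatchSubmitter(ABC):
    """Hypothetical interface: one subclass per batch system."""

    @abstractmethod
    def command(self, queue: str, script: str) -> List[str]:
        """Return the argument list that would submit `script` to `queue`."""


class LSFSubmitter(BatchSubmitter):
    def command(self, queue: str, script: str) -> List[str]:
        # LSF: bsub -q <queue> <command>
        return ["bsub", "-q", queue, script]


class PBSSubmitter(BatchSubmitter):
    def command(self, queue: str, script: str) -> List[str]:
        # PBS: qsub -q <queue> <script>
        return ["qsub", "-q", queue, script]


@dataclass
class Site:
    name: str
    kind: str                  # "SMP" or "MPP" (labels assumed for illustration)
    free_cpus: int             # assumed to be reported by the site somehow
    submitter: BatchSubmitter


def choose_site(sites: List[Site], cpu_intensive: bool) -> Site:
    """Prefer SMP sites for cpu-intensive jobs and MPP sites otherwise,
    then pick the candidate with the most free CPUs. Falls back to all
    sites if none of the preferred kind is available."""
    wanted = "SMP" if cpu_intensive else "MPP"
    candidates = [s for s in sites if s.kind == wanted] or list(sites)
    return max(candidates, key=lambda s: s.free_cpus)
```

The rest of the pipeline would only ever call choose_site() and
submitter.command(), so supporting another batch system would mean adding
one more subclass rather than touching the job-handling code. A real
submission would of course need more options than a queue name.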

Just thought I should let you know :)

Elia

-- 
********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 874 1467         *
* mobile: +65 90307613         *
* fax:    +65 777 0402         *
********************************