[Bioperl-l] a project idea

Jason Stajich jason@chg.mc.duke.edu
Fri, 1 Jun 2001 12:40:54 -0400 (EDT)


For those of you who are new and wanted some ideas for how to help out on
the bioperl project, here is one I was mulling over last night.

We want bioperl to be a nice OO package for the 1.0 release, which means
that every set of modules sharing some functionality should implement a
common interface.  One place this could be readily applied is the
Bio::Tools hodgepodge of modules.

Bio::Tools::Run has a couple of modules written.  They are somewhat ad hoc
wrappers around specific applications:
 o StandAloneBlast -- blastall/bl2seq
 o Alignment::Clustal -- clustalw
 o Alignment::TCoffee --  tcoffee

 (a really simple one I'd like is to be able to plug blastcl3 -- remote
  blast, which runs just like blastall -- into StandAloneBlast or a
  subclass of it)

or a simple way to submit BLAST jobs via HTTP
 o RemoteBlast

I'd like to suggest the following.  Someone could design an interface,
Bio::Tools::RunI, which all these implementations would implement.  It
should be a bare minimum - parameters(), run(), output() - and should
handle checking for the executable in the path or in specified dirs.
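
To make the idea concrete, here is a rough sketch of the shape such an
interface might take.  It is written in Python only to keep it short and
is not existing bioperl code; parameters(), run() and output() come from
the suggestion above, while program_name, extra_dirs, executable() and
build_command() are names invented for this illustration.

```python
import os
import shutil
import subprocess
from abc import ABC, abstractmethod


class RunI(ABC):
    """Hypothetical interface: the bare minimum every wrapper would provide."""

    program_name = None  # executable name; each subclass sets this (assumption)

    def __init__(self, extra_dirs=()):
        self._params = {}
        self._output = None
        self.extra_dirs = list(extra_dirs)  # user-specified dirs to search

    def parameters(self, **updates):
        """Set any given parameters, then return a copy of the current set."""
        self._params.update(updates)
        return dict(self._params)

    def executable(self):
        """Look for the program in the specified dirs first, then in PATH."""
        dirs = self.extra_dirs + os.environ.get("PATH", "").split(os.pathsep)
        found = shutil.which(self.program_name, path=os.pathsep.join(dirs))
        if found is None:
            raise FileNotFoundError(f"{self.program_name} not found")
        return found

    @abstractmethod
    def build_command(self):
        """Return the argument list (after the program name) for this tool."""

    def run(self):
        """Run the program and keep its standard output for output()."""
        cmd = [self.executable()] + self.build_command()
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        self._output = result.stdout
        return self

    def output(self):
        """Return the raw output of the last run (None before any run)."""
        return self._output
```

A wrapper for a specific application would then only need to set
program_name and turn its parameters into command-line arguments in
build_command(); parsing the raw output into objects is left out of this
sketch.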

We may want to steal/learn from EnsEMBL pipeline code and/or the OMG specs
for running analysis through CORBA.  

Additionally, new wrappers could be written to run such applications as
hmmer, genscan, mzef, sim4, exonerate, fasta, and genewise.  Right now we
are limited by the need to have each application's output parsed into
bioperl objects - this would prevent exonerate or fasta runnables from
being done right away, but sim4, genscan, and mzef would be a piece of
cake, I suspect.

Attachment to EMBOSS apps would also be an excellent next step.

Once we have RunI objects we can build a RunFactory.
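
One possible reading of a RunFactory is a simple registry that maps a
program name to the wrapper that knows how to run it.  The sketch below is
in Python purely for illustration; register(), create() and the two stub
classes are hypothetical names, and real wrappers would implement RunI
rather than being stubs.

```python
class RunFactory:
    """Hypothetical factory mapping a program name to its wrapper class."""

    def __init__(self):
        self._registry = {}

    def register(self, name, wrapper_class):
        """Associate a program name (case-insensitive) with a wrapper class."""
        self._registry[name.lower()] = wrapper_class

    def create(self, name, **params):
        """Build a wrapper for the named program, passing params through."""
        try:
            wrapper_class = self._registry[name.lower()]
        except KeyError:
            raise ValueError(f"no Run wrapper registered for {name!r}") from None
        return wrapper_class(**params)


# Stand-in wrappers so the sketch runs on its own (assumptions, not real code).
class ClustalwStub:
    def __init__(self, **params):
        self.params = params


class TCoffeeStub:
    def __init__(self, **params):
        self.params = params


factory = RunFactory()
factory.register("clustalw", ClustalwStub)
factory.register("tcoffee", TCoffeeStub)

# Callers ask for a program by name and never touch the concrete class.
aligner = factory.create("ClustalW", matrix="BLOSUM")
```

The point of the design is that calling code depends only on the interface
and a program name, so a new wrapper can be added by registering it.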

So in summary - we'd need someone to come up with a specification for how
to design new Run modules cleanly with an interface, and to reimplement the
existing ones to comply with that interface.  Then new modules could be
rolled out for common apps that people use.  Phrap/RepeatMasker wrappers
would be welcomed as well by those on the sequencing end.

Anyone interested enough to give it a go, or at least start some wiki
documentation so we can start collaborating?

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/