[BioRuby] Rakefiles for the Dindel workflow

MISHIMA, Hiroyuki missy at be.to
Wed Feb 2 03:51:56 UTC 2011


Hi all,

I wrote a rakefile for the Dindel workflow. Dindel is a toolkit to call
small indels from mapped short-read data (BAM files). See
http://www.sanger.ac.uk/resources/software/dindel/ .

The rakefile is available at https://github.com/misshie/RakefileDindel .

I think my rakefile is a good example of "dynamic task definition".
During Stage 3 of the workflow, Dindel generates over 300 files for
exome data. Although the file naming rule is known before running rake,
the number of files to be generated is not known at that point; it is
determined only at run time.

van der Aalst et al. (2003) describe this as "pattern 14: multiple
instances with a priori runtime knowledge". Rakefiles can express this
pattern using Rake::Task#invoke.
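
As a rough illustration of this pattern (a minimal sketch, not the
actual code in RakefileDindel), the script below defines file tasks
inside a running task and then calls Rake::Task#invoke on each of them.
The task names (:make_windows, :stage3), the "window.N.txt" naming rule,
and the upcase "processing" step are all hypothetical stand-ins for the
real Dindel stages.

```ruby
require "rake"
require "tmpdir"

# Hypothetical working directory; a real workflow would use its own.
WORK_DIR = Dir.mktmpdir("dynamic_task_sketch")

# Stand-in for an earlier stage: it produces a number of files that is
# not known when the rakefile is first read.
task :make_windows do
  3.times do |i|
    File.write(File.join(WORK_DIR, "window.#{i}.txt"), "window #{i}\n")
  end
end

# Stand-in for Stage 3: only now can we list the inputs, so the file
# tasks are defined here, at run time, and invoked immediately.
task :stage3 => :make_windows do
  Dir.glob(File.join(WORK_DIR, "window.*.txt")).sort.each do |window|
    output = window.sub(/\.txt\z/, ".out")

    file output => window do |t|
      # Placeholder for the real per-window command line.
      File.write(t.name, File.read(t.prerequisites.first).upcase)
    end

    Rake::Task[output].invoke
  end
end

# Run the workflow directly so the sketch works as a plain Ruby script.
# In a real Rakefile this line is not needed; run "rake stage3" instead.
Rake::Task[:stage3].invoke
```

The naming rule (window.N.txt to window.N.out) is fixed in advance, but
the list of tasks is built from whatever files exist once the earlier
stage has finished.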

Furthermore, this rakefile demonstrates the effectiveness of Pwrake.
Stage 3 is a typical embarrassingly parallel problem.
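
To illustrate why independent tasks parallelize so easily, here is a
minimal sketch that uses Rake's built-in multitask, which runs
prerequisites in parallel threads. Note that this is not Pwrake; it is
only a stand-in to show the shape of the problem. The task names
(call_window_N, :stage3_parallel) are hypothetical.

```ruby
require "rake"
require "thread"

# Thread-safe queue that collects a message from each finished task.
RESULTS = Queue.new

# Hypothetical per-window task names; none depends on another.
WINDOW_TASKS = %w[call_window_0 call_window_1 call_window_2]

WINDOW_TASKS.each do |name|
  task name do |t|
    # Placeholder for the real per-window work.
    RESULTS << "#{t.name} done"
  end
end

# multitask runs its prerequisites concurrently instead of one by one.
multitask :stage3_parallel => WINDOW_TASKS

# Run directly so the sketch works as a plain Ruby script.
Rake::Task[:stage3_parallel].invoke

# Drain the queue; completion order is not guaranteed, so sort it.
finished = []
finished << RESULTS.pop until RESULTS.empty?
finished.sort!
```

Because no window task depends on another, the order in which they
finish does not matter, which is what makes this stage a good fit for a
parallel runner.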

To improve readability, I split the rakefile into three files: Rakefile,
Rakefile.invoke, and Rakefile.helper. Rakefile is the workflow
description. Rakefile.invoke holds the command lines that invoke the
tools. Rakefile.helper has helper methods that make Rakefile simpler.

Previously, Yannick Wurm showed a rakefile, "cdsToAlignmentToTree", at
https://github.com/yannickwurm/tidbits/ . His rakefile handles
exceptions carefully. Mine does not. Yannick's approach is important
because error messages in rakefiles are sometimes not intuitive.

So far, my workflow does not use BioRuby at all. Raoul Bonnal has
suggested BioRuby-rake integration in reply to Yannick's post.
Introducing modular task definitions to rake is what we need, but it is
not very easy because the workflow steps in a rakefile are too tightly
coupled.

Introducing a BioRuby plug-in that provides common helper methods to
simplify rakefiles seems easier. It could contain my Rakefile.helper as
well as Yannick's helper methods and exception handling.

I will spend some time trying to write a small plug-in.

Sincerely yours,
Hiro.
-- 
MISHIMA, Hiroyuki, DDS, Ph.D.
COE Research Fellow
Department of Human Genetics
Nagasaki University Graduate School of Biomedical Sciences


