[BioRuby] Workflows and Parallelization

Tue Dec 21 17:50:24 UTC 2010

My original request was about analysis approaches for NGS and transcriptome data.

By the way, very interesting, Hiro.

Searching on GitHub, I also found this project:
https://github.com/grosser/parallel

Run any code in parallel Processes (> use all CPUs) or Threads (> speedup blocking operations).
Best suited for map-reduce or e.g. parallel downloads/uploads.

Processes
  - Speedup through multiple CPUs
  - Speedup for blocking operations
  - Protects global data
  - Extra memory used (very low on REE through copy_on_write_friendly)
  - Child processes are killed when your main process is killed through Ctrl+c or kill -2

Threads
  - Speedup for blocking operations
  - Global data can be modified
  - No extra memory used

Processes/Threads are workers, they grab the next piece of work when they finish.
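
To make the difference between the two models concrete, here is a minimal sketch using only the Ruby standard library. It is not the parallel gem's code or API: the method names map_with_threads and map_with_processes are made up for illustration, the process version forks one child per item instead of keeping a worker pool, and fork is assumed to be available (so not on Windows).

```ruby
# Threads: workers share memory and grab the next item from a queue
# as soon as they finish the previous one.
def map_with_threads(items, workers: 4, &block)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # after close, pop returns nil once the queue is empty
  results = Array.new(items.size)
  Array.new(workers) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        results[index] = block.call(item)
      end
    end
  end.each(&:join)
  results
end

# Processes: each child works on a copy of the parent's memory, so its
# changes to globals never reach the parent; results return through a pipe.
def map_with_processes(items, &block)
  children = items.map do |item|
    reader, writer = IO.pipe
    [reader, writer].each(&:binmode)
    pid = fork do
      reader.close
      Marshal.dump(block.call(item), writer)
      writer.close
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
    writer.close
    [pid, reader]
  end
  children.map do |pid, reader|
    result = Marshal.load(reader)
    reader.close
    Process.wait(pid)
    result
  end
end

$seen = []
doubled = map_with_threads([1, 2, 3], workers: 2) { |x| $seen << x; x * 2 }
# Threads share memory: $seen now holds all three inputs (order may vary).
forked = map_with_processes([1, 2, 3]) { |x| $seen << (x + 100); x * 2 }
# Children wrote to their own copies: the parent's $seen is unchanged here.
```

The thread version matches the "Global data can be modified" point, and the fork version matches "Protects global data" at the cost of copying results back to the parent.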

On 21/Dec/2010, at 15.29, MISHIMA, Hiroyuki wrote:

> Hi all,
> 
> I would like to say something about workflow automation in Ruby.
> 
> Recently I have become interested in using the Parallel Workflow extension
> for Rake (Pwrake) for next-generation sequencer data processing. As you may
> know, Rake is Ruby Make, a build tool.
> 
> Pwrake is developed by Masahiro Tanaka at the University of Tsukuba. He is
> also the author of "NArray", a very fast matrix calculation engine for
> Ruby. Pwrake and regular Rake are compatible in syntax, but Pwrake
> automatically detects workflow steps that can be run in parallel.
> 
> Pwrake's parallelization model is "process based". Because I am just a
> *user* of bioinformatics packages (like BWA/GATK/DINDEL etc.), that is
> exactly what I need.
> 
> Pwrake invokes processes via ssh and supports the Gfarm large-scale
> distributed filesystem. Of course, it works well on a multi-processor
> Linux box.
> 
> Although Pwrake was developed for astronomy, its goals are shared by
> bioinformatics.
> 
> I think that some helper methods could simplify Rakefiles for
> bioinformatics, and such helper methods would be a good fit for a BioRuby
> plugin.
> 
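
A rough sketch of what such a helper could look like, written against plain Rake only. Everything named here is an assumption for illustration: the sample names, the per_sample_step helper, the .fastq/.aligned extensions, and the upcase placeholder that stands in for a real tool invocation. It does not use any Pwrake-specific feature; it only shows per-sample steps that are independent of each other, which is the kind of structure a parallel runner can exploit.

```ruby
require 'rake'

# Hypothetical sample names; a real Rakefile might build this list
# with FileList['*.fastq'] instead.
SAMPLES = %w[sample_a sample_b sample_c].freeze

# Hypothetical helper: declare one file task per sample. Each output
# depends only on its own input, so the tasks do not depend on each other.
def per_sample_step(samples, from:, to:)
  samples.map do |name|
    target = "#{name}.#{to}"
    file target => "#{name}.#{from}" do |t|
      # Placeholder for an external tool call (e.g. via sh "...").
      File.write(t.name, File.read(t.prerequisites.first).upcase)
    end
    target
  end
end

aligned = per_sample_step(SAMPLES, from: 'fastq', to: 'aligned')

# Under plain Rake, multitask runs its prerequisites in parallel threads.
multitask all: aligned
```

With the input files present in the working directory, running the `all` task builds every `.aligned` file, and re-running it skips outputs that are already newer than their inputs.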
> FYI,
> Pwrake on github:
> https://github.com/masa16/Pwrake/
> 
> Presentation at RubyConfX:
> http://www.slideshare.net/masa16tanaka/ruby-conftanaka16
> 
> Presentation at PRAGMA18:
> http://goc.pragma-grid.net/pragma-doc/pragma18/Cool_Things/pwrake.pptx
> 
> Thanks,
> Hiro.
> 
> Raoul Bonnal wrote(2010/12/21 21:33):
>> Hi all, I read in the IRC logs that a lot of the members are working,
>> or will start working, on NGS data. I'll restart (I worked on 454) with
>> the Illumina platform in a few weeks. Do you have a consolidated
>> workflow to follow for transcriptome analysis?
>> 
>> Papers, blogs, etc.: anything is fine :-)
>> 
>> I'm also interested in the workflow discussion, but... are the
>> workflows intended to let non-bioinformaticians analyze complex
>> datasets, or to automate some tedious and repetitive tasks that we
>> (bioinformaticians) must do every time? In both cases our lives will
>> probably be better :-)
>> 
>> -- Ra
> -- 
> MISHIMA, Hiroyuki, DDS, Ph.D.
> COE Research Fellow
> Department of Human Genetics
> Nagasaki University Graduate School of Biomedical Sciences
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby