[BioRuby] Workflows and Parallelization

Jan Aerts jan.aerts at gmail.com
Mon Feb 7 08:42:24 UTC 2011


I used rake for my NGS analyses while I was still working at Sanger.
I will post more about that in the next couple of days.

jan.

On 21 December 2010 18:50, Raoul Bonnal <bonnalraoul at ingm.it> wrote:

> My original request was about analysis approaches for NGS and transcriptome data.
>
> btw
>
>
> Very interesting Hiro,
>
> searching on github there is also this project:
> https://github.com/grosser/parallel
>
> Run any code in parallel Processes(> use all CPUs) or Threads(> speedup
> blocking operations).
> Best suited for map-reduce or e.g. parallel downloads/uploads.
> Processes
>
>   - Speedup through multiple CPUs
>   - Speedup for blocking operations
>   - Protects global data
>   - Extra memory used ( very low on REE through copy_on_write_friendly )
>   - Child processes are killed when your main process is killed through
>     Ctrl+c or kill -2
>
> Threads
>
>   - Speedup for blocking operations
>   - Global data can be modified
>   - No extra memory used
>
> Processes/Threads are workers, they grab the next piece of work when they
> finish
>
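
The process/thread differences listed above can be sketched with the Ruby
standard library alone. This is not how the parallel gem is implemented; the
method names (bump_in_thread, bump_in_process, process_with_workers) are made
up for illustration, and Process.fork is assumed to be available (it is not on
Windows or JRuby).

```ruby
# Shared counter used to contrast the two models.
$counter = 0

# Threads share the parent's memory, so the increment is visible afterwards.
def bump_in_thread
  Thread.new { $counter += 1 }.join
  $counter
end

# A forked child works on its own copy of memory; its change stays in the child.
def bump_in_process
  pid = Process.fork do
    $counter += 100
    exit!(0) # leave the child without running at_exit handlers
  end
  Process.wait(pid)
  $counter
end

# Worker pool: each thread grabs the next item as soon as it finishes one.
def process_with_workers(items, worker_count)
  queue = Queue.new
  items.each { |item| queue << item }
  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        item = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        results << yield(item)
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop } # order of results is not guaranteed
end
```

Calling bump_in_thread changes the global counter, while bump_in_process
leaves the parent's copy untouched, which is the "protects global data" point.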
>
> On 21/Dec/2010, at 15.29, MISHIMA, Hiroyuki wrote:
>
> > Hi all,
> >
> > I would like to say something about workflow automation in Ruby.
> >
> > Recently I have become interested in using the Parallel Workflow
> > extension for Rake (Pwrake) for NextGen sequencer data processing. As
> > you may know, Rake is Ruby Make, a build tool.
> >
> > Pwrake is developed by Masahiro Tanaka at University of Tsukuba. He is
> > also the author of "NArray", a very fast matrix calculation engine for
> > Ruby. Although Pwrake and regular Rake are compatible in syntax, Pwrake
> > automatically detects workflow steps that can be run in parallel.
> >
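
A minimal sketch of the kind of workflow where independent steps can run in
parallel, written with regular Rake only (Pwrake itself is not used or shown
here). The sample names and the placeholder "alignment" step are hypothetical;
a real step would shell out to an actual tool. Regular Rake needs an explicit
multitask to run prerequisites concurrently.

```ruby
require "rake"
require "tmpdir"

# Hypothetical sample names; a real Rakefile would list actual FASTQ files.
SAMPLES  = %w[sample_a sample_b sample_c].freeze
WORK_DIR = Dir.mktmpdir("ngs_sketch")

# One file task per sample. Each target depends only on its own input,
# so the three tasks are independent of one another.
ALIGNED = SAMPLES.map do |name|
  input  = File.join(WORK_DIR, "#{name}.fastq")
  output = File.join(WORK_DIR, "#{name}.sam")
  File.write(input, "@read1\nACGT\n+\nIIII\n") # stand-in input data

  file output => input do |t|
    # Placeholder for the real work (e.g. calling an aligner via sh).
    source = t.prerequisites.first
    File.write(t.name, "aligned from #{File.basename(source)}\n")
  end
  output
end

# multitask runs its prerequisites in separate threads.
multitask align_all: ALIGNED

Rake::Task[:align_all].invoke
```

Because no .sam target depends on another sample's files, the dependency graph
itself shows which steps are safe to run side by side.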
> > Pwrake's parallelization model is "process based". Because I am just a
> > *user* of bioinformatics packages (like BWA/GATK/DINDEL etc..), it is
> > what I need.
> >
> > Pwrake invokes processes via ssh and supports the Gfarm large-scale
> > distributed filesystem. Of course, it works well on a multi-processor
> > Linux box.
> >
> > Although Pwrake was developed for astronomy, its goals apply equally
> > well to bioinformatics.
> >
> > I think that some helper methods could simplify Rakefiles for
> > bioinformatics, and such helper methods would be a good fit for a
> > BioRuby plugin.
> >
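
As a rough idea of what such helper methods might look like, here is a small
sketch. The module name NgsRakeHelpers and both methods are hypothetical; they
are not part of BioRuby, Rake, or Pwrake.

```ruby
require "shellwords"

# Hypothetical helper module for Rakefiles; none of these names exist in BioRuby.
module NgsRakeHelpers
  module_function

  # Replace the file extension: "reads/s1.fastq" with "sam" gives "reads/s1.sam".
  def with_ext(path, new_ext)
    path.sub(/\.[^.\/]+\z/, "") + ".#{new_ext}"
  end

  # Build a shell-safe command string from a program name and its arguments.
  def command(program, *args)
    Shellwords.join([program, *args.map(&:to_s)])
  end
end
```

Inside a Rakefile, with_ext could derive target names from input names, and
command could build the string passed to sh without manual quoting.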
> > FYI,
> > Pwrake on github:
> > https://github.com/masa16/Pwrake/
> >
> > Presentation at RubyConfX:
> > http://www.slideshare.net/masa16tanaka/ruby-conftanaka16
> >
> > Presentation at PRAGMA18:
> > http://goc.pragma-grid.net/pragma-doc/pragma18/Cool_Things/pwrake.pptx
> >
> > Thanks,
> > Hiro.
> >
> > Raoul Bonnal wrote(2010/12/21 21:33):
> >> Hi all, I read in the IRC logs that a lot of the members are or
> >> will start working on NGS data. I'll re-start (I worked on 454) with
> >> the Illumina platform in a few weeks. Do you have a consolidated
> >> workflow to follow for transcriptome analysis?
> >>
> >> papers, blogs, etc.. everything is ok :-)
> >>
> >> I'm also interested in the workflow discussion but... are the
> >> workflows intended to let non-bioinformaticians analyze complex
> >> datasets, or to automate some tedious and repetitive tasks that we
> >> (bioinformaticians) must do every time? In both cases our life will
> >> probably be better :-)
> >>
> >> -- Ra
> > --
> > MISHIMA, Hiroyuki, DDS, Ph.D.
> > COE Research Fellow
> > Department of Human Genetics
> > Nagasaki University Graduate School of Biomedical Sciences
> > _______________________________________________
> > BioRuby Project - http://www.bioruby.org/
> > BioRuby mailing list
> > BioRuby at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioruby
>
>


