[BioRuby] BioRuby Digest, Vol 110, Issue 1

Wed Mar 4 10:38:30 UTC 2015

Hey Francesco,

That's very cool. I like the fact that it abstracts away all the complications of the queuing system. Can you use pipengine without a queuing system/scheduler (i.e., on a single 48-core fat node)?

Is there an easily searchable bioinfo-core mailing list archive? I am a member but cannot easily find the discussion you mention.

I agree that it's challenging to find/create one-size-fits-all solutions. However, I do think there is a need for a "pipelining" solution that is biologist-friendly enough for them to immediately see the value (saving them time AND improving agility/reproducibility/maintainability/shareability). Ad-hoc solutions produced by biologists tend to do everything badly...

Cheers,
Yannick

p.s.: Sorry about GSoC & thanks for your efforts in putting it together...
p.p.s.: Docker is amazeballs :)
        Have a look at (WIP) https://github.com/yeban/oswitch
        We're facilitating transparent switching (files/paths/IDs preserved)
        back and forth between different OSes.

> On 3 Mar 2015, at 12:00, bioruby-request at mailman.open-bio.org wrote:
> 
> Hi Yannick,
> that's an interesting topic.
> I have been working for a while on a Ruby package to handle pipelines and
> distributed analyses in our Bioinformatics core: the code is here
> https://github.com/fstrozzi/bioruby-pipengine .
> 
> With this solution we decided to stick to a simple approach, i.e.
> pipeline templates written in YAML, where you can put raw command lines
> with simple placeholders that get substituted at run time according to your
> project and samples. So the DSL is reduced to a minimum, and the tool then
> creates runnable scripts that can be sent through a queuing system. There
> is also simple error control for jobs, as well as checkpoints to skip
> steps that have already been completed for a given pipeline.
> This is *very* Illumina-centric and so far it works only through a
> Torque/PBS scheduler (this is what we have in-house). It is a bit rough, but
> we have been using it for more than 2 years now and we are quite happy. I
> know it has also been used in other places. I've recently started a Scala
> implementation of this code (https://github.com/fstrozzi/PipEngine) to
> make it more portable and to introduce a number of improvements. It's
> still very much a work in progress, but among other things we want to add
> support for multiple queuing systems, step dependencies and Docker.
> 
> Anyway, in my opinion, the point with these solutions is that there cannot
> be a perfect tool that fits every purpose, scenario or environment. There
> was a similar discussion on the biocore mailing list some time ago, and it
> turned out that many centres either use their own systems or take existing
> solutions, such as Bpipe, and modify them to fit their needs. Nextflow is
> also a very nice tool.
> 
> In the end we did the same and developed a solution that, despite its own
> limitations, fits our needs and our way of structuring and organising our
> data analyses.
> 
> Cheers
> Francesco
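The template approach Francesco describes (raw command lines with placeholders filled in per sample, simple error control, and checkpoints that skip completed steps) can be illustrated with a small Python sketch. This is not pipengine's code or template format: the step names, the `<placeholder>` syntax, the `.STEP.done` checkpoint files and the functions `fill` and `build_script` are all hypothetical, and a plain dict stands in for the YAML file so that only the standard library is needed.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical template standing in for a YAML pipeline file. The step names,
# commands and <placeholder> syntax are illustrative assumptions only.
TEMPLATE = {
    "mapping": "bwa mem <index> <fastq> > <output>/<sample>.sam",
    "sort": "samtools sort -o <output>/<sample>.bam <output>/<sample>.sam",
}

PLACEHOLDER = re.compile(r"<(\w+)>")


def fill(command, values):
    """Replace each <name> in command with values[name]; unknown names raise."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder <{name}>")
        return str(values[name])
    return PLACEHOLDER.sub(substitute, command)


def build_script(step, sample, values, output_dir):
    """Return a bash script for one step of one sample, or None if the
    step's checkpoint file shows it has already completed.
    Assumes paths without spaces (nothing is shell-quoted)."""
    out = Path(output_dir).resolve() / sample
    checkpoint = out / f".{step}.done"  # hypothetical checkpoint marker
    if checkpoint.exists():
        return None  # skip steps that already finished
    command = fill(TEMPLATE[step], {**values, "sample": sample, "output": out})
    return "\n".join([
        "#!/bin/bash",
        "set -e",  # abort the job as soon as any command fails
        f"mkdir -p {out}",
        command,
        f"touch {checkpoint}",  # reached only if the command succeeded
    ]) + "\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        values = {"index": "genome.fa", "fastq": "S1_R1.fastq.gz"}
        print(build_script("mapping", "S1", values, tmp))
```

A generated script like this could be submitted to a Torque/PBS scheduler with `qsub`, or simply run with `bash` on a single machine; writing the checkpoint only after the command succeeds is what lets a rerun skip finished steps.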

-------------------------------------------------------
Yannick Wurm - http://wurmlab.github.io
Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅ +44 207 882 3049
5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary, University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK