[BioRuby] BioRuby Digest, Vol 110, Issue 1

Fields, Christopher J cjfields at illinois.edu
Thu Mar 5 21:05:47 UTC 2015


Francesco,

Just curious, does this run as a daemon and launch jobs from a submission node, or use Torque's job dependency system?

Would be pretty nice if it's the latter.  Almost every pipeline tool I see uses the daemon approach (from a submission node) or requires job submissions from the worker nodes.  Those two approaches don't work on clusters where you can't run long tasks on the head node (no daemon), have no access to a submission node, or the worker nodes are locked down w/ no network access, all of which describe our local cluster setup :P  Something that's unfortunately out of our hands.

chris
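[Editor's note: a minimal Ruby sketch of the two ideas in this thread — filling a PipEngine-style command template per sample, and chaining steps via Torque's `-W depend=afterok` flag instead of a long-running daemon. The placeholder syntax, step names, and job ids here are hypothetical illustrations, not PipEngine's actual API.]

```ruby
# A step template as it might appear in a YAML pipeline definition:
# a raw command line with <placeholder> tokens filled in per sample.
TEMPLATE = "bwa mem <genome> <sample>.fastq > <sample>.sam"

# Substitute <placeholder> tokens from a hash of per-sample values.
def render(template, values)
  template.gsub(/<(\w+)>/) { values.fetch(Regexp.last_match(1)) }
end

# Build a qsub command; passing the previous step's job id makes this
# step wait until that job has finished successfully (Torque's
# dependency mechanism, rather than a submission-node daemon).
def qsub_command(script, previous_job_id = nil)
  cmd = ["qsub"]
  cmd += ["-W", "depend=afterok:#{previous_job_id}"] if previous_job_id
  cmd << script
  cmd.join(" ")
end

puts render(TEMPLATE, "genome" => "hg19.fa", "sample" => "sampleA")
# => bwa mem hg19.fa sampleA.fastq > sampleA.sam

puts qsub_command("step2.sh", "12345.torque-server")
# => qsub -W depend=afterok:12345.torque-server step2.sh
```

With the dependency-flag style, the whole chain can be submitted up front from the head node and then the scheduler takes over — nothing needs to stay running.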

> On Mar 5, 2015, at 7:40 AM, Francesco Strozzi <francesco.strozzi at gmail.com> wrote:
> 
> Hi Yannick,
> yes, it is possible: you can just create runnable scripts without sending
> them through a queuing system. We are also putting together a more detailed
> guide with a bit more information than the README; if you are interested,
> let me know and we can move that online somewhere (e.g. readthedocs maybe).
> 
> For the biocore mailing list discussion, I've searched a bit and the thread
> was from August 2013, Title: "NGS pipeline construction tools?"
> 
> Yes, I believe Docker is a great tool and the way to go now. Combine that
> with a customisable tool that simplifies the creation and running of
> multiple jobs and you can be a step closer to solid reproducibility in
> data analysis (still with some caveats, of course).
> 
> Cheers
> Francesco
> 
> 
> On Wed, 4 Mar 2015 at 11:38 Yannick Wurm <y.wurm at qmul.ac.uk> wrote:
> 
>> Hey Francesco,
>> 
>> that's very cool. I like the fact that it abstracts away all the
>> complications of the queuing system. Can you use pipengine without a
>> queuing system/scheduler (i.e. on a single 48-core fat node)?
>> 
>> Is there an easily searchable bioinfo-core mailing list archive? I am a
>> member but cannot easily find the discussion you mention.
>> 
>> I agree that it's challenging to find/create one-size-fits-all solutions.
>> However, I do think there is a need for a "pipelining" solution that is
>> sufficiently biologist-friendly to get them to immediately see the value
>> (saving them time AND improving agility/reproducibility/maintainability/shareability).
>> Ad-hoc solutions produced by biologists tend to do everything badly...
>> 
>> Cheers,
>> Yannick
>> 
>> p.s.: Sorry about the GSoC & thanks for your efforts in putting it
>> together...
>> p.p.s.: docker is amazeballs :)
>>        Have a look at (WIP) https://github.com/yeban/oswitch
>>        We're facilitating transparent switching (files/paths/ids conserved)
>>        back and forth between different OSes.
>> 
>> 
>> 
>>> On 3 Mar 2015, at 12:00, bioruby-request at mailman.open-bio.org wrote:
>>> 
>>> Hi Yannick,
>>> that's an interesting topic.
>>> I have been working for a while on a Ruby package to handle pipelines and
>>> distributed analyses in our Bioinformatics core: the code is here
>>> https://github.com/fstrozzi/bioruby-pipengine .
>>> 
>>> With this solution we have decided to stick to a simple approach, i.e.
>>> pipeline templates written in YAML where you can put raw command lines
>>> with simple placeholders that get substituted at run time according to
>>> your project and samples. So the DSL is reduced to a minimum and the
>>> tool then creates runnable scripts that can be sent through a queuing
>>> system. There is also simple error control for jobs, and checkpoints to
>>> skip already completed steps in a given pipeline.
>>> This is *very* Illumina-centric and so far it works only through a
>>> Torque/PBS scheduler (this is what we have in-house). It is a bit rough,
>>> but we have been using it for more than two years now and we are quite
>>> happy. I know it has also been used in other places. I've recently
>>> started a Scala implementation of this code
>>> (https://github.com/fstrozzi/PipEngine), to make it more portable and
>>> also to introduce a number of improvements. It's still very much a work
>>> in progress, but among other things we want to add support for multiple
>>> queuing systems, step dependencies and Docker support.
>>> 
>>> Anyway, my point with these solutions is that I do not think there can
>>> be a perfect tool that fits every purpose, scenario or environment.
>>> There was a similar discussion on the biocore mailing list some time
>>> ago, and it turned out that many centres either use their own systems
>>> or take existing solutions, such as Bpipe, and modify them to fit
>>> their needs. Nextflow is also a very nice tool.
>>> 
>>> In the end we have done the same and developed a solution that, even
>>> with its own limitations, fits our needs and our way of structuring
>>> and organising the data analyses.
>>> 
>>> Cheers
>>> Francesco
>> 
>> 
>> 
>> -------------------------------------------------------
>> Yannick Wurm - http://wurmlab.github.io
>> Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅ +44
>> 207 882 3049
>> 5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary,
>> University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK
>> 
>> 
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioruby
