[BioRuby] BioRuby Digest, Vol 110, Issue 1

Francesco Strozzi francesco.strozzi at gmail.com
Fri Mar 6 15:12:28 UTC 2015


Hi Chris,
no, it doesn't run as a daemon. If you want to submit jobs through Torque,
Pipengine creates the runnable scripts and then connects to a submission
node through SSH, where it issues the submission commands
automatically. So the impact on the infrastructure is minimal: you can
install Pipengine on a server where you have fewer restrictions and it
should work. You do of course need shared folders common to your
working server, the submission node and the cluster nodes, but this should
be a fairly common configuration, I guess.
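
To make the flow concrete, here is a minimal Ruby sketch of the two steps
described above: writing a runnable script into a shared folder, then building
the SSH command that would submit it from the submission node. This is not
Pipengine's actual code. The helper names, the host and the job name are
hypothetical, and the example only prints the command instead of running it.

```ruby
require "fileutils"
require "shellwords"

# Write a runnable job script into a folder that is assumed to be shared
# between the working server, the submission node and the cluster nodes.
# (Hypothetical helper, not part of Pipengine.)
def write_job_script(shared_dir, job_name, commands)
  FileUtils.mkdir_p(shared_dir)
  path = File.join(shared_dir, "#{job_name}.sh")
  body = ["#!/bin/bash", "#PBS -N #{job_name}", *commands].join("\n") + "\n"
  File.write(path, body)
  FileUtils.chmod(0o755, path)
  path
end

# Build the command that would submit the script via the submission node.
# Returned as an argument array so it could be passed to Kernel#system.
# (Hypothetical helper, not part of Pipengine.)
def submission_command(submission_host, script_path)
  ["ssh", submission_host, "qsub #{Shellwords.escape(script_path)}"]
end

if __FILE__ == $PROGRAM_NAME
  require "tmpdir"
  Dir.mktmpdir do |dir|
    script = write_job_script(dir, "align_sample1", ["echo 'step 1'"])
    cmd = submission_command("user@submit.example.org", script)
    # Dry run: print the command rather than calling system(*cmd).
    puts cmd.shelljoin
  end
end
```

Since the script lives on the shared folder, the path written on the working
server is the same path that qsub sees on the submission node, which is why
the shared-folder requirement matters.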

Cheers
Francesco

--
Francesco Strozzi



On Thu, 5 Mar 2015 at 22:07 Fields, Christopher J <cjfields at illinois.edu>
wrote:

> Francesco,
>
> Just curious, does this run as a daemon and launch jobs from a submission
> node, or use the Torque job dependency system?
>
> Would be pretty nice if it’s the latter.  Almost every pipeline tool I see
> uses the daemon approach (from a submission node) or requires job
> submissions from the worker nodes.  The latter two approaches don't work on
> clusters where you can’t run long tasks on the head node (no daemon), have
> no access to a submission node, or the worker nodes are locked down w/ no
> network access, all of which describe our local cluster setup :P  Something
> that’s unfortunately out of our hands.
>
> chris
>
> > On Mar 5, 2015, at 7:40 AM, Francesco Strozzi <
> francesco.strozzi at gmail.com> wrote:
> >
> > Hi Yannick,
> > yes, it is possible: you can just create runnable scripts without sending
> > them through a queuing system. We are also putting together a more detailed
> > guide, with a bit more information than the README. If you are interested,
> > let me know and we can move that online somewhere (e.g. readthedocs maybe).
> >
> > For the biocore mailing list discussion, I've searched a bit and the thread
> > was from August 2013, titled "NGS pipeline construction tools?"
> >
> > Yes, I believe Docker is a great tool and the way to go now, in my opinion.
> > Combine that with a customisable tool that simplifies the creation and
> > running of multiple jobs and you can be a step closer to solid
> > reproducibility in data analysis (still with some caveats of course).
> >
> > Cheers
> > Francesco
> >
> >
> > On Wed, 4 Mar 2015 at 11:38 Yannick Wurm <y.wurm at qmul.ac.uk> wrote:
> >
> >> Hey Francesco,
> >>
> >> that's very cool. I like the fact that it abstracts away all the
> >> complication of the queuing system. Can you use pipengine without a queuing
> >> system/scheduler (i.e. on a single 48-core fat node)?
> >>
> >> Is there an easily searchable bioinfo-core mailing list archive? I am a
> >> member but cannot easily find the discussion you mention.
> >>
> >> I agree that it's challenging to find/create one-size-fits-all solutions.
> >> However I do think that there is a need for a "pipelining" solution that is
> >> sufficiently biologist-friendly to get them to immediately see the value
> >> (saving them time AND improving
> >> agility/reproducibility/maintainability/sharability).
> >> Ad-hoc solutions produced by biologists tend to do everything badly...
> >>
> >> Cheers,
> >> Yannick
> >>
> >> p.s.: Sorry about the Gsoc & thanks for your efforts in putting it
> >> together...
> >> p.p.s.: docker is amazeballs :)
> >>        Have a look at (WIP) https://github.com/yeban/oswitch
> >>        We're facilitating transparent switching (files/paths/ids conserved)
> >>        back and forth between different OS.
> >>
> >>
> >>
> >>> On 3 Mar 2015, at 12:00, bioruby-request at mailman.open-bio.org wrote:
> >>>
> >>> Hi Yannick,
> >>> that's an interesting topic.
> >>> I have been working for a while on a Ruby package to handle pipelines and
> >>> distributed analyses in our Bioinformatics core: the code is here
> >>> https://github.com/fstrozzi/bioruby-pipengine .
> >>>
> >>> With this solution we have decided to stick to a simple approach, i.e.
> >>> pipeline templates written in YAML where you can put raw command lines
> >>> with simple placeholders that get substituted at run time according to
> >>> your project and samples. So the DSL is reduced to a minimum and the tool
> >>> then creates runnable scripts that can be sent through a queuing system.
> >>> There is also simple error control for jobs, plus checkpoints to skip
> >>> already completed steps for a given pipeline.
> >>> This is *very* Illumina-centric and so far it works only through a
> >>> Torque/PBS scheduler (this is what we have in-house). It is a bit rough
> >>> but we have been using it for more than 2 years now and we are quite
> >>> happy. I know it has also been used in other places. I've recently
> >>> started a Scala implementation of this code
> >>> (https://github.com/fstrozzi/PipEngine), to make it more portable and
> >>> also to introduce a number of improvements. It's still very much a work
> >>> in progress, but among other things we want to add support for multiple
> >>> queuing systems, step dependencies and Docker.
> >>>
> >>> Anyway, the point with these solutions, in my opinion, is that I do not
> >>> think there could be a perfect tool that can fit every purpose or
> >>> scenario or environment. There was a similar discussion on the biocore
> >>> mailing list some time ago and it turned out that many centres either use
> >>> their own systems or take existing solutions, such as Bpipe, and modify
> >>> them to fit their needs. Nextflow is also a very nice tool.
> >>>
> >>> In the end we have done the same and developed a solution that, even
> >>> with its own limitations, fits our needs and our way of structuring and
> >>> organising the data analyses.
> >>>
> >>> Cheers
> >>> Francesco
> >>
> >>
> >>
> >> -------------------------------------------------------
> >> Yannick Wurm - http://wurmlab.github.io
> >> Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅ +44
> >> 207 882 3049
> >> 5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary,
> >> University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK
> >>
> >>
> > _______________________________________________
> > BioRuby Project - http://www.bioruby.org/
> > BioRuby mailing list
> > BioRuby at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/bioruby
>
>

