[BioRuby] BioRuby Digest, Vol 110, Issue 1

Raoul bonnal at ingm.org
Fri Mar 6 05:56:10 UTC 2015


Hi Chris,
we wrote the system to take advantage of Torque. In general I think it is worth installing Torque/PBS or any other queue system even on a fat machine: it gives you better monitoring and control of the processes. Slurm is a lightweight queue system.
Recently we introduced checkpoints in the pipeline system, but it is a very primitive implementation.
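
To make the checkpoint idea concrete, here is a minimal sketch in Python. It is not the pipengine implementation; the function name `run_with_checkpoints`, the marker-file naming (`.<step>.done`), and the example step names are assumptions for illustration only. Each step leaves an empty marker file on success, and a rerun skips any step whose marker already exists.

```python
import subprocess
import tempfile
from pathlib import Path


def run_with_checkpoints(steps, workdir):
    """Run shell commands in order, skipping steps that already completed.

    steps: list of (name, command) pairs; names are hypothetical examples.
    Returns the names of the steps that were actually executed.
    """
    workdir = Path(workdir)
    executed = []
    for name, command in steps:
        marker = workdir / f".{name}.done"  # assumed marker-file convention
        if marker.exists():
            continue  # checkpoint found: step finished in an earlier run
        # check=True raises CalledProcessError on a non-zero exit status,
        # so no marker is written for a failed step.
        subprocess.run(command, shell=True, check=True, cwd=workdir)
        marker.touch()
        executed.append(name)
    return executed


if __name__ == "__main__":
    steps = [("trim", "echo trimming > trim.log"),
             ("align", "echo aligning > align.log")]
    with tempfile.TemporaryDirectory() as tmp:
        print(run_with_checkpoints(steps, tmp))  # first run executes both steps
        print(run_with_checkpoints(steps, tmp))  # second run skips both
```

If a step fails, the exception stops the run before its marker is written, so the next invocation resumes from that step.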

On the other side of the moon :-) if you want to manage jobs using Docker containers there are other "queue" systems like Mesos. This approach is better suited to companies than to research centers, which are used to LSF, PBS, or Slurm with classic grid systems. If anyone is interested in Mesos let me know; I am working on a project to build a grid using Mesos.

A "classical" hybrid approach for mixing Docker and PBS is to run Docker containers inside a PBS script; that is another possible solution.
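
As a rough sketch of that hybrid approach, the Python snippet below writes out a PBS job script whose only payload is a `docker run` call. The image name, command, resource requests, and the `pbs_docker_script` helper are hypothetical, and it assumes the worker nodes have Docker installed and that the job's user is allowed to run it.

```python
import shlex


def pbs_docker_script(job_name, image, command, ppn=4, walltime="01:00:00"):
    """Return the text of a PBS script that runs `command` in a container.

    All argument values used below are illustrative placeholders.
    """
    docker_cmd = " ".join([
        "docker run --rm",
        '-v "$PBS_O_WORKDIR":/data',  # mount the submission directory
        "-w /data",                   # and use it as the working directory
        shlex.quote(image),
        command,
    ])
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes=1:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        'cd "$PBS_O_WORKDIR"',
        docker_cmd,
        "",
    ])


if __name__ == "__main__":
    # "example/bwa" is a made-up image name, not a real published image.
    print(pbs_docker_script("align_s1", "example/bwa", "bwa mem ref.fa s1.fq"))
```

The resulting file would be submitted with `qsub` like any other job, so the scheduler still handles queuing and monitoring while Docker supplies the software environment.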

Best
Ra

On 5 March 2015 22:05:47 CET, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
>Francesco,
>
>Just curious, does this run as a daemon and launch jobs from a
>submission node, or use the Torque job dependency system?
>
>Would be pretty nice if it’s the latter.  Almost every pipeline tool I
>see uses the daemon approach (from a submission node) or requires job
>submissions from the worker nodes.  The latter two approaches don't
>work on clusters where you can’t run long tasks on the head node (no
>daemon), have no access to a submission node, or the worker nodes are
>locked down w/ no network access, all of which describe our local
>cluster setup :P  Something that’s unfortunately out of our hands.
>
>chris
>
>> On Mar 5, 2015, at 7:40 AM, Francesco Strozzi <francesco.strozzi at gmail.com> wrote:
>> 
>> Hi Yannick,
>> yes it is possible, you can just create runnable scripts without sending
>> them through a queuing system. We are also putting together a more detailed
>> guide, with a bit more information than the README; if you are interested
>> let me know and we can move that online somewhere (e.g. readthedocs maybe).
>>
>> For the biocore mailing list discussion, I've searched a bit and the thread
>> was from August 2013, Title: "NGS pipeline construction tools?"
>>
>> Yes, I believe Docker is a great tool and the way to go now, in my opinion.
>> Combine that with a customisable tool that simplifies the creation and
>> running of multiple jobs and you can be a step closer to solid
>> reproducibility in data analysis (still with some caveats of course).
>> 
>> Cheers
>> Francesco
>> 
>> 
>> On Wed, 4 Mar 2015 at 11:38 Yannick Wurm <y.wurm at qmul.ac.uk> wrote:
>> 
>>> Hey Francesco,
>>> 
>>> that's very cool. I like the fact that it abstracts away all the
>>> complication of the queuing system. Can you use pipengine without a queuing
>>> system/scheduler (i.e. on a single 48-core fat node)?
>>>
>>> Is there an easily searchable bioinfo-core mailing list archive? I am a
>>> member but cannot easily find the discussion you mention.
>>>
>>> I agree that it's challenging to find/create one-size-fits-all solutions.
>>> However I do think that there is a need for a "pipelining" solution that is
>>> sufficiently biologist-friendly to get them to immediately see the value
>>> (saving them time AND improving
>>> agility/reproducibility/maintainability/sharability).
>>> Ad-hoc solutions produced by biologists tend to do everything badly...
>>> 
>>> Cheers,
>>> Yannick
>>> 
>>> p.s.: Sorry about the Gsoc & thanks for your efforts in putting it
>>> together...
>>> p.p.s.: docker is amazeballs :)
>>>        Have a look at (WIP) https://github.com/yeban/oswitch
>>>        We're facilitating transparent switching (files/paths/ids conserved)
>>>        back and forth between different OS.
>>> 
>>> 
>>> 
>>>> On 3 Mar 2015, at 12:00, bioruby-request at mailman.open-bio.org wrote:
>>>> 
>>>> Hi Yannick,
>>>> that's an interesting topic.
>>>> I have been working for a while on a Ruby package to handle pipelines and
>>>> distributed analyses in our Bioinformatics core: the code is here
>>>> https://github.com/fstrozzi/bioruby-pipengine .
>>>> 
>>>> With this solution we have decided to stick to a simple approach, i.e.
>>>> pipeline templates written in YAML where you can put raw command lines
>>>> with simple placeholders that get substituted at run time according to your
>>>> project and samples. So the DSL is reduced to a minimum and the tool then
>>>> creates runnable scripts that can be sent through a queuing system. There
>>>> is also simple error control for jobs, plus checkpoints to skip
>>>> already completed steps for a given pipeline.
>>>> This is *very* Illumina-centric and so far it works only through a
>>>> Torque/PBS scheduler (this is what we have in-house). It is a bit rough but
>>>> we have been using it for >2 years now and we are quite happy. I know it has
>>>> also been used in other places. I've recently started a Scala
>>>> implementation of this code (https://github.com/fstrozzi/PipEngine), to
>>>> make it more portable and also to introduce a number of improvements. It's
>>>> still very much a work in progress, but among other things we want to add
>>>> support for multiple queuing systems, step dependencies and Docker support.
>>>> 
>>>> Anyway, the point with these solutions, in my opinion, is that I do not
>>>> think there could be a perfect tool that can fit every purpose or scenario
>>>> or environment. There was a similar discussion also on the biocore mailing
>>>> list some time ago and it turned out that many centres either use their own
>>>> systems or take existing solutions, such as for instance Bpipe, and modify
>>>> them to fit their needs. Nextflow is also a very nice tool.
>>>> 
>>>> In the end we have done the same and developed a solution that, even if
>>>> with its own limitations, fits our needs and our way of structuring and
>>>> organising the data analyses.
>>>> 
>>>> Cheers
>>>> Francesco
>>> 
>>> 
>>> 
>>> -------------------------------------------------------
>>> Yannick Wurm - http://wurmlab.github.io
>>> Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅
>>> +44 207 882 3049
>>> 5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary,
>>> University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK
>>> 
>>> 
>> _______________________________________________
>> BioRuby Project - http://www.bioruby.org/
>> BioRuby mailing list
>> BioRuby at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/bioruby
>
>

-- Sent from my Android phone with K-9 Mail.

