[BioRuby] Improve rake/snakemake/nextflow.io?

Mon Mar 2 15:43:49 UTC 2015

Hi all,

A dumb question that hasn't been discussed here for a while:
what's the easiest way to make a *simple* pipeline?

Two contenders that come up on Google:

* snakemake 
  http://metagenomic-methods-for-microbial-ecologists.readthedocs.org/en/latest/day-1/#merge-paired-end-illumina-data

* nextflow
  http://www.nextflow.io/example4.html
  This one clearly allows grouping of files (e.g. read_pairs)

Any other rake/make-killers?

The criteria I think are important:
 * simple syntax (yaml?)
 * easy wildcard syntax/DSL, e.g.
      XXX.bam requires #{basename($_)}.sam
 * easy grouping of files (for paired reads; for samples split across multiple files)
 * easy error checking & failing
   - e.g. checking that output files are not empty
   - e.g. checking that files have same length (when appropriate)
   - e.g. checking return code or presence/absence of specific text in stdout or stderr
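
To make the criteria above concrete, here is a rough plain-Ruby sketch of what they amount to, independent of any particular pipeline tool. Everything in it is an assumption made for illustration: the helper names (group_read_pairs, sam_for, same_line_count?, run_step) are hypothetical, and so is the _R1/_R2 file naming convention.

```ruby
require 'open3'

# Group FASTQ files into read pairs by stripping a trailing _R1/_R2 tag.
# ASSUMPTION: files are named sample_R1.fastq / sample_R2.fastq.
def group_read_pairs(paths)
  paths.group_by { |p| File.basename(p).sub(/_R[12]\.fastq\z/, '') }
       .transform_values(&:sort)
end

# Wildcard-style dependency: XXX.bam requires XXX.sam.
def sam_for(bam)
  bam.sub(/\.bam\z/, '.sam')
end

# True when all given files have the same number of lines
# (e.g. the two files of a read pair).
def same_line_count?(*paths)
  paths.map { |p| File.foreach(p).count }.uniq.size <= 1
end

# Run a command (given as an array of arguments) and fail loudly on:
#   - a non-zero return code,
#   - stderr matching a forbidden pattern,
#   - a missing or empty output file.
def run_step(cmd, output:, forbid: /error/i)
  stdout, stderr, status = Open3.capture3(*cmd)
  raise "#{cmd.join(' ')} exited with #{status.exitstatus}" unless status.success?
  raise "#{cmd.join(' ')} wrote suspicious stderr: #{stderr}" if stderr =~ forbid
  raise "#{output} is missing or empty" unless File.size?(output)
  stdout
end
```

For example, group_read_pairs(%w[a_R2.fastq a_R1.fastq b_R1.fastq]) returns {"a"=>["a_R1.fastq", "a_R2.fastq"], "b"=>["b_R1.fastq"]}, so an unpaired sample is easy to spot. Ideally the tool would provide all of this so nobody has to write it by hand.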

An additional killer feature would be great visual progress output, ideally from a tool that learns how long specific steps are likely to take so that it can provide an ETA.
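
The ETA part could be as simple as averaging the durations recorded on earlier runs. A minimal sketch, assuming a hypothetical history hash that maps each step name to its past run times in seconds (the function name eta_seconds is made up too):

```ruby
# Estimate the remaining time for the steps still to run.
# ASSUMPTION: `history` maps a step name to an array of past durations
# in seconds. Steps with no history contribute 0 to the estimate.
def eta_seconds(pending_steps, history)
  pending_steps.sum do |step|
    past = history.fetch(step, [])
    past.empty? ? 0.0 : past.sum.to_f / past.size
  end
end
```

With history = { 'align' => [100, 140], 'sort' => [30] }, eta_seconds(%w[align sort], history) gives 150.0. A real tool would presumably also scale by input size, which this sketch ignores.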

Cheers,

Yannick

-------------------------------------------------------
Yannick Wurm - http://wurmlab.github.io
Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅ +44 207 882 3049
5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary, University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK