[GSoC] Bionode project idea

Mon Feb 15 14:54:36 UTC 2016

Hi Bruno,

thanks for the proposal, I'll make sure to add it to our ideas page.

Cheers,
Kai

--
Kai Blin                                           kblin at biosustain.dtu.dk
PostDoc / Scientific Software Engineer
DTU Biosustain                                     http://www.biosustain.dtu.dk/
Building Thujahuset, room 2.L.09
DK - 2970 Hørsholm
Denmark
mobile: +45 93511306                               twitter: @kaiblin
________________________________
From: GSoC [gsoc-bounces+kblin=biosustain.dtu.dk at mailman.open-bio.org] on behalf of Bruno Vieira [mail at bmpvieira.com]
Sent: 15 February 2016 15:35
To: gsoc at mailman.open-bio.org
Cc: Yannick Wurm; Max Ogden; Mathias Buus Madsen
Subject: Re: [GSoC] Bionode project idea

Here's the proposal:

Bionode workflow engine for streamed data analysis

Researchers should be able to:
  * Perform analyses while data are generated (i.e., with “data streams”);
  * Easily and rapidly update results if input data or analysis approaches/parameters change (with minimal recomputation);
  * Effortlessly change and scale underlying computing platforms while the pipeline is running;
  * Easily visualise results.

This is largely impossible because current approaches were developed when datasets were simpler and smaller. The student will take advantage of recent improvements in generic analysis tools (Node.js Streams & asynchronous concurrency) to attain the above objectives.

The student will create a workflow engine for streamed data analysis with concurrent pipelining. The main mentors will be Max Ogden and Mathias Buus, top Node.js contributors and founders of Dat-data.com, for their experience with streaming interfaces. Bruno Vieira (founder of Bionode.io) and Yannick Wurm (lecturer in Bioinformatics at QMUL) will co-supervise.
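
As a rough illustration of what concurrent pipelining with Node.js Streams could look like, here is a minimal sketch. The step names (filterByLength, addGcContent) and the record shape ({ id, seq }) are hypothetical and chosen only for this example; they are not part of any existing Bionode module. Each record moves through every step as soon as it arrives, so results can appear while input is still being produced.

```javascript
'use strict'
const { Readable, Transform, Writable, pipeline } = require('stream')

// Hypothetical step: keep only reads at or above a minimum length.
function filterByLength (minLength) {
  return new Transform({
    objectMode: true,
    transform (read, _encoding, callback) {
      // Calling back without data drops the record from the stream.
      if (read.seq.length >= minLength) callback(null, read)
      else callback()
    }
  })
}

// Hypothetical step: annotate each read with its GC fraction.
function addGcContent () {
  return new Transform({
    objectMode: true,
    transform (read, _encoding, callback) {
      const gc = (read.seq.match(/[GC]/gi) || []).length
      callback(null, { ...read, gc: gc / read.seq.length })
    }
  })
}

// Run both steps over any iterable (or async iterable) source and
// resolve with the collected records once the stream has finished.
function runPipeline (source, minLength) {
  return new Promise((resolve, reject) => {
    const results = []
    const sink = new Writable({
      objectMode: true,
      write (record, _encoding, callback) {
        results.push(record)
        callback()
      }
    })
    pipeline(
      Readable.from(source),
      filterByLength(minLength),
      addGcContent(),
      sink,
      (err) => (err ? reject(err) : resolve(results))
    )
  })
}
```

Because Readable.from also accepts async iterables, the same function could in principle be fed from a source that is still generating data, and backpressure between steps is handled by the stream machinery.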

Some work on the data structures and programming interfaces for commonly used data sources (e.g., NCBI, UniProt, Ensembl/BioMart) and data types (e.g., VCF, BAM, FASTQ) will be required.
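
To give an idea of what a streaming interface for one of these data types might involve, below is a sketch of a FASTQ parser written as a Node.js Transform stream. It assumes the common four-lines-per-record layout (no wrapped sequences) and ASCII input; the function name fastqParser and the output shape { id, seq, qual } are assumptions made for this example, not an existing Bionode API.

```javascript
'use strict'
const { Transform } = require('stream')

// Build one record from four FASTQ lines, or throw if they look wrong.
function toRecord ([header, seq, plus, qual]) {
  if (header[0] !== '@' || plus[0] !== '+') {
    throw new Error('Malformed FASTQ record near: ' + header)
  }
  return { id: header.slice(1), seq, qual }
}

// Sketch: text chunks in, { id, seq, qual } objects out.
function fastqParser () {
  let leftover = '' // partial line carried over between chunks
  let lines = []    // lines of the record currently being assembled

  return new Transform({
    readableObjectMode: true,
    transform (chunk, _encoding, callback) {
      const parts = (leftover + chunk.toString()).split('\n')
      leftover = parts.pop() // the last piece may be an incomplete line
      let error = null
      try {
        for (const line of parts) {
          lines.push(line.replace(/\r$/, ''))
          if (lines.length === 4) {
            this.push(toRecord(lines))
            lines = []
          }
        }
      } catch (err) {
        error = err
      }
      callback(error)
    },
    flush (callback) {
      // Handle input that does not end with a newline.
      if (leftover !== '') lines.push(leftover.replace(/\r$/, ''))
      let error = null
      try {
        if (lines.length === 4) this.push(toRecord(lines))
        else if (lines.length > 0) error = new Error('Truncated FASTQ record')
      } catch (err) {
        error = err
      }
      callback(error)
    }
  })
}
```

A record is emitted as soon as its fourth line has arrived, regardless of how the input happens to be split into chunks, so downstream steps never have to wait for the whole file.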

The underlying computational architecture should be abstracted. This means that analysis code will run identically on different traditional high-performance computing systems (e.g., Torque, SGE) and modern systems (e.g., Hadoop MapReduce).
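
One possible way to picture this abstraction is a small registry of interchangeable backends, sketched below. Everything here (registerBackend, runTask, the task shape { command, args }) is hypothetical and only meant to show the idea; a real Torque, SGE or Hadoop backend would need its own run(task) implementation, which is deliberately left out.

```javascript
'use strict'
const { execFile } = require('child_process')

// Hypothetical registry of execution backends. Each backend is an object
// with a run(task) method that returns the task's output (or a Promise of it).
const backends = new Map()

function registerBackend (name, backend) {
  if (typeof backend.run !== 'function') {
    throw new TypeError('Backend "' + name + '" must implement run(task)')
  }
  backends.set(name, backend)
}

// Analysis code calls runTask and never talks to a scheduler directly.
function runTask (task, backendName) {
  const backend = backends.get(backendName)
  if (!backend) {
    return Promise.reject(new Error('Unknown backend: ' + backendName))
  }
  return Promise.resolve(backend.run(task))
}

// Local backend: runs the command as a child process on this machine.
registerBackend('local', {
  run (task) {
    return new Promise((resolve, reject) => {
      execFile(task.command, task.args || [], (err, stdout) => {
        if (err) reject(err)
        else resolve(stdout)
      })
    })
  }
})
```

With this shape, switching from a laptop to a cluster would only mean passing a different backend name to runTask, while the task descriptions stay unchanged.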

Some components and proofs of concept required for this project are available at http://github.com/bionode

JavaScript skills are required. Node.js and some biology knowledge are a plus. Difficulty is medium.

Cheers,
Bruno

On Mon, Feb 15, 2016 at 1:35 PM Bruno Vieira <mail at bmpvieira.com> wrote:
Hi all,

Would it be possible to propose an idea for the Bionode.io project through OBF?
If so, please let me know the proper process to submit, since I saw in another thread that you're having issues with the wiki.

Cheers,
Bruno
bmpvieira.com    bionode.io    wurmlab.github.io