[GSoC] Bionode project at GSOC 2014

Tue Mar 11 20:12:16 UTC 2014

Hi Chris,

Thanks for the interest in bionode!

Regarding the requirements for the project plan, you can get more
information at:

http://www.open-bio.org/wiki/Google_Summer_of_Code#When_you_apply

but maybe someone from the organization can provide more details here?

Regarding bionode, what we intend to build first is some core
functionality, starting with methods for parsing/writing bioinformatics
file formats. Once we have that, we can start implementing
algorithms/methods for alignment, annotation, etc.

Regarding your points:

1) You can also check sites like biostars.org or seqanswers.com to get a
feel for what the community is doing, but it is probably better not to
spend too much time on those sites at the beginning: bioinformatics
is a very broad term and it is easy to feel overwhelmed if you are not
looking for something specific. If you start by implementing an API to
handle file formats like FASTA, BAM, VCF, bigWig, GFF, etc., it could
benefit many visualizations and genome browsers. Besides looking
at other Bio* projects like Biopython's SeqIO, you could also look at
JavaScript genome browsers, like biodalliance.org, that can already handle
some of those formats, and see if their code can be reused/improved in
bionode so that other genome browsers or BioJS components can use it later.
When doing IO, we could take advantage of Node.js Streams on the server side
and maybe on the client side (maxogden/domnode
<http://github.com/maxogden/domnode>; substack/stream-browserify
<http://github.com/substack/stream-browserify>).
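As a rough illustration of the Streams idea, here is a minimal sketch of a FASTA parser written as a Node.js Transform stream: text goes in, one JavaScript object per record comes out. The name `createFastaParser` and the `{ id, description, seq }` record shape are hypothetical, not an existing bionode API, and the sketch ignores edge cases such as comment lines or malformed input.

```javascript
// Minimal sketch of a streaming FASTA parser on top of Node.js Streams.
// Hypothetical: the name createFastaParser and the { id, description, seq }
// record shape are illustrative, not an existing bionode API.
const { Transform, Readable } = require("stream");
const { StringDecoder } = require("string_decoder");

function createFastaParser() {
  const decoder = new StringDecoder("utf8"); // safely decodes split byte chunks
  let pending = "";    // trailing partial line carried over between chunks
  let current = null;  // record currently being accumulated

  // Process one complete line; emits the previous record on each new header.
  function handleLine(stream, rawLine) {
    const line = rawLine.trim();
    if (line === "") return;
    if (line.startsWith(">")) {
      if (current) stream.push(current);
      const header = line.slice(1);
      const space = header.search(/\s/);
      current = {
        id: space === -1 ? header : header.slice(0, space),
        description: space === -1 ? "" : header.slice(space + 1).trim(),
        seq: "",
      };
    } else if (current) {
      current.seq += line; // a sequence may be wrapped over several lines
    }
  }

  return new Transform({
    readableObjectMode: true, // bytes/text in, one object per record out
    transform(chunk, _encoding, callback) {
      pending += decoder.write(chunk);
      const lines = pending.split(/\r?\n/);
      pending = lines.pop(); // last piece may be an incomplete line
      for (const line of lines) handleLine(this, line);
      callback();
    },
    flush(callback) {
      handleLine(this, pending + decoder.end());
      if (current) this.push(current);
      callback();
    },
  });
}

// Usage: the chunk boundary deliberately splits a sequence line.
Readable.from([">seq1 first record\nACGT\nAC", "GT\n>seq2\nTTGA\n"])
  .pipe(createFastaParser())
  .on("data", (record) => console.log(record));
```

Because the parser only relies on the standard stream interface, the same object could in principle be fed from `fs.createReadStream(...)` on the server, or from a browser stream shim like the ones mentioned above on the client.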

Once we have basic file format handling, we should prioritise
algorithms that could benefit visualization projects like BioJS/Afra/genome
browsers while also being useful server side. For example, an alignment
algorithm could be used in the browser to interactively align and
visualize a small number of sequences, while the same code could be reused
on the server with heavier datasets. We could also consider compiling some
existing C/C++ algorithms to JavaScript using
Emscripten <http://emscripten.org>,
although the performance penalty might make this approach
undesirable. However, Emscripten could be interesting for at least two
reasons: 1. running those algorithms on the server without
installing/compiling anything besides Node.js; 2. running them client side
on a browser-powered distributed computing network
(turn/queen <http://github.com/turn/queen> or
CrowdProcess <http://crowdprocess.com>).
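To make the alignment example more concrete, here is a minimal sketch of plain Needleman-Wunsch global alignment with a linear gap penalty. It uses no Node.js or browser APIs, so the same function could be called from a BioJS component or from a server script. The name `globalAlign` and the default scores (match +1, mismatch -1, gap -1) are illustrative assumptions, not a bionode API; it also keeps the full O(n·m) score matrix in memory, so heavier server-side datasets would likely need a more memory-efficient variant.

```javascript
// Minimal sketch of Needleman-Wunsch global alignment (linear gap penalty).
// Hypothetical name and default scores; no environment-specific APIs are used,
// so it should run unchanged in Node.js or in a browser.
function globalAlign(a, b, { match = 1, mismatch = -1, gap = -1 } = {}) {
  const n = a.length;
  const m = b.length;
  const sub = (i, j) => (a[i - 1] === b[j - 1] ? match : mismatch);

  // score[i][j] = best score for aligning a[0..i) with b[0..j)
  const score = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 1; i <= n; i++) score[i][0] = i * gap;
  for (let j = 1; j <= m; j++) score[0][j] = j * gap;

  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      score[i][j] = Math.max(
        score[i - 1][j - 1] + sub(i, j), // align a[i-1] with b[j-1]
        score[i - 1][j] + gap,           // gap in b
        score[i][j - 1] + gap            // gap in a
      );
    }
  }

  // Trace back from the bottom-right cell to recover one optimal alignment.
  let alignedA = "";
  let alignedB = "";
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && score[i][j] === score[i - 1][j - 1] + sub(i, j)) {
      alignedA = a[i - 1] + alignedA;
      alignedB = b[j - 1] + alignedB;
      i--;
      j--;
    } else if (i > 0 && score[i][j] === score[i - 1][j] + gap) {
      alignedA = a[i - 1] + alignedA;
      alignedB = "-" + alignedB;
      i--;
    } else {
      alignedA = "-" + alignedA;
      alignedB = b[j - 1] + alignedB;
      j--;
    }
  }
  return { score: score[n][m], alignedA, alignedB };
}

console.log(globalAlign("ACGT", "AGT"));
// → { score: 2, alignedA: 'ACGT', alignedB: 'A-GT' }
```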

2) BrowserStack looks awesome.

3, 4 and 6) Agree.

5) Bionode development will be in collaboration with BioJS.

Opinions?

Cheers,
Bruno

P.S. If you or anyone else wants to chat, you can add bmpvieira on Skype/Hangouts.

On Mon, Mar 10, 2014 at 8:21 PM, Christoph Neuroth <
christoph.neuroth at gmail.com> wrote:
>
> Hi everyone,
>
> I am interested in working on the bionode project for GSOC and would
> like to briefly introduce myself and ask for any thoughts on my
> proposal.
>
> My name is Chris, or c089 on both GitHub and Twitter. I have a B.Sc.
> in computer science from Munich University of Applied Sciences and
> plan to start a master's degree at the University of Leipzig in April (not
> yet enrolled, but I will be on April 21, which the FAQ states as the
> decisive date, if nothing goes wrong with the paperwork). This master's
> course allows students to specialize in either bioinformatics or medical
> informatics, and while I have not yet decided which to choose, I would
> like to use GSOC to get a good understanding of bioinformatics as well
> as contribute to open source. I don't have an existing bioinformatics
> background, but I am currently supplementing my high school biology
> knowledge by taking an online genetics course, and will probably take both
> the bioinformatics class at uni and the Coursera one mentioned on the
> project idea page to acquire the required knowledge during the GSOC
> period.
>
> As for my open source background, I've been contributing to various
> open source projects for years. I am currently a maintainer of
> rendrjs, which might be relevant for the bionode project because it is
> a library that focuses on rendering Backbone.js apps on the client and
> server from the same code.
>
> I would really like some input on my project plan, especially as I
> don't know how much up-front detail a successful proposal requires.
> The following possible tasks came to mind after reading
> the project rationale:
>
> 1) Learn about the needs of the bioinformatics community. This should
> be achievable by deepening my knowledge of the domain itself, discussions
> with mentors, and evaluating the strengths and weaknesses of the
> implementations available for other languages by using them and
> reading about them on the mailing lists.
> 2) Extend the existing project infrastructure to run the test suite in
> targeted browsers as well as in Node.js, using the Karma
> runner and the BrowserStack or Sauce Labs cloud services for cross-browser
> infrastructure.
> 3) Identify a good JavaScript API design for the requirements identified
> earlier and implement new features in bionode using test-driven
> development. If possible and desired, I might also work on
> cross-implementation test suites for the different language projects.
> 4) Detect duplications/overlaps of code in BioJS and the other JS
> libraries mentioned on the wiki and extract those into bionode.
> 5) If there's time left for side projects, I'm also interested in
> contributing a bit to the visualization projects listed on the
> projects page of the BioJS project, to see how well the APIs in bionode
> work.
> 6) Given a successful project, I also plan to present it
> at Node.js/JavaScript meetups and conferences at the end of the summer.
>
> Looking forward to any feedback on these thoughts!
> _______________________________________________
> GSoC mailing list
> GSoC at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/gsoc