[GSoC] Bionode project at GSoC 2014

Tue Mar 11 20:35:15 UTC 2014

I would not look too closely at existing implementations of Biopython,
BioRuby and BioPerl, unless the authors take strong pride in their
solution. In general, I find implementations either too simple (all
data is handled in RAM) or too complicated, using fanciful but
complicated adapters such as SeqIO. What you can pick up there is the
general idea of an API, but make sure to implement it the Node.js way,
so other JS programmers feel comfortable and know how to adapt the code.

My simple rules: KISS. Choose functional programming over OOP. Choose
iterators over in-memory loading. Avoid state in objects.
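To make these rules concrete, here is a minimal sketch (an assumption of what this could look like, not existing bionode code). It is a FASTA parser written as a plain generator function. The function pulls one line at a time from any iterable, keeps its working state in local variables rather than in an object, and yields plain records. The names parseFasta and mapIter and the { id, seq } record shape are hypothetical.

```javascript
// Hypothetical helper, not part of bionode: parse FASTA lazily from any
// iterable of lines (an array, a generator, lines read from a file),
// yielding plain { id, seq } objects one at a time instead of loading
// every record into memory first.
function* parseFasta(lines) {
  let id = null;   // header of the record being built (local, not object state)
  let chunks = []; // sequence lines collected for the current record
  for (const raw of lines) {
    const line = raw.trim();
    if (line === '') continue; // skip blank lines
    if (line.startsWith('>')) {
      if (id !== null) yield { id, seq: chunks.join('') };
      id = line.slice(1).split(/\s+/)[0]; // first word after '>' is the ID
      chunks = [];
    } else if (id !== null) {
      chunks.push(line);
    }
  }
  if (id !== null) yield { id, seq: chunks.join('') };
}

// A small pure helper in the same functional style: lazily map over an iterable.
function* mapIter(iter, fn) {
  for (const x of iter) yield fn(x);
}

const lines = ['>seq1 first', 'ACGT', 'GG', '>seq2', 'TTAA'];
const lengths = [...mapIter(parseFasta(lines), (r) => [r.id, r.seq.length])];
console.log(lengths); // [ [ 'seq1', 6 ], [ 'seq2', 4 ] ]
```

Because the parser only needs something iterable, the same function works on an in-memory array in a test and on a line-by-line reader over a large file. Only one record is held at a time.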

If you want to look at IUPAC handling etc., I have been mostly
impressed by the BioJava implementations. I wrote BioScala after much
experience with BioRuby and others, and every good idea I came up with
was already in BioJava :). I think that because Java is such hard work,
people tend to think harder before they write code, but that is just a
hypothesis. The downside is that Java code tends to be OOP-heavy. I
would also look at BioHaskell; I bet they do a lot of things right. It
may also be interesting to look at Ruby biogems, which tend to be
lightweight and contain recent designs. I can help point out the
better-designed ones on a case-by-case basis.
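As an illustration of the kind of IUPAC handling meant here, the following is a minimal sketch. It is a reverse complement over the IUPAC nucleotide codes, including the ambiguity codes. It is not ported from BioJava. The names IUPAC_COMPLEMENT and reverseComplement are hypothetical, and the table assumes a DNA alphabet plus '-' for gaps.

```javascript
// Hypothetical complement table for IUPAC DNA codes (not an existing
// bionode or BioJava API). Ambiguity codes map to the code for the
// complementary set of bases.
const IUPAC_COMPLEMENT = {
  A: 'T', T: 'A', G: 'C', C: 'G',
  R: 'Y', Y: 'R',   // R = A/G,  Y = C/T
  K: 'M', M: 'K',   // K = G/T,  M = A/C
  S: 'S', W: 'W',   // S = C/G,  W = A/T (self-complementary)
  B: 'V', V: 'B',   // B = C/G/T, V = A/C/G
  D: 'H', H: 'D',   // D = A/G/T, H = A/C/T
  N: 'N', '-': '-', // any base; gap
};

// Pure function: no object state, works unchanged in Node.js and the browser.
function reverseComplement(seq) {
  let out = '';
  for (let i = seq.length - 1; i >= 0; i--) {
    const base = seq[i];
    const upper = base.toUpperCase();
    const comp = IUPAC_COMPLEMENT[upper];
    if (comp === undefined) {
      throw new Error(`Not an IUPAC nucleotide code: ${base}`);
    }
    // Preserve lower case (e.g. soft-masked regions).
    out += base === upper ? comp : comp.toLowerCase();
  }
  return out;
}

console.log(reverseComplement('ACGTRYN')); // NRYACGT
```

Rejecting unknown characters, rather than silently passing them through, is a design choice. A real library might make that behaviour configurable.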

My advice is to work from scratch, taking cues from the BioJS guys. It
would be good to have full JSON support on input and output, so other
Bio* projects may benefit. Read my small tools manifesto:

  https://github.com/pjotrp/bioinformatics

At the same time, it would be good if part of the functionality could
also be used directly in the browser, so whatever is useful there
should be shared with the bionode implementation.
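One way to combine JSON input/output with browser reuse is sketched below. It is a minimal sketch, assuming sequence records shaped like { id, seq } and newline-delimited JSON (one record per line) as the interchange format. Neither is specified above, and the function names are hypothetical. It relies only on ECMAScript built-ins (JSON, String, Array), so the same code runs in Node.js and in the browser.

```javascript
// Hypothetical helpers, not an existing bionode API.

// Serialize records as newline-delimited JSON: one JSON object per line,
// so a consumer in any language can process records one at a time.
function toNDJSON(records) {
  return records.map((r) => JSON.stringify(r)).join('\n') + '\n';
}

// Parse newline-delimited JSON back into records, lazily and with a
// basic shape check so bad input fails early.
function* fromNDJSON(text) {
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines (e.g. trailing newline)
    const rec = JSON.parse(line);
    if (typeof rec.id !== 'string' || typeof rec.seq !== 'string') {
      throw new Error('Expected records shaped like { id, seq }');
    }
    yield rec;
  }
}

const text = toNDJSON([{ id: 'seq1', seq: 'ACGT' }, { id: 'seq2', seq: 'GGCC' }]);
// text === '{"id":"seq1","seq":"ACGT"}\n{"id":"seq2","seq":"GGCC"}\n'
console.log([...fromNDJSON(text)].map((r) => r.id)); // [ 'seq1', 'seq2' ]
```

A line-oriented format fits the "iterators over in-memory loading" rule, because a reader never needs the whole document to start producing records.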

That is what I would do.

Pj.

On Tue, Mar 11, 2014 at 08:12:16PM +0000, Bruno Vieira wrote:
> Hi Chris,
> 
> Thanks for the interest in bionode!
> 
> Regarding the requirements for the project plan, you can get more
> information at:
> 
> http://www.open-bio.org/wiki/Google_Summer_of_Code#When_you_apply
> 
> but maybe someone from the organization can provide more details here?
> 
> Regarding bionode, we intend to start by building some core
> functionality, beginning with methods for parsing/writing bioinformatics
> file formats. Once we have that, we can start implementing
> algorithms/methods for alignments, annotation, etc.
> 
> Regarding your points:
> 
> 1) You can also check sites like biostars.org or seqanswers.com to get a
> feel for what the community is doing, but it is probably better to avoid
> spending too much time on those sites at the beginning, since
> bioinformatics is a very broad term and it is easy to feel overwhelmed if
> you are not looking for something specific. If you start by implementing
> an API to handle file formats like FASTA, BAM, VCF, bigWig, GFF, etc., it
> could benefit many visualizations and genome browsers. In addition to
> looking at other Bio* projects, like Biopython's SeqIO, you could also look
> at JavaScript genome browsers, like biodalliance.org, that can already
> handle some of those formats, and see if the code can be reused/improved
> in bionode to be later used by other genome browsers or BioJS components.
> When doing IO, we could take advantage of Node.js Streams on the server
> side and maybe on the client side (maxogden/domnode
> <http://github.com/maxogden/domnode>; substack/stream-browserify
> <http://github.com/substack/stream-browserify>).
> 
> Once we have some basic file format handling, we should prioritise
> algorithms that could benefit visualization projects like BioJS/Afra/genome
> browsers while also being useful server side. For example, an alignment
> algorithm could be useful in the browser to interactively align and
> visualize a small number of sequences, while the same code could be reused
> on the server with heavier datasets. We could also consider compiling some
> existing C/C++ algorithms to JavaScript using Emscripten
> <http://emscripten.org>, although the decrease in performance might make
> this approach undesirable. However, Emscripten could be interesting for at
> least two reasons: 1. running those algorithms on the server without
> installing/compiling anything besides Node.js; 2. running them client side
> on a browser-powered distributed computing network (turn/queen
> <http://github.com/turn/queen> or CrowdProcess <http://crowdprocess.com>).
> 
> 2) BrowserStack looks awesome.
> 
> 3, 4 and 6) Agree.
> 
> 5) Bionode development will be in collaboration with BioJS.
> 
> Opinions?
> 
> Cheers,
> Bruno
> 
> P.S. If you or anyone wants to chat, you can add bmpvieira to Skype/Hangout.
> 
> 
> On Mon, Mar 10, 2014 at 8:21 PM, Christoph Neuroth <christoph.neuroth at gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I am interested in working on the bionode project for GSoC and would
> > like to briefly introduce myself and ask for any thoughts on my
> > proposal.
> >
> > My name is Chris, or c089 on both GitHub and Twitter. I have a B.Sc.
> > in computer science from Munich University of Applied Sciences and
> > plan to start a master's degree at the University of Leipzig in April
> > (not yet enrolled, but I will be on April 21, which the FAQ states as
> > the decisive date, if nothing goes wrong with the paperwork). This
> > master's program allows students to specialize in either
> > bioinformatics or medical informatics, and while I have not yet
> > decided which to choose, I would like to use GSoC to get a good
> > understanding of bioinformatics as well as to contribute to open
> > source. I don't have an existing bioinformatics background, but I am
> > currently supplementing my high school biology knowledge by taking an
> > online genetics course, and will probably take both the
> > bioinformatics class at university and the Coursera one mentioned on
> > the project idea page to acquire the required knowledge during the
> > GSoC period.
> >
> > As for my open source background, I've been contributing to various
> > open source projects for years. I am currently a maintainer of
> > rendrjs, which might be relevant for the bionode project because it is
> > a library that focuses on rendering Backbone.js apps on the client and
> > server from the same code.
> >
> > For my project plan I would really appreciate some input; in
> > particular, I don't know how much up-front detail is required for a
> > successful proposal. The following possible tasks came to mind after
> > reading the project rationale:
> >
> > 1) Learn about the needs of the bioinformatics community. This should
> > be achievable by deepening my knowledge of the domain itself, through
> > discussions with mentors, and by evaluating the strengths and
> > weaknesses of the implementations available for other languages by
> > using them and reading about them on the mailing lists.
> > 2) Extend the existing project infrastructure to run the test suite in
> > targeted browsers as well as in the Node.js environment, using the
> > Karma runner and the BrowserStack or Sauce Labs cloud services for
> > cross-browser infrastructure.
> > 3) Identify good JavaScript API design for the requirements identified
> > earlier and implement new features in bionode using test-driven
> > development. If possible and desired, I might also work on
> > cross-implementation test suites for the different language projects.
> > 4) Detect duplication/overlap of code in BioJS and the other JS
> > libraries mentioned on the wiki and extract it into bionode.
> > 5) If there's time left for side projects, I'm also interested in
> > contributing a bit to the visualization projects listed on the
> > projects page of the BioJS project, to see how well the APIs in
> > bionode work.
> > 6) Given a successful project, I also plan on presenting it at
> > Node.js/JavaScript meetups and conferences at the end of the summer.
> >
> > Looking forward to any feedback on these thoughts!
> > _______________________________________________
> > GSoC mailing list
> > GSoC at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/gsoc