[GSoC] GSoC 2014 BioRuby

Francesco Strozzi francesco.strozzi at gmail.com
Fri Mar 14 13:43:09 UTC 2014


Hi Razvan,
the general idea is to have an interface that lets you run queries on the
data stored in VCF files.
A typical scenario: retrieve all the variations that are present
exclusively in 20 samples out of a dataset of 100 samples.
An API could then expose a method that takes a list of sample names plus
other conditions and returns, for instance, a JSON document with all the
variations fulfilling the query.
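
As a rough illustration of such a query, here is a minimal Ruby sketch. It
assumes the variations are already loaded in memory as hashes, and it reads
"exclusively present" as "carried by every listed sample and by no other
sample". The method name exclusive_variations, the record layout and the
sample names are all hypothetical, not part of any existing BioRuby API.

```ruby
require 'json'

# Hypothetical in-memory records: each variation lists the samples carrying it.
VARIATIONS = [
  { chrom: '1', pos: 1000, ref: 'A', alt: 'G', carriers: %w[S1 S2] },
  { chrom: '1', pos: 2000, ref: 'C', alt: 'T', carriers: %w[S1 S2 S3] },
  { chrom: '2', pos: 3000, ref: 'G', alt: 'A', carriers: %w[S1] }
].freeze

# Return, as a JSON string, the variations carried by every sample in
# `samples` and by no sample outside that list (assumed interpretation).
def exclusive_variations(variations, samples)
  wanted = samples.sort
  hits = variations.select { |v| v[:carriers].sort == wanted }
  JSON.generate(hits)
end

puts exclusive_variations(VARIATIONS, %w[S2 S1])
```

A real service would add the "other conditions" as extra arguments and would
not keep 100 samples' worth of variations in a Ruby array, but the shape of
the call (sample names in, JSON out) would stay the same.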

Whether or not a database engine is used depends on how you would like to
implement the whole thing. One can also imagine storing nothing in a
database and accessing the data directly from the VCF files, while still
providing a higher-level interface. In this case I'd suggest that you, and
other students interested in the topic, also explore the GATK framework (
https://github.com/broadgsa/gatk,
http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone), since
it exposes a number of modules called walkers that should make accessing
and traversing VCF files easier.
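
To make the "no database" option concrete, here is a minimal sketch in plain
Ruby that traverses VCF text directly and reports which samples carry each
variation. It does not use GATK or its walkers. The method name
each_variation and the example data are hypothetical, and the parser only
handles the basic tab-separated layout with a GT field, so it is nowhere
near a complete VCF reader.

```ruby
require 'stringio'

# Yield one hash per VCF data line, listing the samples whose genotype (GT)
# contains at least one non-reference allele.
def each_variation(io)
  return enum_for(:each_variation, io) unless block_given?

  sample_names = []
  io.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?('##') # meta-information lines

    fields = line.split("\t")
    if line.start_with?('#CHROM')
      sample_names = fields.drop(9) # sample columns follow FORMAT
      next
    end

    format = fields[8]
    gt_index = format ? format.split(':').index('GT') : nil
    carriers = []
    if gt_index
      sample_names.zip(fields.drop(9)).each do |name, column|
        # Alleles are separated by "/" (unphased) or "|" (phased).
        alleles = column.to_s.split(':')[gt_index].to_s.split(%r{[/|]})
        carriers << name if alleles.any? { |a| a != '0' && a != '.' }
      end
    end

    record = { chrom: fields[0], pos: fields[1].to_i,
               ref: fields[3], alt: fields[4], carriers: carriers }
    yield record
  end
end

# Made-up example data with three samples.
VCF_TEXT = <<~VCF
  ##fileformat=VCFv4.1
  #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
  1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0
  1\t2000\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:12\t./.:0\t0|1:9
VCF

each_variation(StringIO.new(VCF_TEXT)) do |v|
  puts "#{v[:chrom]}:#{v[:pos]} carried by #{v[:carriers].join(', ')}"
end
```

The same method accepts an open File instead of a StringIO, so a higher
level query layer could sit on top of it without any storage engine.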

JRuby sounds about right, as you'll have the typical Ruby flexibility to
quickly prototype new things while having the ability to include Java code
(GATK is written in Java and Scala BTW).

Cheers
Francesco


On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <razvan.florea91 at gmail.com> wrote:

> Hello,
>
> My name is Razvan Florea and I am studying Computing Science at the
> University of Groningen, Netherlands.
> I am writing to express my interest in the BioRuby GSoC project: "An
> ultra-fast scalable RESTful API to query large numbers of genomic
> variations". I am currently doing my bachelor thesis project, which is
> also about developing a RESTful API.
>
> As Francesco recommended, I took a look at the links in the proposal
> text and at the proposal itself. So far I understand that the basic idea
> of the project is to replace the manipulation of information from VCF
> files with the manipulation of information from a database that will
> reside on a web service. Am I right?
> If yes, what do you expect the API to be capable of doing? Is retrieving
> JSON documents with information enough, or is more than that expected?
>
> Also, could Rails over JRuby be a good choice of technology for
> developing the web service?
>
> Please give me any information you think could be helpful for me.
>
> Thank you,
> Razvan



-- 

Francesco Strozzi


