[GSoC] GSoC 2014 BioRuby

Francesco Strozzi francesco.strozzi at gmail.com
Sat Mar 15 09:17:10 UTC 2014


Hi Razvan,

1) Having a client would of course be nice, but I would not consider
it critical. Building a client around a REST API is pretty straightforward
in any language.
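
To give a rough idea of how small such a client can be, here is a minimal
Ruby sketch using only the standard library. The host, the /variants path
and the "samples" parameter are made up for illustration, since the API
itself has not been designed yet.

```ruby
require "uri"
require "json"
require "net/http"

# Hypothetical endpoint: neither the host nor the path is defined by the project yet.
BASE_URL = "http://localhost:4567/variants"

# Build the GET URL for a query restricted to the given sample names.
# The "samples" parameter name is an assumption.
def variants_uri(samples, base_url = BASE_URL)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(samples: samples.join(","))
  uri
end

# Send the request and decode the JSON body (needs a running service).
def fetch_variants(samples)
  response = Net::HTTP.get_response(variants_uri(samples))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

Keeping the URL construction separate from the network call means the
request format can be checked without a server running.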

2) Yes, of course. Also look at the Picard (http://picard.sourceforge.net/)
library. It is the low-level API for accessing VCF and other files, and GATK
relies heavily on it to fetch the data out of raw files.

3) If you have some code on GitHub or another repository that you would like to
show us, that's fine. Otherwise you could spend a bit of time writing a
simple JRuby wrapper for Picard to access a VCF file and retrieve a list
of SNPs. This could be a pet project to start wrapping your head
around these libraries, while also spending some time with JRuby.
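
As a starting point, here is a plain-Ruby sketch of what "retrieve a list
of SNPs" could mean. Note that it does not use Picard at all: it parses the
tab-separated VCF columns by hand, and a real JRuby wrapper would delegate
that work to the Java library. The Snp struct, the list_snps helper and the
sample records are all invented for illustration.

```ruby
# Plain-Ruby stand-in, NOT Picard: a JRuby wrapper would hand parsing to the Java library.
Snp = Struct.new(:chrom, :pos, :id, :ref, :alt)

BASES = %w[A C G T].freeze

# Return one Snp per single-base REF/ALT pair found in the given VCF lines.
def list_snps(lines)
  lines.each_with_object([]) do |line, snps|
    # Skip meta-information, the header line and blank lines.
    next if line.start_with?("#") || line.strip.empty?

    chrom, pos, id, ref, alt = line.chomp.split("\t")
    next if alt.nil? || !BASES.include?(ref)

    # ALT may hold several comma-separated alleles; keep the single-base ones.
    alt.split(",").each do |allele|
      snps << Snp.new(chrom, pos.to_i, id, ref, allele) if BASES.include?(allele)
    end
  end
end

# Invented records: two SNP sites (one with two alternate alleles) and one deletion.
SAMPLE_VCF = <<~VCF
  ##fileformat=VCFv4.1
  #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
  1\t100\trs1\tA\tG\t50\tPASS\t.
  1\t200\t.\tAT\tA\t50\tPASS\t.
  2\t300\trs3\tC\tT,G\t50\tPASS\t.
VCF

list_snps(SAMPLE_VCF.lines).each do |snp|
  puts [snp.chrom, snp.pos, snp.ref, snp.alt].join(" ")
end
```

The same helper should work on a file by passing File.foreach(path)
instead of the in-memory lines.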

All the best.
Francesco




On Fri, Mar 14, 2014 at 6:50 PM, Razvan Florea <razvan.florea91 at gmail.com> wrote:

> Hello Francesco,
>
> 1. The queries will be made through HTTP requests (basically GET and
> POST). But does the project also include writing a client for the web
> service?
> 2. I think using the GATK framework is absolutely necessary because, even
> if we choose to use a database engine, the VCF files have to be migrated
> to the database, which I think can be done with this framework. Am I right?
> 3. Meanwhile, do you think I can contribute somehow to show my skills and
> my willingness to work on this project this summer?
>
> Best,
> Razvan
>
>
> 2014-03-14 14:43 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:
>
>> Hi Razvan,
>> the general idea is to have an interface which lets you run
>> queries on top of the data stored in VCF files.
>> For example, in a typical scenario one could ask to retrieve all the
>> variations which are present exclusively in 20 samples out of a dataset
>> of 100 samples.
>> An API could then expose a method which takes a list of sample names plus
>> other conditions and returns, for instance, a JSON document with all the
>> variations fulfilling the query.
>>
>> Whether or not a database engine is used may depend on how you
>> would like to implement the whole thing. One can also imagine not storing
>> anything in a database and just accessing the data from the VCF files while
>> providing a higher-level interface. In this case I'd suggest that you and
>> other students interested in the topic also explore the GATK framework (
>> https://github.com/broadgsa/gatk,
>> http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone)
>> since it exposes a number of modules called walkers that should make
>> life easier when accessing and traversing VCF files.
>>
>> JRuby sounds about right, as you'll have the typical Ruby flexibility to
>> quickly prototype new things while having the ability to include Java code
>> (GATK is written in Java and Scala BTW).
>>
>> Cheers
>> Francesco
>>
>>
>> On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <razvan.florea91 at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> My name is Razvan Florea and I am studying Computing Science at the
>>> University of Groningen, Netherlands.
>>> I am writing to express my interest in the BioRuby GSoC project "An
>>> ultra-fast scalable RESTful API to query large numbers of genomic
>>> variations". I am currently doing my bachelor thesis project, which is
>>> also about developing a RESTful API.
>>>
>>> As Francesco recommended, I took a look at the links in the
>>> proposal text and at the proposal itself. So far I understand that the
>>> basic idea of the project is to replace the manipulation of information
>>> from VCF files with the manipulation of information from a database which
>>> will reside on a web service. Am I right?
>>> If yes, what do you expect the API to be capable of doing? Is retrieving
>>> JSON documents with information enough, or is there more to it than that?
>>>
>>> Also, could Rails on JRuby be a good choice of technology for
>>> developing the web service?
>>>
>>> Please give me any information you think could be helpful for me.
>>>
>>> Thank you,
>>> Razvan
>>> _______________________________________________
>>> GSoC mailing list
>>> GSoC at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/gsoc
>>>
>>
>>
>>
>> --
>>
>> Francesco Strozzi
>>
>
>


-- 

Francesco Strozzi


