[GSoC] GSoC 2014 BioRuby

Razvan Florea razvan.florea91 at gmail.com
Fri Mar 14 17:50:38 UTC 2014


Hello Francesco,

1. The queries will be made through HTTP requests (basically GET and POST).
But does the project also include building a client for the web service?
2. I think using the GATK framework is absolutely necessary, because even if
we choose to use a database engine, the VCF files have to be migrated to
the database, which I think can be done with this framework. Am I right?
3. Meanwhile, do you think I can contribute somehow to show my skills and
my willingness to work on this project this summer?
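
On point 1, here is a minimal sketch of how a GET request could be decoded
into query parameters. The `/variants` path and the `samples` and `exclusive`
parameter names are my own assumptions for illustration, not something taken
from the project proposal, and Python is used only to keep the sketch short.

```python
from urllib.parse import urlsplit, parse_qs


def parse_variant_query(url):
    """Decode a hypothetical GET request such as
    /variants?samples=S1,S2&exclusive=true into query parameters."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per key; use the first one.
    samples = params.get("samples", [""])[0]
    exclusive = params.get("exclusive", ["false"])[0].lower() == "true"
    return {
        # Drop empty names so a missing parameter gives an empty list.
        "samples": [name for name in samples.split(",") if name],
        "exclusive": exclusive,
    }


query = parse_variant_query("/variants?samples=S1,S2&exclusive=true")
```

A POST request could carry the same fields in its body when the sample list
is too long for a URL.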

Best,
Razvan


2014-03-14 14:43 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:

> Hi Razvan,
> the general idea is to try and have an interface which lets you do queries
> on top of the data stored in VCF files.
> For example, as a typical scenario one could ask to retrieve all the
> variations which are exclusively present in 20 samples out of a dataset
> of 100 samples.
> An API could then expose a method which takes a list of sample names plus
> other conditions and returns, for instance, a JSON document with all the
> variations fulfilling the query.
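
To check my understanding of this scenario, a minimal sketch (in Python, for
brevity) follows. It assumes "exclusively present" means a variation is
carried by every listed sample and by no other sample; the function names and
the toy data are hypothetical.

```python
import json


def exclusive_variants(carriers_by_variant, sample_names):
    """Return the variations carried by all of sample_names and nobody else."""
    wanted = set(sample_names)
    return sorted(
        variant
        for variant, carriers in carriers_by_variant.items()
        if set(carriers) == wanted
    )


def query_as_json(carriers_by_variant, sample_names):
    """Wrap the result in a JSON document, as an API method might return it."""
    hits = exclusive_variants(carriers_by_variant, sample_names)
    return json.dumps({"samples": sorted(sample_names), "variants": hits})


# Toy data: variation -> names of the samples that carry it (made up).
CARRIERS = {
    "chr1:1000:A>G": {"S1", "S2"},
    "chr1:2000:C>T": {"S1", "S2", "S3"},
    "chr2:500:G>A": {"S2"},
}

response = query_as_json(CARRIERS, ["S1", "S2"])
```

If "exclusively" should instead mean "carried only within the listed samples,
but not necessarily by all of them", the equality test would become a subset
test on a non-empty carrier set.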
>
> Whether or not a database engine is used may depend on how you
> would like to implement the whole thing. One can also imagine not storing
> anything in a database and just accessing the data from the VCF files while
> providing a higher-level interface. In this case I'd suggest that you and
> other students interested in the topic also explore the GATK framework (
> https://github.com/broadgsa/gatk,
> http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone) since
> it exposes a number of modules called walkers that should make life
> easier when accessing and traversing VCF files.
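
A minimal sketch of the no-database option, reading genotypes straight from
VCF text. It is plain Python and does not use GATK or its walkers; it assumes
each record has a GT entry in its FORMAT column and treats any non-reference
allele as "carrying" the variation. The function name and the example records
are made up for illustration.

```python
import io


def carriers_from_vcf(handle):
    """Map each variation to the set of samples with a non-reference allele."""
    samples = []
    carriers = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumption: GT is listed in FORMAT (raises ValueError otherwise).
        gt_index = fields[8].split(":").index("GT")
        key = f"{chrom}:{pos}:{ref}>{alt}"
        carriers[key] = set()
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # "0" is the reference allele and "." a missing call.
            if any(allele not in ("0", ".") for allele in alleles):
                carriers[key].add(name)
    return carriers


EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0\n"
    "chr2\t500\t.\tG\tA\t.\tPASS\t.\tGT:DP\t0/0:10\t0|1:8\t./.:0\n"
)

carriers = carriers_from_vcf(io.StringIO(EXAMPLE_VCF))
```

A real implementation would stream large files and handle multi-allelic
records, which this sketch lumps together under a single key.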
>
> JRuby sounds about right, as you'll have the typical Ruby flexibility to
> quickly prototype new things while having the ability to include Java code
> (GATK is written in Java and Scala BTW).
>
> Cheers
> Francesco
>
>
> On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <razvan.florea91 at gmail.com> wrote:
>
>> Hello,
>>
>> My name is Razvan Florea and I am studying Computing Science at the
>> University of Groningen, Netherlands.
>> I am writing to express my interest in the BioRuby GSoC project: "An
>> ultra-fast scalable RESTful API to query large numbers of genomic
>> variations". Currently I am doing my bachelor thesis project, which is
>> also about developing a RESTful API.
>>
>> As Francesco recommended, I took a look at the links in the proposal text
>> and at the proposal itself. So far I understand that the
>> basic idea of the project is to replace the manipulation of information
>> from VCF files with manipulation of information from a database which will
>> reside on a web service. Am I right?
>> If yes, what do you expect the API to be capable of doing? Is retrieving
>> JSON documents with information enough, or is there more to it?
>>
>> Also, could Rails on JRuby be a good choice of technology for developing
>> the web service?
>>
>> Please give me any information you think could be helpful for me.
>>
>> Thank you,
>> Razvan
>> _______________________________________________
>> GSoC mailing list
>> GSoC at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/gsoc
>>
>
>
>
> --
>
> Francesco Strozzi
>
