[GSoC] GSoC 2014 BioRuby

Razvan Florea razvan.florea91 at gmail.com
Sat Mar 15 22:16:50 UTC 2014


Hi Francesco,

I am trying to write the Picard wrapper you recommended.
I created a repository on GitHub at [1]. Right now the repository contains
a simple JRuby script that uses a Picard class to convert between "vcf"
and "bcf" files.

I couldn't find any classes for retrieving SNPs from VCF files. Could you
please point me to some information about that?
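
To make the question concrete: a SNP can be recognised from the VCF text
columns alone, as a record whose REF allele and every ALT allele are a
single base. The sketch below is plain Ruby (it should also run under
JRuby) and does not use the Picard API; the method name `snps_from_vcf`
and the sample records are hypothetical, made up for illustration.

```ruby
# Plain-Ruby sketch (NOT the Picard API): list SNP records from VCF text.
# Assumption: a record is a SNP when REF and every ALT allele is one base.

BASES = %w[A C G T].freeze

# Hypothetical helper: takes an enumerable of VCF lines and returns an
# array of hashes, one per SNP record.
def snps_from_vcf(lines)
  lines.each_with_object([]) do |line, snps|
    next if line.start_with?('#')        # skip meta-information and header lines
    fields = line.chomp.split("\t")
    next if fields.size < 5              # need CHROM, POS, ID, REF, ALT
    chrom, pos, id, ref, alt = fields
    alts = alt.split(',')                # multi-allelic sites list ALTs comma-separated
    next if alts.empty?
    next unless BASES.include?(ref.upcase)
    next unless alts.all? { |a| BASES.include?(a.upcase) }
    snps << { chrom: chrom, pos: pos.to_i, id: id, ref: ref, alt: alts }
  end
end

# Made-up records for demonstration only.
SAMPLE_VCF = [
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t100\trs1\tA\tG\t50\tPASS\t.",
  "1\t200\t.\tAT\tA\t50\tPASS\t.",    # deletion, not a SNP
  "2\t300\t.\tC\tG,T\t50\tPASS\t."    # multi-allelic SNP
].freeze

snps_from_vcf(SAMPLE_VCF).each do |s|
  puts "#{s[:chrom]}:#{s[:pos]} #{s[:ref]}>#{s[:alt].join(',')}"
end
```

With the sample records above, this prints the two SNP sites (1:100 and
2:300) and skips the deletion at 1:200. A real wrapper would read the
records through Picard rather than splitting text by hand.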

[1] https://github.com/razvanflorea/picard-jruby-wrapper

Best,
Razvan


2014-03-15 10:17 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:

> Hi Razvan,
>
> 1) I think having a client would be nice, of course, but I would not
> consider it critical. Building a client around a REST API is pretty
> straightforward in any language.
>
> 2) Yes, of course. Also look at the Picard (http://picard.sourceforge.net/)
> library. This is the low-level API for accessing VCF and other files, and
> GATK relies heavily on it to fetch the data out of raw files.
>
> 3) If you have some code on GitHub or another repo that you would like to
> show us, that's fine. Otherwise you could spend a bit of time writing a
> simple JRuby wrapper for Picard that accesses a VCF file and retrieves a
> list of SNPs. This could be a pet project to start wrapping your head
> around these libraries while also spending some time with JRuby.
>
> All the best.
> Francesco
>
>
>
>
> On Fri, Mar 14, 2014 at 6:50 PM, Razvan Florea <razvan.florea91 at gmail.com> wrote:
>
>> Hello Francesco,
>>
>> 1. The queries will be made through HTTP requests (basically GET and
>> POST). But does the project also include making a client for the web
>> service?
>> 2. I think using the GATK framework is absolutely necessary because, even
>> if we choose to use a database engine, the VCF files have to be migrated
>> to the database, which I think can be done with this framework. Am I right?
>> 3. Meanwhile, do you think I can contribute somehow to show my skills and
>> my willingness to work on this project this summer?
>>
>> Best,
>> Razvan
>>
>>
>> 2014-03-14 14:43 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:
>>
>>> Hi Razvan,
>>> the general idea is to have an interface which lets you run queries on
>>> top of the data stored in VCF files.
>>> For example, in a typical scenario one could ask to retrieve all the
>>> variations which are present exclusively in 20 samples out of a dataset
>>> of 100 samples.
>>> An API could then expose a method which takes a list of sample names
>>> plus other conditions and returns, for instance, a JSON document with all
>>> the variations fulfilling the query.
>>>
>>> Whether or not a database engine is used may depend on how you would
>>> like to implement the whole thing. One can also imagine storing nothing
>>> in a database and just accessing the data from the VCF files while
>>> providing a higher-level interface. In this case I'd suggest that you and
>>> other students interested in the topic also explore the GATK framework (
>>> https://github.com/broadgsa/gatk,
>>> http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone)
>>> since it exposes a number of modules called walkers that should make
>>> life easier when accessing and traversing VCF files.
>>>
>>> JRuby sounds about right, as you'll have the typical Ruby flexibility to
>>> quickly prototype new things while having the ability to include Java code
>>> (GATK is written in Java and Scala BTW).
>>>
>>> Cheers
>>> Francesco
>>>
>>>
>>> On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <
>>> razvan.florea91 at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> My name is Razvan Florea and I am studying Computing Science at the
>>>> University of Groningen, Netherlands.
>>>> I am writing to express my interest in the BioRuby GSoC project: "An
>>>> ultra-fast scalable RESTful API to query large numbers of genomic
>>>> variations". Currently I am doing my bachelor thesis project, which is
>>>> also about developing a RESTful API.
>>>>
>>>> As Francesco recommended, I took a look at the links in the proposal
>>>> text and at the proposal itself. So far I understand that the basic
>>>> idea of the project is to replace the manipulation of information from
>>>> VCF files with the manipulation of information from a database that
>>>> will reside on a web service. Am I right?
>>>> If so, what do you expect the API to be capable of? Is returning JSON
>>>> documents with the information enough, or is more expected?
>>>>
>>>> Also, could Rails on JRuby be a good choice of technology for
>>>> developing the web service?
>>>>
>>>> Please send me any information you think could be helpful.
>>>>
>>>> Thank you,
>>>> Razvan
>>>> _______________________________________________
>>>> GSoC mailing list
>>>> GSoC at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/gsoc
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Francesco Strozzi
>>>
>>
>>
>
>
> --
>
> Francesco Strozzi
>


