[GSoC] GSoC 2014 BioRuby

Razvan Florea razvan.florea91 at gmail.com
Tue Mar 18 10:19:28 UTC 2014


Hi Francesco,

Did you see my last email?
I would appreciate your feedback, because I have started writing the
proposal and I have some more questions.

Best,
Razvan


2014-03-17 13:12 GMT+01:00 Razvan Florea <razvan.florea91 at gmail.com>:

> Hi Francesco,
>
> I followed your suggestions and updated the basic wrapper [1].
> Please take a look and tell me whether it is what you expected and whether
> I should do anything else.
>
> [1]: https://github.com/razvanflorea/picard-jruby-wrapper
>
> Thank you,
> Razvan
>
>
> 2014-03-17 10:14 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:
>
>> Hi Razvan,
>> have a look at the org.broadinstitute.variant.vcf package and the
>> org.broadinstitute.variant.variantcontext.VariantContext class within
>> the Picard API. Those are used to read from a VCF file, while to write a
>> VCF you also need to use
>> org.broadinstitute.variant.variantcontext.writer.
>>
>> Hope this helps a bit. The docs are not very helpful in explaining what
>> each library does, so you will need to dig a bit on Google as well :-)
>>
>> All the best.
>> Francesco
>>
>>
>>
>> On Sat, Mar 15, 2014 at 11:16 PM, Razvan Florea <
>> razvan.florea91 at gmail.com> wrote:
>>
>>> Hi Francesco,
>>>
>>> I am trying to write the Picard wrapper that you recommended.
>>> I created a repository on GitHub at [1]. Right now the repository
>>> contains a simple JRuby script that uses a Picard class to convert
>>> between "vcf" and "bcf" files.
>>>
>>> I could not find any classes for retrieving SNPs from VCF files. Could
>>> you please give me some information about that?
>>>
>>> [1] https://github.com/razvanflorea/picard-jruby-wrapper
>>>
>>> Best,
>>> Razvan
>>>
>>>
>>> 2014-03-15 10:17 GMT+01:00 Francesco Strozzi <
>>> francesco.strozzi at gmail.com>:
>>>
>>>> Hi Razvan,
>>>>
>>>> 1) I think having a client would be nice, of course, but I would not
>>>> consider it critical. Building a client around a REST API is pretty
>>>> straightforward in any language.
>>>>
>>>> 2) Yes, of course. Look also at the Picard (
>>>> http://picard.sourceforge.net/) library. This is the low-level API for
>>>> accessing VCF and other files, and GATK relies heavily on it to fetch
>>>> the data out of raw files.
>>>>
>>>> 3) If you have some code on GitHub or another repo that you would like
>>>> to show us, that's fine. Otherwise you could spend a bit of time writing
>>>> a simple JRuby wrapper for Picard that accesses a VCF file and retrieves
>>>> a list of SNPs. This could be a pet project to start wrapping your head
>>>> around these libraries while also spending some time with JRuby.
>>>>
>>>> All the best.
>>>> Francesco
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Mar 14, 2014 at 6:50 PM, Razvan Florea <
>>>> razvan.florea91 at gmail.com> wrote:
>>>>
>>>>> Hello Francesco,
>>>>>
>>>>> 1. The queries will be made through HTTP requests (basically GET and
>>>>> POST). But does the project also include building a client for the web
>>>>> service?
>>>>> 2. I think using the GATK framework is absolutely necessary, because
>>>>> even if we choose to use a database engine, the VCF files have to be
>>>>> migrated to the database, which I think can be done with this
>>>>> framework. Am I right?
>>>>> 3. Meanwhile, do you think I can contribute somehow to show my skills
>>>>> and my willingness to work on this project this summer?
>>>>>
>>>>> Best,
>>>>> Razvan
>>>>>
>>>>>
>>>>> 2014-03-14 14:43 GMT+01:00 Francesco Strozzi <
>>>>> francesco.strozzi at gmail.com>:
>>>>>
>>>>>> Hi Razvan,
>>>>>> the general idea is to have an interface which lets you run
>>>>>> queries on top of the data stored in VCF files.
>>>>>> For example, in a typical scenario one could ask to retrieve all the
>>>>>> variations which are exclusively present in 20 samples out of a dataset
>>>>>> of 100 samples.
>>>>>> An API could then expose a method which takes a list of sample names
>>>>>> plus other conditions and returns, for instance, a JSON document with
>>>>>> all the variations fulfilling the query.
>>>>>>
>>>>>> Whether a database engine is used or not depends on how
>>>>>> you would like to implement the whole thing. One can also imagine not
>>>>>> storing anything in a database and just accessing the data from the VCF
>>>>>> files while providing a higher-level interface. In this case I'd suggest
>>>>>> that you and other students interested in the topic also explore the GATK
>>>>>> framework (https://github.com/broadgsa/gatk,
>>>>>> http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone),
>>>>>> since it exposes a number of modules called walkers that should make
>>>>>> life easier when accessing and traversing VCF files.
>>>>>>
>>>>>> JRuby sounds about right, as you'll have the typical Ruby flexibility
>>>>>> to quickly prototype new things while having the ability to include Java
>>>>>> code (GATK is written in Java and Scala BTW).
>>>>>>
>>>>>> Cheers
>>>>>> Francesco
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <
>>>>>> razvan.florea91 at gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> My name is Razvan Florea and I am studying Computing Science at the
>>>>>>> University of Groningen, Netherlands.
>>>>>>> I am writing to express my interest in the BioRuby GSoC project
>>>>>>> "An ultra-fast scalable RESTful API to query large numbers of genomic
>>>>>>> variations". I am currently working on my bachelor thesis project,
>>>>>>> which is also about developing a RESTful API.
>>>>>>>
>>>>>>> As Francesco recommended, I looked at the proposal and at the links
>>>>>>> in its text. So far I understand that the basic idea of the project
>>>>>>> is to replace the manipulation of information from VCF files with the
>>>>>>> manipulation of information from a database which will reside on a
>>>>>>> web service. Am I right?
>>>>>>> If yes, what do you expect the API to be capable of doing? Is
>>>>>>> retrieving JSON documents with information enough, or is there more
>>>>>>> to it than that?
>>>>>>>
>>>>>>> Also, would Rails on JRuby be a good choice of technology for
>>>>>>> developing the web service?
>>>>>>>
>>>>>>> Please give me any information you think could be helpful for me.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Razvan
>>>>>>> _______________________________________________
>>>>>>> GSoC mailing list
>>>>>>> GSoC at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/gsoc
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Francesco Strozzi
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Francesco Strozzi
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Francesco Strozzi
>>
>
>
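
The query scenario described earlier in the thread (variations present
exclusively in a given list of samples, returned as JSON) could be sketched
in plain Ruby roughly as follows. This is only an illustration under stated
assumptions: the in-memory VARIANTS records, the method names
exclusive_variants and exclusive_variants_json, and the reading of
"exclusively present" as "carried by every listed sample and by no other
sample" are all hypothetical and do not come from Picard, GATK, or the
project proposal. A real implementation would read the records from VCF
files instead.

```ruby
require 'json'

# Hypothetical in-memory records standing in for parsed VCF lines.
# Each record maps sample names to VCF-style genotype strings.
VARIANTS = [
  { chrom: '1', pos: 100, ref: 'A', alt: 'G',
    genotypes: { 's1' => '0/1', 's2' => '1/1', 's3' => '0/0' } },
  { chrom: '1', pos: 200, ref: 'C', alt: 'T',
    genotypes: { 's1' => '0/1', 's2' => '0/0', 's3' => '0/1' } },
  { chrom: '2', pos: 300, ref: 'G', alt: 'A',
    genotypes: { 's1' => '0/0', 's2' => '0/0', 's3' => '1/1' } }
].freeze

# True when the genotype carries at least one non-reference allele.
# '0' is the reference allele and '.' a missing call.
def carries_alt?(genotype)
  genotype.split(%r{[/|]}).any? { |allele| allele != '0' && allele != '.' }
end

# Assumed interpretation of "exclusively present": the set of samples
# carrying the variant is exactly the requested set of samples.
def exclusive_variants(variants, samples)
  wanted = samples.sort
  variants.select do |variant|
    carriers = variant[:genotypes].select { |_, gt| carries_alt?(gt) }.keys
    carriers.sort == wanted
  end
end

# Serialises the matching variants (without genotypes) as a JSON array,
# the kind of payload a REST endpoint could return.
def exclusive_variants_json(variants, samples)
  hits = exclusive_variants(variants, samples).map do |v|
    { chrom: v[:chrom], pos: v[:pos], ref: v[:ref], alt: v[:alt] }
  end
  JSON.generate(hits)
end
```

Calling exclusive_variants_json(VARIANTS, %w[s1 s2]) would return a JSON
array with the single variant at chromosome 1, position 100. Because the
sketch uses only the Ruby standard library, it should run unchanged on
JRuby, where the sample data could later be replaced by records read
through the Picard classes.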
