[GSoC] GSoC 2014 BioRuby

Francesco Strozzi francesco.strozzi at gmail.com
Tue Mar 18 13:34:54 UTC 2014


Hi Razvan,
the code looks fine. Please go ahead with the submission and try to upload
it before Friday so that we can have a look and comment on it. The same
goes for the other students interested in the topic.

All the best.
Francesco


On Tue, Mar 18, 2014 at 11:19 AM, Razvan Florea
<razvan.florea91 at gmail.com> wrote:

> Hi Francesco,
>
> Did you see my last email?
> I would appreciate it if you could give me feedback, because I have started
> writing the proposal and I have some more questions.
>
> Best,
> Razvan
>
>
> 2014-03-17 13:12 GMT+01:00 Razvan Florea <razvan.florea91 at gmail.com>:
>
>> Hi Francesco,
>>
>> I followed your pointers and updated the basic wrapper [1].
>> Please take a look and tell me whether it is what you expected, or
>> whether I should do something else.
>>
>> [1]: https://github.com/razvanflorea/picard-jruby-wrapper
>>
>> Thank you,
>> Razvan
>>
>>
>> 2014-03-17 10:14 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:
>>
>>> Hi Razvan,
>>> have a look at the org.broadinstitute.variant.vcf package and the
>>> org.broadinstitute.variant.variantcontext.VariantContext class within
>>> the Picard API. Those are used to read from a VCF file, while to write a
>>> VCF you also need the
>>> org.broadinstitute.variant.variantcontext.writer package.
>>>
>>> Hope this helps a bit. The docs are not very helpful in pointing out
>>> what every library does, so you will need to dig a bit on Google as well :-)
>>>
>>> All the best.
>>> Francesco
>>>
>>>
>>>
>>> On Sat, Mar 15, 2014 at 11:16 PM, Razvan Florea <
>>> razvan.florea91 at gmail.com> wrote:
>>>
>>>> Hi Francesco,
>>>>
>>>> I am trying to write the Picard wrapper you recommended. I created a
>>>> repository on GitHub at [1]. Right now the repository contains a simple
>>>> JRuby script that uses a Picard class to convert between VCF and BCF
>>>> files.
>>>>
>>>> I could not find any classes for retrieving SNPs from VCF files. Could
>>>> you please point me to some information about that?
>>>>
>>>> [1] https://github.com/razvanflorea/picard-jruby-wrapper
>>>>
>>>> Best,
>>>> Razvan
>>>>
>>>>
>>>> 2014-03-15 10:17 GMT+01:00 Francesco Strozzi <
>>>> francesco.strozzi at gmail.com>:
>>>>
>>>>> Hi Razvan,
>>>>>
>>>>> 1) Having a client would be nice, of course, but I would not
>>>>> consider it critical. Building a client around a REST API is pretty
>>>>> straightforward in any language.
>>>>>
>>>>> 2) Yes, of course. Also look at the Picard library
>>>>> (http://picard.sourceforge.net/). This is the low-level API for
>>>>> accessing VCF and other files, and GATK relies heavily on it to fetch
>>>>> data out of raw files.
>>>>>
>>>>> 3) If you have some code on GitHub or another repo that you would like
>>>>> to show us, that's fine. Otherwise you could spend a bit of time writing a
>>>>> simple JRuby wrapper for Picard to access a VCF file and retrieve a list
>>>>> of SNPs. This could be a pet project to start wrapping your head
>>>>> around these libraries, while also spending some time with JRuby.
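
To make the "retrieve a list of SNPs" exercise in point 3 concrete, here is a minimal sketch in plain Ruby. It does not use Picard at all: it parses the tab-separated VCF columns directly, and it assumes that a SNP is a record whose REF and every ALT allele are single bases. The names `SnpRecord`, `list_snps` and `SAMPLE_VCF` are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper, not part of Picard or BioRuby: collects SNP records
# from VCF text by reading the fixed columns CHROM, POS, ID, REF and ALT.
SnpRecord = Struct.new(:chrom, :pos, :id, :ref, :alts)

BASES = %w[A C G T].freeze

def list_snps(vcf_text)
  vcf_text.each_line.each_with_object([]) do |line, snps|
    line = line.chomp
    # Skip blank lines, meta-information lines (##) and the header line (#CHROM).
    next if line.empty? || line.start_with?("#")

    chrom, pos, id, ref, alt = line.split("\t")
    ref  = ref.to_s.upcase
    alts = alt.to_s.split(",").map(&:upcase)

    # Assumption: a SNP has a single-base REF and only single-base ALT alleles.
    next unless BASES.include?(ref)
    next if alts.empty? || !alts.all? { |a| BASES.include?(a) }

    snps << SnpRecord.new(chrom, pos.to_i, id, ref, alts)
  end
end

# Made-up example records; \t is expanded to a tab inside this heredoc.
SAMPLE_VCF = <<~VCF
  ##fileformat=VCFv4.1
  #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
  20\t14370\trs6054257\tG\tA\t29\tPASS\t.
  20\t1234567\tmicrosat1\tGTC\tG,GTCT\t50\tPASS\t.
  20\t1110696\trs6040355\tA\tG,T\t67\tPASS\t.
VCF

list_snps(SAMPLE_VCF).each do |snp|
  puts "#{snp.chrom}:#{snp.pos} #{snp.id} #{snp.ref}>#{snp.alts.join(',')}"
end
```

A JRuby wrapper around Picard would replace the hand-written parsing with the library's own reader classes; the filtering step would stay conceptually the same.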
>>>>>
>>>>> All the best.
>>>>> Francesco
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 14, 2014 at 6:50 PM, Razvan Florea <
>>>>> razvan.florea91 at gmail.com> wrote:
>>>>>
>>>>>> Hello Francesco,
>>>>>>
>>>>>> 1. The queries will be made through HTTP requests (basically GET and
>>>>>> POST). But does the project also include building a client for the web
>>>>>> service?
>>>>>> 2. I think using the GATK framework is absolutely necessary, because
>>>>>> even if we choose to use a database engine, the VCF files have to be
>>>>>> migrated to the database, which I think can be done with this framework.
>>>>>> Am I right?
>>>>>> 3. Meanwhile, is there some way I can contribute to show my skills
>>>>>> and my willingness to work on this project this summer?
>>>>>>
>>>>>> Best,
>>>>>> Razvan
>>>>>>
>>>>>>
>>>>>> 2014-03-14 14:43 GMT+01:00 Francesco Strozzi <
>>>>>> francesco.strozzi at gmail.com>:
>>>>>>
>>>>>>> Hi Razvan,
>>>>>>> the general idea is to have an interface that lets you run
>>>>>>> queries on top of the data stored in VCF files.
>>>>>>> For example, in a typical scenario one could ask to retrieve all the
>>>>>>> variations which are exclusively present in 20 samples out of a dataset
>>>>>>> of 100 samples.
>>>>>>> An API could then expose a method which takes a list of sample names
>>>>>>> plus other conditions and returns, for instance, a JSON document with all
>>>>>>> the variations fulfilling the query.
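
The query scenario described above could be sketched roughly as follows. This is only an illustration under stated assumptions: "exclusively present" is read here as "carried by at least one of the listed samples and by no sample outside the list", and the data lives in a hypothetical in-memory hash (`CARRIERS`) rather than in real VCF files. The method name `exclusive_variants_json` and the JSON layout are likewise made up for the example.

```ruby
require "json"
require "set"

# Hypothetical in-memory data: variant ID => names of the samples carrying it.
# A real service would derive this from the genotype columns of VCF files.
CARRIERS = {
  "rs1" => %w[s1 s2],
  "rs2" => %w[s1 s3],
  "rs3" => %w[s2],
  "rs4" => %w[s3 s4]
}.freeze

# Returns a JSON string listing the variants carried by at least one of the
# requested samples and by no sample outside that list.
def exclusive_variants_json(carriers, sample_names)
  wanted = sample_names.to_set
  hits = carriers.select do |_id, samples|
    !samples.empty? && samples.all? { |s| wanted.include?(s) }
  end
  JSON.generate("samples" => sample_names, "variants" => hits.keys.sort)
end

puts exclusive_variants_json(CARRIERS, %w[s1 s2])
```

In a REST setting the sample names would arrive as request parameters and the returned string would be the response body; further conditions (region, quality, and so on) would become additional filters before the JSON is generated.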
>>>>>>>
>>>>>>> Whether or not a database engine is used may depend on how you
>>>>>>> would like to implement the whole thing. One can also imagine not
>>>>>>> storing anything in a database and just accessing the data from the VCF
>>>>>>> files while providing a higher-level interface. In this case I'd suggest
>>>>>>> that you and the other students interested in the topic also explore the
>>>>>>> GATK framework (https://github.com/broadgsa/gatk,
>>>>>>> http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone),
>>>>>>> since it exposes a number of modules called walkers that should make
>>>>>>> life easier when accessing and traversing VCF files.
>>>>>>>
>>>>>>> JRuby sounds about right, as you'll have the typical Ruby
>>>>>>> flexibility to quickly prototype new things while having the ability to
>>>>>>> include Java code (GATK is written in Java and Scala BTW).
>>>>>>>
>>>>>>> Cheers
>>>>>>> Francesco
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <
>>>>>>> razvan.florea91 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> My name is Razvan Florea and I am studying Computing Science at the
>>>>>>>> University of Groningen, Netherlands.
>>>>>>>> I am writing to express my interest in the BioRuby GSoC project "An
>>>>>>>> ultra-fast scalable RESTful API to query large numbers of genomic
>>>>>>>> variations". I am currently doing my bachelor thesis project, which
>>>>>>>> is also about developing a RESTful API.
>>>>>>>>
>>>>>>>> As Francesco recommended, I looked at the proposal and at the links in
>>>>>>>> its text. So far I understand that the basic idea of the project is to
>>>>>>>> replace the manipulation of information from VCF files with the
>>>>>>>> manipulation of information from a database that will sit behind a
>>>>>>>> web service. Am I right?
>>>>>>>> If so, what do you expect the API to be capable of? Is returning
>>>>>>>> JSON documents with the information enough, or is there more to it?
>>>>>>>>
>>>>>>>> Also, would Rails on JRuby be a good choice of technology for
>>>>>>>> developing the web service?
>>>>>>>>
>>>>>>>> Please give me any information you think could be helpful.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Razvan
>>>>>>>> _______________________________________________
>>>>>>>> GSoC mailing list
>>>>>>>> GSoC at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/gsoc


-- 

Francesco Strozzi


