[BioRuby] GSOC

Eric Talevich eric.talevich at gmail.com
Wed May 21 23:40:15 UTC 2014


On Mon, May 12, 2014 at 9:12 AM, Loris Cro <l.cro at campus.unimib.it> wrote:

> I'm trying to write a list of all the problems that must be addressed:
>
>
> https://github.com/kappaloris/GSoC-2014-OBF/blob/master/problems-features.md
>
> For now I believe I should try to fill the first section as much as
> possible, and I wouldn't mind some input in that regard.
>
> I stubbed a possible data model that would preserve all the information
> present in the VCF files, considering also the possibility of having
> multiple reference genomes inside a single collection.
>
> https://gist.github.com/kappaloris/462082314dc2e940ba4e
>
> How to merge the results of queries is still TBD, though.
>
>
Did you look at GEMINI's data model yet?
https://github.com/arq5x/gemini/blob/master/gemini/database.py

This system is in active use so it should be able to cover a fair number of
real-world edge cases.

Also, since you're coding in Python here, have you considered using PyVCF
or the unmerged Biopython variant module that was written for a previous
GSoC?
https://github.com/lennax/biopython/tree/variant2/Bio/Variant

If you end up making either of those more robust, upstream would probably
appreciate your patches.
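
To make the data-model question a bit more concrete, here is a rough,
standard-library-only sketch of a record that keeps every column of a VCF
data line and tags it with the reference genome it belongs to. The names
`VariantRecord`, `parse_vcf_line` and the `genome` field are made up for
illustration; this is not the model from the gist above, not GEMINI's
schema, and not the PyVCF API. It also assumes well-formed tab-separated
lines and keeps INFO and per-sample values as strings rather than typing
them from the header.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class VariantRecord:
    """One VCF data line, plus a (hypothetical) reference-genome label."""
    genome: str                      # assumption: label chosen by the caller
    chrom: str
    pos: int
    ids: List[str]
    ref: str
    alts: List[str]
    qual: Optional[float]
    filters: List[str]
    info: Dict[str, Union[bool, List[str]]]
    samples: Dict[str, Dict[str, str]] = field(default_factory=dict)


def _split(value: str, sep: str) -> List[str]:
    """Split a VCF field, treating '.' (missing value) as an empty list."""
    return [] if value == "." else value.split(sep)


def parse_vcf_line(line: str, sample_names: List[str], genome: str) -> VariantRecord:
    """Parse one tab-separated VCF data line without discarding any column."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, id_, ref, alt, qual, filt, info = cols[:8]

    # INFO entries are either KEY=v1,v2,... or a bare flag such as DB.
    info_dict: Dict[str, Union[bool, List[str]]] = {}
    for item in _split(info, ";"):
        key, eq, value = item.partition("=")
        info_dict[key] = value.split(",") if eq else True

    # Column 9 (FORMAT) names the colon-separated keys of each sample column.
    samples: Dict[str, Dict[str, str]] = {}
    if len(cols) > 9:
        keys = cols[8].split(":")
        for name, raw in zip(sample_names, cols[9:]):
            samples[name] = dict(zip(keys, raw.split(":")))

    return VariantRecord(
        genome=genome,
        chrom=chrom,
        pos=int(pos),
        ids=_split(id_, ";"),
        ref=ref,
        alts=_split(alt, ","),
        qual=None if qual == "." else float(qual),
        filters=_split(filt, ";"),
        info=info_dict,
        samples=samples,
    )


if __name__ == "__main__":
    example = ("20\t14370\trs6054257\tG\tA\t29\tPASS\t"
               "NS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP\t0|0:48:1\t1|0:48:8")
    record = parse_vcf_line(example, ["NA00001", "NA00002"], genome="GRCh37")
    print(record)
```

Keying stored records on something like (genome, chrom, pos, ref, alts)
would be one way to let several reference genomes share a collection, but
that is only one possible design and says nothing about how query results
should be merged.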
