[BioRuby] GSOC

Pjotr Prins pjotr.public14 at thebird.nl
Mon May 12 18:20:22 UTC 2014


On Mon, May 12, 2014 at 06:12:29PM +0200, Loris Cro wrote:
> I stubbed a possible data model that would preserve all the information
> present in the VCF files, considering also the possibility of having multiple
> reference genomes inside a single collection.
> 
> https://gist.github.com/kappaloris/462082314dc2e940ba4e
> 
> How to merge the results of queries is still TBD, tho.

I think it is better to stick to storing data in a row-wise fashion (one
stored row per variant, SNP, or record).

Queries are typically row based. Speedy parallel processing will be possible
when all rows are independent of each other.

Only the header should go in a separate block.
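
To make the row-wise idea concrete, here is a minimal sketch in plain Ruby.
It is only an illustration under stated assumptions: the helper names
(vcf_columns, split_vcf, record_to_row) and the sample data are made up for
this example and are not part of BioRuby or of the gist above. The header
lines are kept in their own block, and every record becomes one
self-contained row that can be queried or handed to a worker on its own.

```ruby
# Hypothetical sample data: a tiny VCF-like text with two header lines
# and three records (tab-separated fixed columns only).
vcf_text = <<~VCF
  ##fileformat=VCFv4.1
  #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
  1\t100\trs1\tA\tG\t50\tPASS\tDP=10
  1\t200\t.\tC\tT\t20\tq10\tDP=3
  2\t300\trs2\tG\tA\t99\tPASS\tDP=40
VCF

# Names of the eight fixed VCF columns (illustrative helper).
def vcf_columns
  %w[chrom pos id ref alt qual filter info]
end

# Separate header lines (starting with '#') from record lines.
def split_vcf(text)
  lines = text.each_line.map(&:chomp).reject(&:empty?)
  lines.partition { |line| line.start_with?('#') }
end

# Turn one record line into one independent row (a Hash).
# Any columns beyond the fixed eight are kept verbatim under 'rest'.
def record_to_row(line)
  fields = line.split("\t")
  row = vcf_columns.zip(fields).to_h
  row['pos'] = Integer(row['pos'])
  row['rest'] = fields.drop(vcf_columns.size)
  row
end

header, record_lines = split_vcf(vcf_text)

# Each row depends only on its own line, so this map could be split
# across workers without any coordination between them.
rows = record_lines.map { |line| record_to_row(line) }

# A typical row-based query: keep only the records that passed filtering.
passing = rows.select { |row| row['filter'] == 'PASS' }
```

Because no row refers to another row, the same select could run on separate
chunks of rows (for example via each_slice) and the partial results simply
be concatenated.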

I don't think you should be concerned with multiple reference genomes
at this point. Just skip that for now. I think that information does
not have to go in the storage. Unless you start storing reference
genomes themselves.

Pj.



