[GSoC] GSoC 2014 queries and inputs

Fields, Christopher J cjfields at illinois.edu
Thu Mar 20 02:38:09 UTC 2014


You probably need to look this over first:

http://www.open-bio.org/wiki/Google_Summer_of_Code#Before_you_apply
http://www.open-bio.org/wiki/Google_Summer_of_Code#When_you_apply

Then you would go here:

http://www.google-melange.com/gsoc/homepage/google/gsoc2014

You should start on this ASAP, as the deadline is Friday at noon.

(BTW, if Eric and Raoul are reading this, great job on organization!)

chris

On Mar 19, 2014, at 11:23 AM, Ujjwal Thaakar <ujjwalthaakar at gmail.com> wrote:

Is there a template for the application proposal?


On 19 March 2014 19:56, Fields, Christopher J <cjfields at illinois.edu> wrote:
On Mar 19, 2014, at 8:28 AM, Artem Tarasov <lomereiter at gmail.com> wrote:

On Tue, Mar 18, 2014 at 11:44 PM, Ujjwal Thaakar <ujjwalthaakar at gmail.com> wrote:
What's the difference between SAM and VCF?

SAM: mapping software aligns reads against the reference genome (and its reverse complement) and writes the best alignment of each read to a SAM/BAM file: which strand the read aligned to, how it differs from the reference, and so on.

VCF: here the records describe positions on the reference genome rather than reads, and each record says whether there is variability at that position. VCF files are produced from SAM files by considering the reads overlapping each position: if a statistically significant number of reads have a base different from the reference (or an insertion/deletion), this is probably a true mutation, which might have biological significance as well.
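
To make the read-centric vs. position-centric distinction concrete, here is a minimal Python sketch that parses one line of each format. The column order follows the SAM and VCF specifications; the helper names (parse_sam_line, parse_vcf_line) and the two example lines are made up for illustration, and a real project would use an existing library rather than hand-rolled parsing.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """One alignment: describes where a single read landed."""
    read_name: str
    flag: int
    ref_name: str
    pos: int        # 1-based leftmost mapping position
    mapq: int
    cigar: str      # how the read differs in layout from the reference
    seq: str

    @property
    def is_reverse(self) -> bool:
        # Bit 0x10 of FLAG marks a read aligned to the reverse strand.
        return bool(self.flag & 0x10)


class VcfRecord(NamedTuple):
    """One variant site: describes a position on the reference."""
    chrom: str
    pos: int        # 1-based position on the reference
    ref: str
    alts: list


def parse_sam_line(line: str) -> SamRecord:
    # SAM mandatory columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])


def parse_vcf_line(line: str) -> VcfRecord:
    # VCF fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO
    f = line.rstrip("\n").split("\t")
    return VcfRecord(f[0], int(f[1]), f[3], f[4].split(","))


# Hypothetical example lines, not taken from real data.
sam = parse_sam_line("read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII")
vcf = parse_vcf_line("chr1\t102\t.\tG\tA\t50\tPASS\tDP=20")
```

The SAM record is keyed by a read name and carries the read's own sequence, while the VCF record is keyed by a reference position and carries only the reference and alternate alleles.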

For JRuby, I'd recommend using Picard. No need to reinvent the wheel. Plus, you might also want to support VCF's binary counterpart, the BCF format.


--
Artem

Yep, if you're planning on going through the JVM, then Picard is nice and supports VCF (and BCF, it seems). No CRAM support, but there is this:

   https://github.com/enasequence/cramtools

(section on picard integration)

chris



--
Thanks
Ujjwal




