[GSoC] BioPerl for NGS

Fields, Christopher J cjfields at illinois.edu
Wed Mar 12 14:40:57 UTC 2014


Devang,

The first step is to identify the specific area you want to focus on initially.  Code for this should also go into a separate repository on GitHub.

My thoughts on this: 

At the moment, all representations of NGS sequence data in BioPerl work at the level of individual sequences.  My (possibly rash) thought is to create a new class that represents a set of NGS data as a collection (e.g. a FASTQ file or set of files).  By design this would reduce overhead in downstream analyses: sequence objects would be created lazily, only when requested, and downstream tools (quality trimmers, aligners) would simply use the raw data and create new collection objects to handle their output.

The project would entail setting up the foundations for that class (possibly using modern Perl modules), implementing it in a simple way using an example FASTQ dataset, and then possibly showing how downstream tool wrappers (e.g. trimmers, aligners) interact with the sequence collection instance.
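
To make the collection idea a bit more concrete, here is a rough sketch.  It is written in Python purely for brevity; the real thing would be a Perl class, and nothing here is existing BioPerl API.  The names FastqCollection, Read, apply and trim_tail are all made up for illustration, and the parser assumes plain four-line FASTQ records with Phred+33 qualities.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional


class Read(NamedTuple):
    """One FASTQ record; only built when somebody iterates the collection."""
    name: str
    seq: str
    qual: str


class FastqCollection:
    """Hypothetical collection: stores file paths, not sequence objects."""

    def __init__(self, paths: List[str]):
        self.paths = list(paths)

    def __iter__(self) -> Iterator[Read]:
        # Records are parsed one at a time, on demand (lazy creation).
        # Assumes strict four-line FASTQ records.
        for path in self.paths:
            with open(path) as fh:
                while True:
                    header = fh.readline().rstrip("\n")
                    if not header:
                        break
                    seq = fh.readline().rstrip("\n")
                    fh.readline()  # skip the '+' separator line
                    qual = fh.readline().rstrip("\n")
                    yield Read(header[1:], seq, qual)

    def apply(self, tool: Callable[[Read], Optional[Read]],
              out_path: str) -> "FastqCollection":
        # Stream the raw data through a tool and hand back a NEW collection
        # that wraps the output file; reads the tool rejects are dropped.
        with open(out_path, "w") as out:
            for read in self:
                result = tool(read)
                if result is not None:
                    out.write(f"@{result.name}\n{result.seq}\n+\n{result.qual}\n")
        return FastqCollection([out_path])


def trim_tail(read: Read, min_char: str = "5") -> Optional[Read]:
    """Toy quality trimmer: cut trailing bases below Phred 20 ('5' in Phred+33)."""
    keep = len(read.qual)
    while keep > 0 and read.qual[keep - 1] < min_char:
        keep -= 1
    if keep == 0:
        return None  # nothing left after trimming
    return Read(read.name, read.seq[:keep], read.qual[:keep])
```

A tool wrapper would then be used along the lines of FastqCollection(["reads.fastq"]).apply(trim_tail, "trimmed.fastq"), where the file names are placeholders.  The point is that the result is another collection, so trimmers and aligners can be chained without ever holding every sequence object in memory.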

However, there are lots of areas that can be focused on; the project is up to you.  NGS analysis is fairly broad, so if you have a specific area of interest you could focus on that instead: alignment, assembly, analysis, etc.

chris

On Mar 12, 2014, at 2:08 AM, Devang Varia <variadevang at gmail.com> wrote:

> Hello All,
>          What is the next step in taking forward the Idea.
> Regards,
> Devang Varia
> _______________________________________________
> GSoC mailing list
> GSoC at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/gsoc




