[Biopython] Google Summer of Code 2014: Student application

Lluís Revilla lluis.revilla at gmail.com
Mon Mar 17 19:09:31 UTC 2014


Hi everyone,

I am a Biotechnology student and I want to contribute to Biopython. I have
read the wiki GSoC page and found two ideas. But I think I don't have the
desired skills: I am not yet very familiar with Biopython's existing
sequence parsing ("Indexing & Lazy-loading Sequence Parsers") or with
JavaScript ("Interactive GenomeDiagram Module"). So I am thinking of making
a proposal for the Google Summer of Code about a comparison tool.

My idea comes from the following: several times I have been in charge of
selecting a tool to carry out a certain process, e.g. producing a list of
predicted genes, a list of possible structures, a list of alignments...

But in bioinformatics there are usually many programs that do the same
thing. They tend to use a different algorithm or a different training data
set (prokaryote, eukaryote), or have different specifications. And they
return a more or less sophisticated list in some standard format: FASTA,
GFF, GenBank...

The problem when starting a project is to select, from these different
programs, which one to use for the task, e.g.: which gene predictor is
better for prokaryotes: Glimmer, EasyGene, GeneMark, Prodigal, AUGUSTUS...?
The answer will be specific to the project, but sometimes it's difficult to
be sure that it is a good selection. (Other times it is good enough to do
what the majority does.) But that does not solve the problem when new
algorithms appear, or when comparing different versions of a program.

To address this problem I would like to develop a Biopython module that
compares the output of the different programs to assess which one is better
for the task.
I have already developed a parser for the aforementioned programs, and it
compares them in a (very) crude way. I would like to develop it further and
release it to the Biopython community.
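
A minimal sketch of the kind of comparison meant here, assuming each
program's predictions have already been parsed into (start, end, strand)
tuples. The function name and the example coordinates are hypothetical
illustrations, not taken from the existing parser or from Biopython:

```python
def compare_predictions(a, b):
    """Compare two collections of (start, end, strand) gene predictions.

    Only exact coordinate matches count as shared; a real tool would
    probably also want to score partial overlaps.
    """
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        # Jaccard index: fraction of all predictions that both programs agree on
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up predictions standing in for the parsed output of two programs
program_a = [(100, 400, "+"), (600, 900, "-"), (1200, 1500, "+")]
program_b = [(100, 400, "+"), (650, 900, "-"), (1200, 1500, "+")]

result = compare_predictions(program_a, program_b)
print(result)
```

Exact matching is the crudest possible criterion; it is used here only
because it keeps the example short.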

What are your thoughts about this idea?
Thanks,

Lluís