[Biopython-dev] Project ideas for GSoC (or other student projects)

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 21 16:55:30 UTC 2013


On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:
> - PAL2NAL-style conversion of unaligned nucleic acid sequences and a protein
> sequence alignment to a codon alignment. (Previously discussed)

e.g. https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
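For illustration, the core of that back-translation step can be sketched
in a few lines. This is only a minimal sketch, not the code in the script
linked above: the function name thread_codons is hypothetical, it assumes
the nucleotide sequence is exactly three times the ungapped protein length
(so no trailing stop codon), and it does not check that each codon really
translates to the aligned residue.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Thread an unaligned coding sequence onto one gapped protein row.

    Hypothetical helper for illustration only. Each residue consumes the
    next three nucleotides; each protein gap becomes a three-letter gap.
    """
    ungapped_length = len(aligned_protein) - aligned_protein.count(gap)
    if len(nucleotides) != 3 * ungapped_length:
        # A length mismatch usually means a frame shift, a stop codon
        # left on the end, or the wrong sequence pairing.
        raise ValueError(
            "Expected %d nucleotides for %d residues, got %d"
            % (3 * ungapped_length, ungapped_length, len(nucleotides))
        )
    codons = []
    position = 0
    for residue in aligned_protein:
        if residue == gap:
            codons.append(gap * 3)
        else:
            codons.append(nucleotides[position:position + 3])
            position += 3
    return "".join(codons)


if __name__ == "__main__":
    # One row of a protein alignment plus its unaligned coding sequence.
    print(thread_codons("M-K", "ATGAAA"))
```

Applying this to every row of the protein alignment gives rows of equal
length (three times the protein alignment length), i.e. a codon alignment.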

> - dN/dS and the related functions needed to calculate it.
> - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
> codon alignments, including validation (testing for frame shifts etc.)

http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis

I see you've started fleshing this idea out on the wiki, which is great.
Right now it seems a little on the lightweight side - or is that deliberate
(to see if a student can take this idea and come up with a solid
project proposal in this area)? Things like model selection might
be a fun extension - I can think of a local expert who would be
great to get involved on the science side if he's interested.

Alternatively this could include doing some more general work
on the alignment object - for instance per-column annotation
for things like a consensus sequence - or an array-of-char
implementation as an alternative to the list-of-SeqRecords
we have now (with its poor column access speed).
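
To make the array-of-char idea a bit more concrete, here is a rough
sketch under some loud assumptions: the class CharArrayAlignment and its
methods are hypothetical (nothing like this exists in Biopython), the
sequences are plain ASCII strings rather than SeqRecords, and the
"consensus" is just the most common character in each column. The point
is only that storing the rows in one flat buffer turns a column lookup
into a single strided slice.

```python
from collections import Counter


class CharArrayAlignment:
    """Hypothetical alignment stored as one flat, row-major bytes buffer."""

    def __init__(self, sequences):
        sequences = list(sequences)
        if not sequences:
            raise ValueError("Need at least one sequence")
        self.ncols = len(sequences[0])
        if any(len(seq) != self.ncols for seq in sequences):
            raise ValueError("Sequences must all be the same length")
        self.nrows = len(sequences)
        # Assumption: ASCII letters only, so one byte per character.
        self._data = "".join(sequences).encode("ascii")

    def row(self, index):
        """Return one aligned sequence as a string."""
        if not 0 <= index < self.nrows:
            raise IndexError("Row index out of range")
        start = index * self.ncols
        return self._data[start:start + self.ncols].decode("ascii")

    def column(self, index):
        """Return one column as a string, using a single strided slice."""
        if not 0 <= index < self.ncols:
            raise IndexError("Column index out of range")
        return self._data[index::self.ncols].decode("ascii")

    def consensus(self):
        """Per-column annotation example: most common character per column."""
        return "".join(
            Counter(self.column(i)).most_common(1)[0][0]
            for i in range(self.ncols)
        )


if __name__ == "__main__":
    alignment = CharArrayAlignment(["ACGT", "ACGA", "AC-T"])
    print(alignment.column(3))
    print(alignment.consensus())
```

A real implementation would still need to carry the per-record
annotation (ids, descriptions, features) alongside the character data.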

Peter


