[Biopython-dev] Fwd: Python_MKT

Zheng Ruan zruan1991 at gmail.com
Fri Sep 6 04:00:06 UTC 2013


Hi Juraj,

I am also planning to implement the MK test in my GSoC framework. I just went
through your code and it is really self-contained. Would you be able to modify
it to use the MultipleSeqAlignment, Alphabet and CodonTable modules of
Biopython so that it is more extensible?

As to the multi_short_path() function, it really confused me. Is your
implementation guaranteed to find the shortest path? This problem can be
abstracted as finding a minimum spanning tree in graph theory, for which
good algorithms are known (Prim's or Kruskal's algorithm). My idea for
linking multiple codons is to first generate a codon-by-codon matrix
giving the numbers of synonymous and nonsynonymous substitutions needed to
change each codon into each other codon. Then find the minimum spanning
tree that connects all the nodes in the matrix with minimum length (least
synonymous substitutions). I plan to implement this, and you may have more
insight into my suggestions. Thanks!
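
To make the idea concrete, here is a minimal sketch of what I have in mind.
It is not taken from Python_MKT.py, and the names pair_cost(), edge_weight()
and codon_mst() are hypothetical. It assumes the standard genetic code
(written out by hand here; CodonTable could supply it instead), treats stop
codons as just another symbol, and ranks edges by total substitutions first
and non-synonymous substitutions second. It only links the observed codons,
so unobserved intermediate codons are never added as extra nodes.

```python
from itertools import permutations

# Standard genetic code, written out by hand so the sketch has no dependencies.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def pair_cost(c1, c2):
    """Return (nonsyn, syn) for the stepwise path from c1 to c2
    that needs the fewest non-synonymous changes."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    best = None
    # Try every order in which the differing positions can be changed.
    for order in permutations(diff):
        cur, nonsyn, syn = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == CODE[cur]:
                syn += 1
            else:
                nonsyn += 1
            cur = nxt
        if best is None or (nonsyn, syn) < best:
            best = (nonsyn, syn)
    return best


def edge_weight(c1, c2):
    """Rank edges by total substitutions, then by non-synonymous ones."""
    nonsyn, syn = pair_cost(c1, c2)
    return (nonsyn + syn, nonsyn)


def codon_mst(codons):
    """Prim's algorithm on the complete graph of distinct codons.
    Returns a list of (codon_a, codon_b, (nonsyn, syn)) edges."""
    nodes = sorted(set(codons))
    if not nodes:
        return []
    in_tree, edges = {nodes[0]}, []
    while len(in_tree) < len(nodes):
        # Cheapest edge leaving the current tree; ties broken by codon name.
        _, u, v = min((edge_weight(u, v), u, v)
                      for u in in_tree for v in nodes if v not in in_tree)
        in_tree.add(v)
        edges.append((u, v, pair_cost(u, v)))
    return edges


tree = codon_mst(["TTT", "TTC", "TTA", "CTA"])
total_nonsyn = sum(cost[0] for _, _, cost in tree)
total_syn = sum(cost[1] for _, _, cost in tree)
```

For n distinct codons the tree always has n - 1 edges, and summing the
(nonsyn, syn) pairs over those edges gives the two counts for the site.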

Best,
Zheng Ruan


On Thu, Sep 5, 2013 at 10:33 AM, Juraj Bergman <jurajbergman at hotmail.com> wrote:

>
>
>
> Dear all,
> I'm resending my implementation of the McDonald-Kreitman test.
> Link to the description of the module:
> https://www.dropbox.com/s/zgnz8xwlcsispzf/Python_MKT.pdf
> Link to the code: https://www.dropbox.com/s/1z3opj4rbb0ms14/Python_MKT.py
> I apologise for the initial mistake of sending attachments instead of
> links.
> Kind regards,
> Juraj Bergman
> P.S. Regarding the multi_short_path() function: I realize that it is
> very, very repetitive, but I have not (yet) managed to find a suitable loop
> construction that would replace the current code. The multi_short_path()
> function is by far the most complex function of the module because its
> purpose is to find the codon network with the fewest overall
> nucleotide substitutions and the fewest non-synonymous nucleotide
> substitutions (given any combination of codons). Each codon is
> represented as multiple lists of two integers (how many depends on the
> number of codons being processed). The first integer specifies the number
> of synonymous substitutions and the second the number of non-synonymous
> substitutions. For example, if 10 codons are being fitted in a network, then
> there are 10x10 = 100 combinations of codon-codon pathways, each
> represented by a two-integer list, and out of these 100 lists, the 'best'
> 10 have to be chosen to get the optimal codon network (the
> repetitiveness of the function mainly arises from this
> process). This is, in short, a description of the function and I would
> appreciate any pointers that would help to make the code more succinct :)
>
>
>
>
>
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


