[Biopython-dev] Python_MKT

Fri Sep 6 15:07:01 UTC 2013

Hi Juraj,

It's good to hear that you plan to do that. A big advantage of using
Biopython module is to make your MKT test more integrated with existing
functions. This can be helpful to design pipeline within Biopython. What I
would also try is to use the Bio.Data.CodonTable so that user can specify
genetic code of their gene of interest.

I think there are situations where you are not able to minimize synonymous
and non-synonymous, and non-synonymous substitutions at the same time. If I
understand your point correctly, multi_short_path() function tries to find
the least synonymous and non-synonymous substitutions from a set of paths
that all holds minimum non-synonymous substitutions, right? In this case,
for example when you have 10 different codons at hand, you can first start
from each codon and build a minimum spanning tree. And then you expect at
most 10 minimum spanning trees, all with equal number of minimum
non-synonymous substitutions. Finally, you can pick the tree with least
overall substitutions (non-synonymous and synonymous) from the set of
trees. I don't expect the algorithm to cost more than 2000 lines. Maybe we
can discuss this more after I finish coding this weekend. Thanks!

Best,
Zheng Ruan

On Fri, Sep 6, 2013 at 2:38 AM, Juraj Bergman <jurajbergman at hotmail.com>wrote:

>  Hi Zheng,
>
> I think that the utilization of MultipleSeqAlignment and other modules
> already implemented in the Biopython framework is the next step in
> developing my module. The code was made independent because it says on the
> Biopython wiki that, when
> submitting code, it should be generalized so I didn't use any existing
> Biopython modules...
>
> As for the multi_short_path() function - it is guaranteed to find the
> shortest path (as far as I've tested it and I've tested it quite a bit) but
> I agree that it is very  confusing (even for me), but it works...  But
> still, my next goal is to try and rewrite it (so thank you for the
> suggestions :). The codon-codon matrix principle you described is also the
> principle behind the multi_short_path() function and, I think, it is a good
> way of tackling the problem... But in the end the result of the multi
> _short_path() is to find a tree with the least amount of overall
> substitutions (synonymous + non-synonymous) and with the number of
> non-synonymous substitutions being minimized. If you try to connect the
> nodes based solely on the minimum amount of synonymous substitutions you
> may not always get a minimum length tree (for example: if considering only
> the synonymous substitutions, then, theoretically, a codon_a -> codon_b
> exchange which requires two synonymous changes has priority over a codon_a
> -> codon_c which requires only one non-synonymous change, and that in turn
> can affect the length of the whole tree) - I hope this makes some sense
> to you... Also, when connecting nodes, I took the approach of first making
> a root of the tree and then building the tree from that root, otherwise you
> could end up with multiple unconnected branches... I hope this helps with
> your implementation... If I come up with a better alternative to the
> multi_short_path() I'll be sure to post a link!
>
> Again, thanks for taking the time to going through my code, all the best,
>
> Juraj
>
> ------------------------------
> Date: Fri, 6 Sep 2013 00:00:06 -0400
> Subject: Fwd: [Biopython-dev] Python_MKT
> From: zruan1991 at gmail.com
> To: biopython-dev at biopython.org; jurajbergman at hotmail.com
>
>
> Hi Juraj,
>
> I am also planing to implement MK test into my GSoC framework. I just went
> through you code and it is really independent. Will you be also to modify
> it to utilize the MultipleSeqAlignment, Alphabet and CodonTable module of
> Biopython so that it is more extendable?
>
> As to the multi_short_path() function, you really confused me. Is your
> implementation guaranteed to find the shortest path? This problem can be
> abstracted as finding the minimum spanning tree in graph theory and a good
> algorithm is known (Prim algorithm or Kruskal algorithm). My idea of
> linking multiple codons is first generate a codon by codon matrix
> representing the synonymous and nonsynonymous substitutions each codon
> needs to change to the other in advance. Then finding the minimum spanning
> tree that connect all the node in the matrix with minimum length (least
> synonymous substitutions). I plan to implement this and you may have more
> insight about my suggestions. Thanks!
>
> Best,
> Zheng Ruan
>
>
> On Thu, Sep 5, 2013 at 10:33 AM, Juraj Bergman <jurajbergman at hotmail.com>wrote:
>
>
>
>
> Dear all,
> I'm resending my implementation of the McDonald-Kreitman test.
> Link to the description of the module:
> https://www.dropbox.com/s/zgnz8xwlcsispzf/Python_MKT.pdf
> Link to the code:https://www.dropbox.com/s/1z3opj4rbb0ms14/Python_MKT.py
> I apologise for the initial mistake of sending attachments instead of
> links.
> Kind regards,
> Juraj Bergman
> P.S. Regarding the multi_short_path() function - I realize that it is
> very, very repetitive butI have not (yet) managed to find a suitable loop
> construction that would replace the current code. The multi_short_path()
> function is by far the most complex function of the modulebecause its
> purpose is to find the codon network with the least amount of overall
> nucleotide substitutions and the least amount of non-synonymous nucleotide
> substitutions (given any combination of codons). Each codon is being
> represented as multiple lists of two integers (depending on the overall
> amount of codons being processed). The first integer specifies the amount
> of synonymous and the second specifies the amount of non-synonymous
> substitutions.For example, if 10 codons are being fitted in a network, then
> there are 10x10 = 100 combinations of codon-codon pathways, each
> represented with a two-integer list, and out of these 100 lists, the 'best'
> 10 have to be chosen to get the most optimal codon networ!
>  k (and the repetitiveness of thefunction mainly arises because of this
> process). This is, in short, a description of the function and I would
> appreciate any pointers that would help to make the code more succinct :)
>
>
>
>
>
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
>
>