[Bioperl-l] planning sequence mutating modules

Tue Feb 21 14:37:32 UTC 2006

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I’d like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific “substitution matrix” from the tree, i.e. in
this position, what fraction of nucleotides A that were present at the
beginning of each brunch, turned into nucleotide “C” by the end of the
branch, possibly weighting with branch lengths.

Further, I need to do simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

my $node = (some tree::node object)
my $posit = 0; 
$codons{$posit}->{$node} =  “AAA”;

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).

I’ll be thankful for any feedback.

Yegor

------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki

> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converteed to probabilites of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l