[Bioperl-l] Getting started with Bio::TreeIO
Jason Stajich
jason@cgt.mc.duke.edu
Sun, 16 Jun 2002 22:30:37 -0400 (EDT)
Serendipity? I have just started (this week) to refocus on my initial
efforts on the Tree modules to build more functionality as part of a push
to build evolutionary and comparative genomics tools in Bioperl.
More to details as to the exact plan when I get a little more in the thick
of it. The first part that was committed last week - a wrapper around one
of the PAML programs (codeml) to calculate Ka,Ks,and Ka/Ks values.
On Mon, 17 Jun 2002, Howard Ross wrote:
> Dear folks,
> I would like to be able to read in a phylogenetic tree, manipulate its
> branching structure by repositioning clades, and then write it to file.
> The Bio::Tree modules seem the obvious tool. I'm have trouble getting
> started and would appreciate a bit of help.
>
> In working with the Bio::Tree modules recently, I have deduced:
> - they do not support Newick trees WITHOUT branch lengths,
> although this is a legitimate format, and
Good point - bug fix checked into main trunk tonight. Can now read/write
newick w/ and w/o branch labels. I also added a new made-up format that I
call Bio::TreeIO::tabtree which prints trees in an ASCII drawing of sorts.
Very simple until I hook up the drawgram connection to the PHYLIP wrapper
(which is on the TODO list).
Contact me off the list if you are uncomfortable with CVS checkouts of the
code - however this is going to have to be bleeding edge code for at least
the summer.
> - ancestor is defined only for rooted trees (this makes sense)
> Can I suggest that these get put into the POD sections, to save the
> next person's time.
>
Sure thing. We should talk more about what should be expected behaviors
for different types of trees and what expanded list of questions we want
to address with trees.
> I have successfully read in a tree, and then written it out to file.
> But, I have been totally unsuccessful at making any changes to the
> tree and then saving it to file.
>
I think that I ran into a bit of trouble implementing this and probably
never fixed it - spent a little time on it tonight, see below for
comments. Thank you for a REALLY GOOD BUG REPORT since it is easy for me
to get in and start playing with the bugs.
> The following piece of code should swap nodes C and G in the tree
> (shown below) but it only recreates the input tree.
> #===============================================================
> #!/usr/bin/perl
>
> use Bio::TreeIO;
>
> my $treeio = new Bio::TreeIO(-format => 'newick', -file =>
> 'rooted8tree.tre');
> my $treeout = new Bio::TreeIO(-format => 'newick', -file => ">saveme.txt" );
> my $tree = $treeio->next_tree;
> my @nodes = $tree->get_nodes;
>
> my $i, $c, $g;
>
> for ($i = 0; $i <= $#nodes; $i++) {
> if ($nodes[$i]->id eq 'C') {
> $c = $i;
> }
> if ($nodes[$i]->id eq 'G') {
> $g = $i;
> }
> }
> my $cancestor = $nodes[$c]->ancestor;
> my $gancestor = $nodes[$g]->ancestor;
> $nodes[$c]->ancestor($gancestor);
> $nodes[$g]->ancestor($cancestor);
>
> $treeout->write_tree($tree);
The ancestor updates don't fix the fact that we have a reverse pointer for
the parent->child relationship which is what is read when the tree is
dumped out for the Tree writer - you have to update the child list of the
ancestor node to reflect the changes you want - the ancestor method is
basically a convience method which is kept up to date by the
add_Descendent method, I should probably make it read-only to prevent
these problems in the future (okay -its done now).
Perhaps I may have implemented the tree and node objects as more
complicated than they need to be? I think it is justified given the
different capabilities we want. If someone wants to suggest a better
implementation please code it up and propose it.
Here are the capabilities I think that Trees and their Nodes need:
* Represent Polytomys (>2 children per node).
So we use a hash at each Node to represent the pointers to all of
its children. The has is updated through the methods
add_Descendent(). To retrieve all the immediete children
call each_Descendent(). To get all the children and all of their
children (a recursive call), use get_Descedents().
* Be able to get the ancestor (parent) of a child node.
The add_Descendent method also does some housekeeping for us - it
updates the ancestor pointer. This is accessed through the
ancestor() method.
* Store branch length, description text, and a user readable id.
Additionally have an associated unique id for a node.
* Remove a child from one node and add it as a child to another node.
(As brought up by Howard). I added the methods
remove_all_Descendents() which wipes out a node's list of children
(but does not necessarily DESTROY them if you have a ptr handle to
them elsewhere). And the method remove_Descendent($node) which
will remove the requested child from a node. (This is based on the
the internal_id so implementing a db store for trees will need to
be sure and properly store an id for Nodes).
So to do what you want to do - which is move a node to a different clade -
we need to do a couple of things - remove it from the list of children for
the parent node of that clade, and add it as a child to the head node of
the second clade you want it in. I have added your code in the t/TreeIO.t
test by reading in your example file and providing the tests.
hmm - the anonymous server is down temporarily or I'd post the link to the
code, we'll get that back up soon. Anyways, it basically looks like this:
(Set the $vebose flag to 1 if you want to see the tree printed out).
$treeio = new Bio::TreeIO(-format => 'newick',
-fh => \*DATA);
my $treeout = new Bio::TreeIO(-format => 'tabtree');
my $treeout2 = new Bio::TreeIO(-format => 'newick');
$tree = $treeio->next_tree;
if( $verbose > 0 ) {
$treeout->write_tree($tree);
$treeout2->write_tree($tree);
}
@nodes = $tree->get_nodes;
my( $i, $c, $g);
for ($i = 0; $i <= $#nodes; $i++) {
next unless defined $nodes[$i]->id;
if ($nodes[$i]->id eq 'C') {
$c = $i;
}
if ($nodes[$i]->id eq 'G') {
$g = $i;
}
}
$nodes[$c]->ancestor;
$nodes[$g]->ancestor;
my $cancestor = $nodes[$c]->ancestor;
my $gancestor = $nodes[$g]->ancestor;
$cancestor->id('C-ancestor'); # let's provide a way to test if we suceeded
$gancestor->id('G-ancestor'); # in our swapping
$cancestor->remove_Descendent($nodes[$c]);
$gancestor->remove_Descendent($nodes[$g]);
$cancestor->add_Descendent($nodes[$g],1);
$gancestor->add_Descendent($nodes[$c],1);
@nodes = $tree->get_nodes();
for ($i = 0; $i <= $#nodes; $i++) {
next unless defined $nodes[$i]->id;
if ($nodes[$i]->id eq 'C') {
# test that we swapped nodes properly
ok($nodes[$i]->ancestor->id, 'G-ancestor');
$c = $i;
}
if ($nodes[$i]->id eq 'G') {
$g = $i;
# test that we swapped nodes properly
ok($nodes[$i]->ancestor->id, 'C-ancestor');
}
}
if( $verbose > 0 ) {
$treeout2->write_tree($tree);
}
__DATA__
(((A:1,B:1):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:1):1):1);
> #==============================================================
> -rooted8tree.tre-
> (((A:1,B:1):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:1):1):1);
>
> I have also tried to rearrange the tree by adding nodes and defining
> the descendents, again without success.
>
> So, the big question is how do you rearrange nodes in these trees:
>
> - To specify a node-node relationship, do you need to specify both the
> ancestor and the descendent fields of the respective nodes?
Just need to update the descendents as described above as the ancestor ptr
is updated by the add_Descendent method
>
> - Can (should) you have more than one tree in memory?
>
sure, there is no problem with that - if you are basing one tree off the
nodes of another you will need to make explicit copies of each node - we
probably need to think about a 'clone' method or something so that you
don't get into trouble making updates in one list of nodes which affects
the other list as well.
> - How do you specify which tree a particular node belongs to?
>
I think a tree is just a grouping of nodes - if you wanted to test if two
nodes are in the same tree - you would want to walkup the ancestor
pointers until you got to the root node and compare the internal_ids for
each of those nodes.
> - Do all nodes in a particular tree have to be defined in the same array?
>
no - you can add nodes after a tree is built - the nodes array is a
convient way to manipulate a tree when you know it is binary (see the
RandomTree generator in Bio::Tree).
> - Given the situation with ancestor and unrooted trees, how do you
> (re)define node-node relationships for this type of tree?
hmm - so the definition of an unrooted tree is on which would have > 1
nodes with no ancestor. It is quite fine to do the rearranging as I
describe above - the problems come when you want to test if two nodes are
in the same tree. I'm quite sure this is covered in the CS literature -
I'll leaf through my Gusfield text and see what he covers wrt to
phylogenetic trees.
Hope this is the beginning of a good answer - I'll try and convert this
commentary into better documentation as time permits (or if someone else
wants to contribute this I'll gladly step aside!).
-j
>
> Your help would be much appreciated.
>
> Howard
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu