[Bioperl-l] suggestions for additions to Tree
Georgii Bazykin
gbazykin at Princeton.EDU
Wed Oct 26 17:27:07 EDT 2005
Hi,
here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.
Yegor Bazykin
=============================================
To NodeI:
# modified from total_branch_length in Tree:Tree module
# gets sum of branches in the subtree - descendents of given node
=head2 children_branch_length
Title : children_branch_length
Usage : my $size = $node->children_branch_length
Function: Returns the sum of the length of all branches of the subtree which starts at given node
Returns : integer
Args : none
=cut
sub children_branch_length {
my ($self) = @_;
return 0 if($self -> is_Leaf) ;
my $sum = 0;
for ($self -> get_all_Descendents) {
$sum += $_->branch_length || 0;
}
return $sum;
}
-----------------------------------
=head2 height_nodes
Title : height_nodes
Usage : my $len = $node->height_nodes
Function: Returns the height of the tree starting at this
node. Height is the maximum branchlength to get to the tip.
Returns : The longest length to a leaf, in nodes
Args : none
=cut
sub height_nodes{
my ($self) = @_;
return 0 if( $self->is_Leaf );
my $max = 0;
foreach my $subnode ( $self->each_Descendent ) {
my $s = $subnode->height_nodes + 1;
if( $s > $max ) { $max = $s; }
}
return $max;
}
----------------------------------
=head2 get_all_Descendent_Leaves
Title : get_all_Descendent_Leaves($sortby)
Usage : my @nodes = $node->get_all_Descendent_Leaves;
Function: Recursively fetch all the nodes and their descendents, only selecting leaves
*NOTE* This is different from each_Descendent
Returns : Array or Bio::Tree::NodeI objects
Args : $sortby [optional] "height", "creation" or coderef to be used
to sort the order of children nodes.
=cut
sub get_all_Descendent_Leaves{
my ($self, $sortby) = @_;
$sortby ||= 'height';
my @nodes;
foreach my $node ( $self->each_Descendent($sortby) ) {
if ($node->is_Leaf) {
push @nodes, $node;
}
else {
push @nodes, ($node->get_all_Descendents($sortby));
}
}
return @nodes;
}
=====================================================
To Tree:
=head2 total_internal_branch_length
Title : total_internal_branch_length
Usage : my $size = $tree->total_internal_branch_length
Function: Returns the sum of the length of all branches, excluding branches leading to leaves
Returns : integer
Args : none
=cut
sub total_internal_branch_length {
my ($self) = @_;
my $sum = 0;
if( defined $self->get_root_node ) {
for ( $self->get_root_node->get_Descendents() ) {
unless ($_->is_Leaf) { # YB: THIS IS ALL I ADDED
$sum += $_->branch_length || 0;
}
}
}
return $sum;
}
=================================================
To TreeFunctionsI:
=head2 distance_nodes
Title : distance_nodes
Usage : distance_nodes(-nodes => \@nodes )
Function: returns the distance between two given nodes in numbers of nodes
Returns : numerical distance
Args : -nodes => arrayref of nodes to test
=cut
# YB: distance_nodes is very similar to distance method in TreeFunctionsI except that
# it estimates distances between nodes in numbers of nodes (e.g., 1 between mother and
# daughter, 2 between two sisters, etc.)
sub distance_nodes {
my ($self, at args) = @_;
my ($nodes) = $self->_rearrange([qw(NODES)], at args);
if( ! defined $nodes ) {
$self->warn("Must supply -nodes parameter to distance_nodes() method");
return undef;
}
my ($node1,$node2) = $self->_check_two_nodes($nodes);
# algorithm:
# Find lca: Start with first node, find and save every node from it
# to root, saving cumulative distance. Then start with second node;
# for it and each of its ancestor nodes, check to see if it's in
# the first node's ancestor list - if so it is the lca. Return sum
# of (cumul. distance from node1 to lca) and (cumul. distance from
# node2 to lca)
# find and save every ancestor of node1 (including itself)
my %node1_ancestors; # keys are internal ids, values are objects
my %node1_cumul_dist; # keys are internal ids, values
# are cumulative distance from node1 to given node
my $place = $node1; # start at node1
my $cumul_dist = 0;
while ( $place ){
$node1_ancestors{$place->internal_id} = $place;
$node1_cumul_dist{$place->internal_id} = $cumul_dist;
$cumul_dist++; # YB
#YB if ($place->branch_length) {
#YB $cumul_dist += $place->branch_length; # include current branch
#YB # length in next iteration
#YB }
$place = $place->ancestor;
}
# now climb up node2, for each node checking whether
# it's in node1_ancestors
$place = $node2; # start at node2
$cumul_dist = 0;
while ( $place ){
foreach my $key ( keys %node1_ancestors ){ # ugh
if ( $place->internal_id == $key){ # we're at lca
return $node1_cumul_dist{$key} + $cumul_dist;
}
}
# include current branch length in next iteration
#YB $cumul_dist += $place->branch_length || 0;
$cumul_dist++; # YB
$place = $place->ancestor;
}
$self->warn("Could not find distance!"); # should never execute,
# if so, there's a problem
return undef;
}
More information about the Bioperl-l
mailing list