[Bioperl-l] Unwise elimination of nodes in B:T:Node::remove_Descendent?
Mark A. Jensen
maj at fortinbras.us
Fri Feb 6 05:49:12 UTC 2009
Hi All,
This is a pedantic one, but it addresses a relatively
deep (or capricious?) choice made in the Node.pm code.
Please bear with me.
Background- I was trolling the archive for good scraps
(dumpster-diving, if you will), and came upon this
candidate:
"branch length score - total length of the spanning subtree"
http://lists.open-bio.org/pipermail/bioperl-l/2008-March/027279.html
Briefly, the question was, how do you get the total branch
length of a subtree, and the solutions discussed involved
using splice() to edit the tree, either by explicitly choosing
the subtree nodes (-keep_id) or removing the other nodes
(-remove_id), then performing total_branch_length() on the
edited tree.
Looking over the code, I saw two bugs. One was in the
ultimate "solution" in the thread. Deeper than that is
what I think is a problem in B:T:Node::remove_Descendent.
The question-asker, Daniel G., wrote the following:
(using the example tree I have painstakingly rendered
below, only to be destroyed by my proportional font):
A B C D E
|5 |5 |4 |4 |
\ / \ / /
x y /
|2 |1 /
\ / /
\ _/ / 10
\ / /
z /
|3 /
\ /
ROOT
> Daniel Gerlach wrote:
> Thanks for the quick answer. I tried:
>
> use Bio::TreeIO;
> my $treeio = Bio::TreeIO->new(-format => 'newick',
> -fh => \*DATA);
> my $tree = $treeio->next_tree;
> print $tree->total_branch_length,"\n";
> $tree->splice(-keep_id => [A,B,E]);
> print $tree->total_branch_length,"\n";
>
> __DATA__
> (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);
>
> Which gives me the message "MSG: After splicing, the original root was
> removed but there are multiple candidates for the new root!" however the
> root E was not removed.
>
> If I do it the complementary way by splicing out all unwanted nodes -
> splice(-remove_id => [C,D]) - I get what I want:
>
> 34
> 25
The first problem was that the "complementary approaches" didn't give
the same answer, and one threw an error. This is the bug in the
code above. If you look at the tree, the nodes he really wants to keep are
the leaves [A,B,E], *plus* the internal nodes [x,y,z]; that is...
$tree->splice(-keep_id=>[A,B,E,x,'y',z])
$tree->total_branch_length
doesn't throw and returns 25, which is correct.
The second problem is that removing [C, D] also gives him 25, which is
what he wanted, but is not correct. If the node 'y' remains in the
tree after removing C and D, then the branch length should be
26 (z-y branch).
The problem arises in this code at the end of remove_Descendent:
# remove unecessary nodes if we have removed the part |||Node.pm LINE 272
# which branches.
my $a1 = $self->ancestor;
if( $a1 ) {
my $bl = $self->branch_length || 0;
my @d = $self->each_Descendent;
if (scalar @d == 1) {
$d[0]->branch_length($bl + ($d[0]->branch_length || 0));
$a1->add_Descendent($d[0]);
$a1->remove_Descendent($self);
}
}
When node C is removed from the example tree, the node 'y' is removed
by this code apparently as a convenience. But this node is obviously
not 'unnecessary', any more than the root of a bifurcating tree is
unnecessary, which also is of order 2 and not 1 or 3 like good
self-respecting nodes are.
When I comment out the above remove_Descendent fragment, the following
code
use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick',
-fh => \*DATA);
my $tree = $treeio->next_tree;
my $keep_tree = $tree->_clone;
my $rmv_tree = $tree->_clone;
print $tree->total_branch_length,"\n";
$keep_tree->splice(-keep_id => [A,B,x, z,r,E]);
print $keep_tree->total_branch_length,"\n";
$rmv_tree->splice(-remove_id => [C,D]);
print $rmv_tree->total_branch_length,"\n";
$rmv_tree->splice(-remove_id => ['y']);
print $rmv_tree->total_branch_length,"\n";
__DATA__
(((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10)r;
returns
34 # original tree [A,B,C,D,E,x,'y',z,r]
25 # keep desired nodes [A,B,E,x,z,r]
26 # remove [C,D]
25 # remove undesired nodes [C,D,'y']
which is correct.
Question is then, is the removal of "unnecessary nodes" relied upon?
I will run the tests, but my thinking is that even if we get some
fails, those should be dealt with by removing order 2 nodes by
special request (looks like contract_linear_paths() is the thing to
use for this).
cheers and thanks,
MAJ
More information about the Bioperl-l
mailing list