[Bioperl-l] Unwise elimination of nodes in B:T:Node::remove_Descendent?

Chris Fields cjfields at illinois.edu
Fri Feb 6 14:46:16 UTC 2009


On Feb 6, 2009, at 8:21 AM, Hilmar Lapp wrote:

>
> On Feb 6, 2009, at 12:49 AM, Mark A. Jensen wrote:
>
>> A   B   C   D   E
>> |5  |5  |4  |4  |
>> \   /   \   /   /
>> x       y    /
>> |2      |1  /
>> \       /  /
>>  \    _/  / 10
>>   \  /   /
>>    z    /
>>    |3  /
>>     \ /
>>     ROOT
>>
>>> [...]
>>> __DATA__
>>> (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);
>>> [...]
>>
>> The first problem was that the "complementary approaches" didn't give
>> the same answer, and one threw an error. This is the bug in the
>> code above. If you look at the tree, the nodes he really wants to  
>> keep are
>> the leaves [A,B,E], *plus* the internal nodes [x,y,z]; that is...
>>
>> $tree->splice(-keep_id=>[A,B,E,x,'y',z])
>> $tree->total_branch_length
>>
>> doesn't throw and returns 25, which is correct.
>
> If you replace 'y' with the root in the above, yes. Otherwise no.
>
>> The second problem is that removing [C, D] also gives him 25, which  
>> is
>> what he wanted, but is not correct.
>
> Correct.
>
>> [...]The problem arises in this code at the end of remove_Descendent:
>>
>> # remove unecessary nodes if we have removed the part |||Node.pm  
>> LINE 272
>> # which branches.
>>   my $a1 = $self->ancestor;
>>   if( $a1 ) {
>>       my $bl = $self->branch_length || 0;
>>       my @d = $self->each_Descendent;
>>       if (scalar @d == 1) {
>>    $d[0]->branch_length($bl + ($d[0]->branch_length || 0));
>>    $a1->add_Descendent($d[0]);
>>    $a1->remove_Descendent($self);
>>       }
>>   }
>>
>> When node C is removed from the example tree, the node 'y' is removed
>> by this code apparently as a convenience.
>
> I guess I'm confused by the above code snippet. If @d are the  
> descendants of $a1, then I don't understand what the purpose of  
> adding $d[0] as a descendant of $a1 is (after altering its branch  
> length). Furthermore, if $a1 is the ancestor of $self, and $a1 has  
> only one descendant, wouldn't that mean that $d[0] == $self?
>
> What am I missing?
>
> I also think it's a bad idea to simplify the tree (or collapse  
> internal nodes of degree 1) as an implicit operation. It should be  
> explicit (and I believe there is a method simplify() or something  
> similar, isn't there? Ah - I see you quote contract_linear_paths()).
>
> Furthermore, I think it's also a bad idea to remove descendant leaf  
> nodes if you remove an internal node. What if you really just wanted  
> to remove the internal node because, for example, its branching  
> point isn't well supported? So removing node 'y' should make z a  
> node of degree 3, but not remove C and D unless you ask to remove  
> the subtree beginning at 'y'.
>
> 	-hilmar

I suppose the best way to deal with some of these questions (and  
ensure Node/Tree is acting as expected) is to come up with several  
vetted test cases indicating what we expect the proper behavior to be  
for remove_Descendant(), contract_linear_paths(), and any other  
problematic Node/Tree/TreeFunctionI methods.  In fact, I highly  
recommend any code changes like this add tests to the test suite  
demonstrating the issue.

Possibly related to all this is a fairly significant lingering bug  
dealing with Bio::Tree::TreeFunctionsI::reroot() (http://bugzilla.open-bio.org/show_bug.cgi?id=2456 
).  Any takers?

chris



More information about the Bioperl-l mailing list