[Bioperl-l] Find distances between LCA's and the other nodes from a tree

Thomas Sharpton thomas.sharpton at gmail.com
Wed Oct 5 17:47:53 UTC 2011


Hi Ross,

Let's keep this discussion on the listserv so that others can benefit  
from and contribute to it.

I just executed your script using the tree you provided (for those  
following along, I have copied the script and tree below my  
signature). It produces no warnings or errors on my machine. Look  
below my signature to see the terminal output of this test run on my  
computer.

Are you working with the latest version of bioperl? What version of  
perl are you using?

I see that the warning (which is distinct from an error) that you are  
getting is:

"MSG: At least some nodes do not have a branch length, the distance  
returned could be wrong"

I also see in your newick tree that you have given the root node (node  
G) an ID, but no branch length. I'm thinking that you may have an old  
version of bioperl, which might bark at you when you have a root node  
with an id but no branch length. Note that the distances being printed  
in my output (which is not the distance between the node ids that  
precede the value printed, but instead the distance between the LCA of  
these node ids and the second of the two ids printed) are indeed  
correct.

Can you provide information on the version of BioPerl and Perl that  
you are trying this on?

Best,
Tom

##########
testtree.nwk
------------------
((A:0.1,B:0.2)F:0.3,(C:0.3,D:0.4)E:0.5)G;

##########
test.pl
-----------------
use Bio::TreeIO;
use Bio::Tree::TreeFunctionsI;

$tab = "\t"; $nl = "\n";
my ($testtree) = @ARGV;
my $treein = Bio::TreeIO->new( -file => $testtree, -format =>  
'newick' );
while( my $tree = $treein->next_tree ){
	%dist_matrix = ();

	my @leaves = $tree->get_leaf_nodes;
	foreach my $leaf1( @leaves ){
		my $id1 = $leaf1->id;	
		foreach my $leaf2( @leaves ){
			my $id2 = $leaf2->id;
			if ($id1 eq $id2) {
				$dist_matrix{$id1}->{$id2} = 0;
				next;
			}
#			next if( defined( $dist_matrix{$id1}->{$id2} ) || defined  
( $dist_matrix{$id2}->{$id1} ) );
			my $distance = $tree->distance( -nodes => [$leaf1, $leaf2] );
	#print "$leaf1 and $leaf2 distance: $distance\n";
#	print "$id1 and $id2 distance: $distance\n";

#=pod
			print $id1; <STDIN>;
			print $id2; <STDIN>;
				my $lca = $tree->get_lca(-nodes => [$leaf1, $leaf2] );
#			print $lca->id; #<STDIN>;
			my $distance2 = $tree->distance( -nodes => [$lca, $leaf2] );
			print $distance2; <STDIN>;

			$dist_matrix{$id1}->{$id2} = $distance;
#=cut
		}
	}
}

#########
output
---------------
[sharpton at kazushi ~]$perl test.pl test.nwk
A
B
0.2
A
C
0.8
A
D
0.9
B
A
0.1
B
C
0.8
B
D
0.9
C
A
0.4
C
B
0.5
C
D
0.4
D
A
0.4
D
B
0.5
D
C
0.3
[sharpton at kazushi ~]$



On Oct 5, 2011, at 7:59 AM, Ross KK Leung wrote:

> Hi Tom,
>
> I personally attach this program and tree files to you and see  
> whether you can run the program without any error (mine does, I  
> can't use Modern perl as I can't request the admin to install/update  
> anything)....
>
> Thanks a lot, Ross
>
> <image001.png>
> <image002.png>
> <image003.png>
>
>
>
> From: Thomas Sharpton [mailto:thomas.sharpton at gmail.com]
> Sent: 2011年10月5日 9:44
> To: Ross KK Leung
> Cc: Rutger Vos; bioperl-l
> Subject: Re: [Bioperl-l] Find distances between LCA's and the other  
> nodes from a tree
>
> Hi Ross,
>
> I think Rutger has given you a good idea and it is certainly simpler  
> than the solution I just concocted (no surprise there).
>
> Regarding your most recent note:
>
> 2011/10/4 Ross KK Leung <ross at cuhk.edu.hk>
> Thanks, Rutger. While there is a convenient function to help  
> calculate the
> distances between leaf nodes, I find it difficult for ancestors.  
> After using
> get_lca, it seems that bioperl has no data structures to help  
> manipulate
> these LCA's.
>
>
> From what I understand, the get_lca does return a node object,  
> specifically an internal node. The following is from the BioPerl  
> documentation (note the "Returns" line):
>
>
>
>  Title   : get_lca
>  Usage   : get_lca(-nodes => \@nodes ); OR
>            get_lca(@nodes);
>  Function: given two or more nodes, returns the lowest common  
> ancestor (aka most
>            recent common ancestor)
>
>
>  Returns : node object or undef if there is no common ancestor
>  Args    : -nodes => arrayref of nodes to test, OR
>            just a list of nodes
>
> You can get the distance between any two nodes, including an  
> internal node, using the distance method. Note that the nodes you  
> input into the distance function do not need to have ids associated  
> with them. For example,
>
> #$nodeA and $nodeB are BioPerl node objects that correspond to any  
> node on the tree
> my $lca = get_lca( $nodeA, $nodeB );
> my $distance = distance( $nodeA, $lca );
>
> $distance will now contain the total branch length between nodeA and  
> the last common ancestor between nodes A and B. At no point are node  
> ids used to conduct calculation; only the node objects themselves  
> are needed. Anything that you can do to a node, you can do to $lca.
>
> This is how I understand the functions, at least. Someone please  
> correct me if I'm wrong, as I fear I sound like a broken record at  
> this point....
>
> Best,
> Tom
> <test.pl><test.nwk>





More information about the Bioperl-l mailing list