[Biojava-l] Rooted trees in nexus files

Tue Nov 3 17:58:14 UTC 2009

Hi,

> On 2 Nov 2009, at 23:11, Tiago Antão wrote:
>
>> 2009/11/2 Richard Holland <holland at eaglegenomics.com>:
>>> In the meantime, the JGraph library which is used for displaying  
>>> JGraphT
>>> graphs in a visual form does include root-finding methods, so  
>>> maybe you
>>> could investigate there to see if any of the existing functions  
>>> might help?
>>
>> Did that. None can help as the graph is not directed (it would be
>> trivial with a directed graph ,of course).
>> In the current form, the nexus parser is of limited use for tree  
>> information:
>> 1. For rooted trees it has a bug has it doesn't say what is the root
>
> The Newick strings used in the Nexus format are themselves  
> undirected graphs. They don't specify which node is the root, which  
> means it must be determined by computation after parsing the string.  
> I'm unsure of the algorithm to use to do this. If there are people  
> on this list who know the algorithm and have time to code it up,  
> volunteers would be welcome.

There is a way to uniquely  get a root from a newick string. Usually a  
rooted newick is surrounded with brackets, which indicates the root as  
the highest node in the tree. For example:

(A, (B,C))

describes a tree rooted between "A" and the clade (B,C), and with the  
surrounding brackets this is unique.

In nexus the situation might be a bit different. nexus allows you to  
prefix the newick string with [&R] or [&U] to indicate rooted/unrooted  
trees. For example:

tree treename = [&R] ((A,(B,C)),(D,E));

is a valid rooted nexus tree where the root is placed between the  
clades [A.B,C] and [D,E], although in this example the newick is  
surrounded with brackets and rooted uniquely by itself.

>> 2. For unrooted trees, sometimes the "root" (what the user perceives
>> as root) is interesting information.
>
> What the user perceives as root in an unrooted tree could be  
> different for every user, so it would be hard to provide a standard  
> function to read their mind! However if everyone can come up with a  
> commonly agreed way of determining the most likely root  
> computationally, it would be interesting to add this as a feature,  
> with the caveat that it is only a best-effort approximation as the  
> original tree is unrooted.

BioNJ implements multiple methods to determine a root in a neighbor- 
joining tree. I can look it up, but I think the most common ways to  
compute the root are: try to place the root in the "middle" such that  
your tree is balanced and you have equal number of leaves to both  
sides of the tree. The other method I remember is based on the edge  
weights. Basically you find the longest path between two leaves and  
place the root in the middle of that path (based on the path length).
I think the most common way though is to specify an outgroup node and  
place the root on the path between that outgroup and its successor. I  
am not sure if the outgroup can be described in nexus somehow.

I would also suggest to generally parse trees as rooted trees (maybe  
jsut for th initial internal model). Creating an unrooted tree from a  
rooted  one is easy, remove the root and forget about directions. The  
other way might be hard and ambiguous.

cheers,

Thasso

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany