[Biojava-l] Rooted trees in nexus files
Thasso Griebel
thasso.griebel at uni-jena.de
Tue Nov 3 17:58:14 UTC 2009
Hi,
> On 2 Nov 2009, at 23:11, Tiago Antão wrote:
>
>> 2009/11/2 Richard Holland <holland at eaglegenomics.com>:
>>> In the meantime, the JGraph library which is used for displaying
>>> JGraphT
>>> graphs in a visual form does include root-finding methods, so
>>> maybe you
>>> could investigate there to see if any of the existing functions
>>> might help?
>>
>> Did that. None can help as the graph is not directed (it would be
>> trivial with a directed graph ,of course).
>> In the current form, the nexus parser is of limited use for tree
>> information:
>> 1. For rooted trees it has a bug has it doesn't say what is the root
>
> The Newick strings used in the Nexus format are themselves
> undirected graphs. They don't specify which node is the root, which
> means it must be determined by computation after parsing the string.
> I'm unsure of the algorithm to use to do this. If there are people
> on this list who know the algorithm and have time to code it up,
> volunteers would be welcome.
There is a way to uniquely get a root from a newick string. Usually a
rooted newick is surrounded with brackets, which indicates the root as
the highest node in the tree. For example:
(A, (B,C))
describes a tree rooted between "A" and the clade (B,C), and with the
surrounding brackets this is unique.
In nexus the situation might be a bit different. nexus allows you to
prefix the newick string with [&R] or [&U] to indicate rooted/unrooted
trees. For example:
tree treename = [&R] ((A,(B,C)),(D,E));
is a valid rooted nexus tree where the root is placed between the
clades [A.B,C] and [D,E], although in this example the newick is
surrounded with brackets and rooted uniquely by itself.
>> 2. For unrooted trees, sometimes the "root" (what the user perceives
>> as root) is interesting information.
>
> What the user perceives as root in an unrooted tree could be
> different for every user, so it would be hard to provide a standard
> function to read their mind! However if everyone can come up with a
> commonly agreed way of determining the most likely root
> computationally, it would be interesting to add this as a feature,
> with the caveat that it is only a best-effort approximation as the
> original tree is unrooted.
BioNJ implements multiple methods to determine a root in a neighbor-
joining tree. I can look it up, but I think the most common ways to
compute the root are: try to place the root in the "middle" such that
your tree is balanced and you have equal number of leaves to both
sides of the tree. The other method I remember is based on the edge
weights. Basically you find the longest path between two leaves and
place the root in the middle of that path (based on the path length).
I think the most common way though is to specify an outgroup node and
place the root on the path between that outgroup and its successor. I
am not sure if the outgroup can be described in nexus somehow.
I would also suggest to generally parse trees as rooted trees (maybe
jsut for th initial internal model). Creating an unrooted tree from a
rooted one is easy, remove the root and forget about directions. The
other way might be hard and ambiguous.
cheers,
Thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
More information about the Biojava-l
mailing list