[Biojava-l] Rooted trees in nexus files

Thasso Griebel thasso.griebel at uni-jena.de
Wed Nov 4 11:57:45 UTC 2009


Hi,

> A getRoot() function sounds good. It would return the String label  
> of the root node, the same as which identifies the corresponding  
> vertex in the JGraphT model. An equivalent setRoot() would be nice.

Keep in mind, though, that switching the root to another node changes
the tree structure, and this has to be taken into account when the
newick string is parsed and the graph is created. You have to parse the
graph from newick first and then "reroot" the tree, because the
requested root might not be the one specified in the newick string.
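
To make the rerooting point concrete, here is a minimal sketch in plain
Java. It deliberately avoids JGraphT, so none of it is the actual BioJava
or JGraphT API: the class RerootSketch, its methods, and the inner node
labels "p1" and "p2" are assumptions for illustration only. It takes the
undirected topology of (A,(B,C)) and orients the edges away from
whichever node is chosen as root.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of BioJava or JGraphT.
final class RerootSketch {

    // Adds an undirected edge a-b to the adjacency map.
    static void addEdge(Map<String, Set<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Undirected topology of (A,(B,C)). "p1" and "p2" are assumed labels
    // for the unnamed inner nodes; "p1" is the root implied by the newick.
    static Map<String, Set<String>> exampleTree() {
        Map<String, Set<String>> adj = new LinkedHashMap<>();
        addEdge(adj, "p1", "A");
        addEdge(adj, "p1", "p2");
        addEdge(adj, "p2", "B");
        addEdge(adj, "p2", "C");
        return adj;
    }

    // Orients every edge away from the given root (breadth-first) and
    // returns the resulting child -> parent map. The root has no entry.
    static Map<String, String> orient(Map<String, Set<String>> adj, String root) {
        if (!adj.containsKey(root)) {
            throw new IllegalArgumentException("unknown node: " + root);
        }
        Map<String, String> parent = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String next : adj.get(node)) {
                if (seen.add(next)) {      // first visit fixes the parent
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tree = exampleTree();
        System.out.println("rooted at p1: " + orient(tree, "p1"));
        System.out.println("rooted at p2: " + orient(tree, "p2"));
    }
}
```

Rooting at p2 turns p1 from the parent of p2 into its child, which is
the kind of structural change a setRoot() would have to handle.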

> Personally I would also alter the methods that return JGraphTs so  
> that they return their Directed equivalents if possible. I believe  
> that these can still be unrooted - you'd have to check the JGraphT  
> documentation to make sure.

You would have to change the method signature if you want to keep using
the same method. The only relationship between JGraphT's UndirectedGraph
and its DirectedGraph counterpart is that they both extend the Graph
interface; a DirectedGraph is not an UndirectedGraph. Switching to
DirectedGraph definitely breaks the current API! I don't know how you
usually handle such situations in BioJava, but this clearly breaks
compatibility. Maybe it would be better to introduce a new method that
returns directed graphs?
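
A rough sketch of what I mean, again in plain Java so that it runs on
its own. The Mini* interfaces are simplified stand-ins that only mimic
the sibling relationship between the JGraphT interfaces; they are not
the real ones. TreeHolderSketch and its two getter names are likewise
made up and are not the existing BioJava methods.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins: both extend the base interface, but neither
// extends the other, so one cannot be returned in place of the other.
interface MiniGraph { Set<String> vertexSet(); }
interface MiniUndirectedGraph extends MiniGraph { int degreeOf(String v); }
interface MiniDirectedGraph extends MiniGraph { int inDegreeOf(String v); }

// Hypothetical holder for an already parsed tree (child -> parent map).
final class TreeHolderSketch {
    private final Map<String, String> parentOf = new LinkedHashMap<>();

    TreeHolderSketch(Map<String, String> parentOf) {
        this.parentOf.putAll(parentOf);
    }

    private Set<String> vertices() {
        Set<String> all = new LinkedHashSet<>(parentOf.values());
        all.addAll(parentOf.keySet());
        return all;
    }

    // Existing-style method: the return type stays as it is, so current
    // callers keep compiling.
    MiniUndirectedGraph getTreeAsUndirected() {
        return new MiniUndirectedGraph() {
            public Set<String> vertexSet() { return vertices(); }
            public int degreeOf(String v) {
                int children = Collections.frequency(parentOf.values(), v);
                return children + (parentOf.containsKey(v) ? 1 : 0);
            }
        };
    }

    // New method added next to the old one; edges point parent -> child,
    // so the root is the only vertex with in-degree 0.
    MiniDirectedGraph getTreeAsDirected() {
        return new MiniDirectedGraph() {
            public Set<String> vertexSet() { return vertices(); }
            public int inDegreeOf(String v) {
                return parentOf.containsKey(v) ? 1 : 0;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> parents = new LinkedHashMap<>();
        parents.put("A", "p1");   // (A,(B,C)) with assumed inner labels
        parents.put("p2", "p1");
        parents.put("B", "p2");
        parents.put("C", "p2");
        TreeHolderSketch tree = new TreeHolderSketch(parents);
        System.out.println(tree.getTreeAsUndirected().degreeOf("p2"));
        System.out.println(tree.getTreeAsDirected().inDegreeOf("p1"));
    }
}
```

Changing the return type of getTreeAsUndirected() to MiniDirectedGraph
would not compile for any caller that stores the result as a
MiniUndirectedGraph, whereas adding the second method leaves them alone.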

cheers,
-thasso






>
> Richard.
>
> On 3 Nov 2009, at 18:55, Tiago Antão wrote:
>
>> But the point is that the class interface changes for the outside
>> user:
>> 1. How does one report the root back to the user?
>> 2. Regarding the prefix stuff, should the user be allowed to specify
>> a preferred prefix?
>>
>> Both of these things imply interface changes visible to users.
>> If you still need volunteers to make the change, I can do it. But I
>> need to know what changes to the user interface are to be made.
>> For 1, maybe a method getRoot, returning a string with the name of
>> the root node?
>> For 2, maybe an extended version of the parse function with a suffix
>> as input parameter?
>>
>> 2009/11/3 Richard Holland <holland at eaglegenomics.com>:
>>>> 1. Lack of knowledge of root node
>>>
>>> The Newick tree string is read as-is and is not parsed. It only gets
>>> parsed at the point of conversion to an Undirected or WeightedGraph
>>> inside the TreeBlocks.java source code (inside the two types of
>>> get-As-JGraphT methods). It's at this point the string is parsed and
>>> it's here that root node determination should take place. It's
>>> already known whether &R or &U have been specified here, which should
>>> help the code work out what to do.
>>>
>>>> 2. The p* stuff.
>>>
>>> Exactly the same part of the code as described above. Wherever it
>>> pushes values to the stack but prepends them with 'p' first, you'll
>>> need to change the 'p' to some instance variable and provide a
>>> getter/setter to change it, with 'p' being the default setting.
>>>
>>> cheers,
>>> Richard
>>>
>>>>
>>>> Tiago
>>>> 2009/11/3 Richard Holland <holland at eaglegenomics.com>:
>>>>>
>>>>> Agreed that there is a bug. Now all we need is someone to go in and
>>>>> fix it! :)
>>>>>
>>>>> cheers,
>>>>> Richard
>>>>>
>>>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>>>>
>>>>>> 2009/11/3 Thasso Griebel <thasso.griebel at uni-jena.de>:
>>>>>>>
>>>>>>> There is a way to uniquely get a root from a newick string.
>>>>>>> Usually a rooted newick is surrounded with brackets, which
>>>>>>> indicates the root as the highest node in the tree. For example:
>>>>>>>
>>>>>>> (A, (B,C))
>>>>>>>
>>>>>>
>>>>>> Agree, it is quite easy to get the root of the tree from the
>>>>>> newick representation. But it should be done on parsing and
>>>>>> returned in some way by the parsing system. If the user has to do
>>>>>> it again, it means that the user has to parse it again just to
>>>>>> know the root node.
>>>>>>
>>>>>>> I would also suggest to generally parse trees as rooted trees
>>>>>>> (maybe just for the initial internal model). Creating an unrooted
>>>>>>> tree from a rooted one is easy: remove the root and forget about
>>>>>>> directions. The other way might be hard and ambiguous.
>>>>>>
>>>>>> 100% agree.
>>>>>> The newick _representation_ always has a root by virtue of the way
>>>>>> it is done. Whether that root has meaning or not depends. Doing as
>>>>>> you suggest seems the most reasonable idea.
>>>>>> I would add that even if it is an unrooted tree, the topology
>>>>>> might be of interest. In my case I am doing a comparative
>>>>>> visualizer and it might be nice for the user to be able to
>>>>>> visualize the topology as specified. It has no biological meaning,
>>>>>> but in practice, for many users, it helps.
>>>>>> I note that PhyloXML (even by virtue of being an XML format)
>>>>>> always represents the phylogenies as trees (not weighted DAGs).
>>>>>> There is an attribute "rooted" which can be true or false.
>>>>>>
>>>>>> But, anyway. Even assuming a very conservative view on this, the
>>>>>> current parser, for rooted trees, does not allow one to determine
>>>>>> where the root is. I think that there would be a consensus that
>>>>>> that is a bug?
>>>>>>
>>>>>> Tiago
>>>>>
>>>>> --
>>>>> Richard Holland, BSc MBCS
>>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>>> http://www.eaglegenomics.com/
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> "The hottest places in hell are reserved for those who, in times of
>>>> moral crisis, maintain a neutrality." - Dante
>>>
>>>
>>>
>>
>>
>>
>
>

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany






