[Biojava-l] Rooted trees in nexus files

Tiago Antão tiagoantao at gmail.com
Fri Nov 6 11:30:00 UTC 2009


I've made a few changes to TreesBlock, implementing a version of what
was discussed here:

1. I kept getTreeAsJGraphT and getTreeAsWeightedJGraphT as they are in
terms of interface
2. There is now a new method, getTopNode, which reports which node is on
the "top". I use the name getTopNode rather than getRootNode to avoid
misleading users: only rooted trees have a root, but in the nexus
representation every tree has a "top" (which in rooted trees is the
root)
3. There are now setNodePrefix and getNodePrefix methods for changing
the prefix (which defaults to p, as before)
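
To make the "top" idea concrete, here is a minimal sketch of a parser
that names internal nodes with a configurable prefix and remembers
which node ended up on top. It is only an illustration: the class
TinyNewickTree and its method neighboursOf are hypothetical names, it
uses a plain adjacency map instead of JGraphT, the numbering order of
internal nodes is my own choice, and it assumes a tree string without
whitespace, branch lengths, quoted labels or labelled internal nodes.
It is not the TreesBlock code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT the TreesBlock implementation.
final class TinyNewickTree {
    private final String newick;
    private final String nodePrefix;
    private int pos = 0;      // current read position in the tree string
    private int counter = 0;  // number of internal nodes named so far
    private final Map<String, Set<String>> adjacency = new LinkedHashMap<>();
    private final String topNode;

    TinyNewickTree(String newick, String nodePrefix) {
        this.newick = newick;
        this.nodePrefix = nodePrefix;
        // The outermost subtree is the "top", whether or not the tree is rooted.
        this.topNode = subtree();
    }

    String getTopNode() {
        return topNode;
    }

    Set<String> neighboursOf(String node) {
        return adjacency.getOrDefault(node, new LinkedHashSet<String>());
    }

    // Parses one subtree starting at pos and returns the name of its top node.
    private String subtree() {
        if (newick.charAt(pos) != '(') {
            // Leaf: read the label up to the next structural character.
            int start = pos;
            while (pos < newick.length() && "(),;".indexOf(newick.charAt(pos)) < 0) {
                pos++;
            }
            return newick.substring(start, pos);
        }
        pos++; // consume '('
        List<String> children = new ArrayList<>();
        children.add(subtree());
        while (newick.charAt(pos) == ',') {
            pos++; // consume ','
            children.add(subtree());
        }
        pos++; // consume ')'
        // Internal nodes are named once all their children are known.
        String name = nodePrefix + (++counter);
        for (String child : children) {
            link(name, child);
        }
        return name;
    }

    // Records an undirected edge between a and b.
    private void link(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }
}
```

With the string "(A,(B,C));" and prefix "p", this sketch names the
inner node p1 and reports p2 as the top node. A caller does not have to
guess the top from the node names.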

In my view these changes solve both problems: the issue with node
names and the need to know the root/top of a nexus tree. It might not
be the best solution, but it gets things on the right track without
taking too much of my time. There are also no changes to the
signatures of existing methods.
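
On the node-name side, the clash-skipping idea from earlier in the
thread (a leaf called p2 gives internal nodes p1, p3, p4, ...) could
look roughly like the sketch below. The class InternalNodeNamer is a
hypothetical helper, not something in BioJava, and I am not claiming
the patch works this way. It assumes the set of labels already used in
the tree string has been collected by a prior pass, which is the
pre-parse Richard mentioned. The alphanumeric check follows his
suggestion for validating the prefix.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of BioJava.
final class InternalNodeNamer {
    private final String prefix;
    private final Set<String> taken;  // labels that must not be reused
    private int counter = 0;

    InternalNodeNamer(String prefix, Set<String> labelsInTree) {
        // Reject empty prefixes and anything that is not purely alphanumeric,
        // which also excludes spaces and Newick-sensitive characters.
        if (prefix == null || !prefix.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Prefix must be alphanumeric: " + prefix);
        }
        this.prefix = prefix;
        this.taken = new HashSet<>(labelsInTree);
    }

    // Returns the next prefix+number name that does not clash with a used label.
    String next() {
        String candidate;
        do {
            candidate = prefix + (++counter);
        } while (!taken.add(candidate)); // add() is false when the name is taken
        return candidate;
    }
}
```

Given the labels A, B and p2 and the prefix p, successive calls to
next() return p1, p3 and p4.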

Now, there is still a problem:
addTree(final String label, UndirectedGraph<String, DefaultEdge> treegraph)
is highly dependent on the p* convention for internal nodes.
Here I would be tempted to change the method signature to:
addTree(final String label, UndirectedGraph<String, DefaultEdge>
treegraph, String topNode)
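
To show why an explicit topNode argument removes the dependency on
node names, here is a rough sketch that turns an undirected tree into
a Newick-style string by walking down from the given top node. The
class TopNodeNewickWriter and its methods are hypothetical, and a plain
adjacency map stands in for UndirectedGraph<String, DefaultEdge> so the
example has no dependencies. It assumes the graph really is a tree (no
cycles) and it writes leaf labels only. It is not what addTree does
today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the TreesBlock.addTree implementation.
final class TopNodeNewickWriter {

    // Records an undirected edge between a and b in the adjacency map.
    static void link(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Serialises the tree starting from topNode; node names are never inspected.
    static String toNewick(Map<String, Set<String>> graph, String topNode) {
        StringBuilder sb = new StringBuilder();
        write(graph, topNode, null, sb);
        return sb.append(';').toString();
    }

    // Depth-first walk away from the parent. Assumes the graph has no cycles.
    private static void write(Map<String, Set<String>> graph, String node,
                              String parent, StringBuilder sb) {
        List<String> children = new ArrayList<>();
        for (String n : graph.getOrDefault(node, Collections.<String>emptySet())) {
            if (!n.equals(parent)) {
                children.add(n);
            }
        }
        if (children.isEmpty()) {
            sb.append(node); // leaf: write its label
            return;
        }
        sb.append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            write(graph, children.get(i), node, sb);
        }
        sb.append(')'); // internal node: its own name is not written
    }
}
```

With edges x-A, x-y, y-B and y-C added in that order to a
LinkedHashMap and x passed as the top node, toNewick returns
"(A,(B,C));" even though no internal node follows the p* convention.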

Interestingly, there is no addTree for weighted graphs (for distances).

If nobody sees a problem with this, I will change addTree.

I will then attach a patch (along with test cases) to the currently
open bug, and that should be it.

2009/11/4 Andreas Prlic <andreas at sdsc.edu>:
> excellent, thanks for taking this on!
> Andreas
>
> 2009/11/4 Tiago Antão <tiagoantao at gmail.com>
>>
>> Unless anyone with experience in biojava development wants to take on
>> this, I would volunteer to do this. I ended up using the PhyloXML
>> forester-atv parser (and moving to phyloxml instead of nexus), but as
>> I reported this, I might as well sort it out...
>>
>> 2009/11/4 Richard Holland <holland at eaglegenomics.com>:
>> > ah... except a problem! The parser does not know all names in the
>> > string in advance, so if it auto-assigns one that is then used later
>> > in the string, we have the same problem with name clashes as before.
>> >
>> > The names the parser assigns cannot totally avoid all clashes unless
>> > it has already parsed the string to find out what names were used in
>> > the string itself already. So some kind of pre-parse would be
>> > necessary.
>> >
>> > On 4 Nov 2009, at 12:46, Richard Holland wrote:
>> >
>> >> Sounds good.
>> >>
>> >> On 4 Nov 2009, at 12:40, Tiago Antão wrote:
>> >>
>> >>> 2009/11/3 Richard Holland <holland at eaglegenomics.com>:
>> >>>>
>> >>>> The prefix for the parser currently is hardcoded as p. Two new
>> >>>> methods - set and getDefaultPrefix which accept a string should be
>> >>>> provided (it should check that the string is valid, i.e. all
>> >>>> alphanumeric and with no spaces or other Newick-sensitive
>> >>>> characters). The parser should be changed to use the output from
>> >>>> getDefaultPrefix() instead of the hardcoded p. The default
>> >>>> behaviour should be such that it behaves the same as at present
>> >>>> unless the user explicitly says otherwise by calling the
>> >>>> setDefaultPrefix() method.
>> >>>
>> >>> This default behavior would still raise an exception with nodes called
>> >>> p* . I would suggest a minor change: If there is a clash, the parser
>> >>> would try the next p* (or whatever defaultPrefix) ...
>> >>>
>> >>> Example to make it clear: if there is a leaf called p2, internal nodes
>> >>> generated would be p1, p3, p4, ....
>> >>>
>> >>> --
>> >>> "The hottest places in hell are reserved for those who, in times of
>> >>> moral crisis, maintain a neutrality." - Dante
>> >>
>> >> --
>> >> Richard Holland, BSc MBCS
>> >> Operations and Delivery Director, Eagle Genomics Ltd
>> >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> >> http://www.eaglegenomics.com/
>> >>
>> >>
>> >> _______________________________________________
>> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >
>> >
>> >
>>
>>
>>
>> --
>> "The hottest places in hell are reserved for those who, in times of
>> moral crisis, maintain a neutrality." - Dante
>>
>
>



-- 
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante



