[Bioperl-l] Re: Comparative genomics
Daniel Barker
db2@sanger.ac.uk
Fri, 28 Sep 2001 14:43:16 +0100 (BST)
> There are several matrix formats. The one I used/prefer was #NEXUS
> format, used by several of the best phylogenetic (i.e., cladistic)
> reconstruction programs such as PAUP. It has the facility to embed a
> complete analysis configuration, data and the output in nested tree
> descriptions.
I prefer the more programmer-friendly PHYLIP formats, which are
well-established and a sort of "lowest common denominator" in that most
phylogeny programs can at least import and export them. Also, I slightly
disapprove of the way Nexus lets you mix alignments, analysis and
trees in the same file.
This is just personal preference though: one could argue it either way.
(And PHYLIP too lets you put some options in its tree and data files,
though I think this is going to be disallowed in a later version.)
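Part of what makes PHYLIP formats programmer-friendly is how little code it takes to write them. A minimal sketch in C, assuming the strict sequential data format: a header line giving the number of taxa and characters, then each taxon name padded or truncated to exactly 10 characters and followed by its sequence on one line. write_phylip is a hypothetical helper, not part of PHYLIP:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Strict PHYLIP gives each taxon name exactly 10 characters. */
#define PHYLIP_NAME_LEN 10

/* Hypothetical helper: write ntax sequences, each nchar characters long,
 * in sequential PHYLIP format. Returns 0 on success, or -1 if a sequence
 * has the wrong length or an output error occurs. */
int write_phylip(FILE *out, int ntax, int nchar,
                 const char *const names[], const char *const seqs[])
{
    int i;

    assert(out != NULL && names != NULL && seqs != NULL);
    if (fprintf(out, "%d %d\n", ntax, nchar) < 0)
        return -1;
    for (i = 0; i < ntax; i++) {
        if ((int) strlen(seqs[i]) != nchar)
            return -1;
        /* "%-*.*s" left-justifies the name, padding with blanks to 10
         * characters and truncating anything longer. */
        if (fprintf(out, "%-*.*s%s\n", PHYLIP_NAME_LEN, PHYLIP_NAME_LEN,
                    names[i], seqs[i]) < 0)
            return -1;
    }
    return 0;
}
```

Interleaved files, and the "relaxed" longer-name variants accepted by some other programs, are not handled by this sketch.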
P.S. In my phylogeny program LVB, written in C, I chose to represent a
tree as an array of structures:
#define UNSET (-1)      /* value of integral vars when unset */
typedef int Branchno;   /* branch no. (array offset, count or UNSET) */
typedef int Objno;      /* object no. (array offset, count or UNSET) */

/* branch of tree */
typedef struct
{
    Branchno parent;    /* parent branch number, UNSET in root */
    Branchno left;      /* child 1 number */
    Branchno right;     /* child 2 number */
    Objno object;       /* object number if leaf, otherwise UNSET */
} Branch;
And the index of the root node ("root branch") was stored separately in an
integer.
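To show how the array and the separate root index fit together, here is a small sketch that repeats the typedefs so it compiles on its own. collect_leaves and links_consistent are hypothetical helpers written for illustration, not code from LVB:

```c
#include <assert.h>

#define UNSET (-1)
typedef int Branchno;
typedef int Objno;

typedef struct {
    Branchno parent;    /* parent branch number, UNSET in root */
    Branchno left;      /* child 1 number */
    Branchno right;     /* child 2 number */
    Objno object;       /* object number if leaf, otherwise UNSET */
} Branch;

/* Hypothetical helper: depth-first walk below branch b, appending leaf
 * object numbers to 'order' (left subtree before right).
 * Returns the updated number of leaves stored. */
int collect_leaves(const Branch tree[], Branchno b, Objno order[], int count)
{
    assert(b != UNSET);
    if (tree[b].object != UNSET) {          /* leaf */
        order[count] = tree[b].object;
        return count + 1;
    }
    count = collect_leaves(tree, tree[b].left, order, count);
    return collect_leaves(tree, tree[b].right, order, count);
}

/* Hypothetical helper: check that only the root lacks a parent and that
 * every other branch is listed as a child of its parent. */
int links_consistent(const Branch tree[], int nbranches, Branchno root)
{
    Branchno b;

    if (tree[root].parent != UNSET)
        return 0;
    for (b = 0; b < nbranches; b++) {
        Branchno p = tree[b].parent;
        if (b == root)
            continue;
        if (p == UNSET || (tree[p].left != b && tree[p].right != b))
            return 0;
    }
    return 1;
}
```

For the tree ((0,1),2), one possible layout puts the root at branch 0, the internal branch joining objects 0 and 1 at branch 1, and the three leaves at branches 2 to 4; walking from the root then yields the objects in the order 0, 1, 2.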
I'm not sure if this is any use, and actually I would omit the "object"
field now: one can ensure objects (i.e., sequences) 0..n-1 are permanently
associated with branches 0..n-1 in the array. I think some other programs
do that.
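A minimal sketch of that variant, assuming a rooted binary tree on n objects stored in 2n-1 branches, with leaves at 0..n-1 and internal branches at n..2n-2. Leaf status and object number then follow from the index alone. The helpers is_leaf, init_leaves, join and count_leaves are hypothetical:

```c
#include <assert.h>

#define UNSET (-1)
typedef int Branchno;

/* Branch without the 'object' field: leaf i is always branch i. */
typedef struct {
    Branchno parent;    /* UNSET in root */
    Branchno left;      /* UNSET in leaves */
    Branchno right;     /* UNSET in leaves */
} Branch;

/* A branch is a leaf exactly when its index is below the object count;
 * its object number is then the index itself. */
int is_leaf(Branchno b, int nobjects)
{
    return b < nobjects;
}

/* Hypothetical helper: mark branches 0..nobjects-1 as unattached leaves. */
void init_leaves(Branch tree[], int nobjects)
{
    Branchno b;

    for (b = 0; b < nobjects; b++)
        tree[b].parent = tree[b].left = tree[b].right = UNSET;
}

/* Hypothetical helper: join subtrees l and r under the new internal
 * branch newb, which becomes a (provisional) root. */
void join(Branch tree[], Branchno newb, Branchno l, Branchno r, int nobjects)
{
    assert(newb >= nobjects);
    tree[newb].parent = UNSET;
    tree[newb].left = l;
    tree[newb].right = r;
    tree[l].parent = newb;
    tree[r].parent = newb;
}

/* Count leaves below branch b without consulting any object field. */
int count_leaves(const Branch tree[], Branchno b, int nobjects)
{
    if (is_leaf(b, nobjects))
        return 1;
    return count_leaves(tree, tree[b].left, nobjects)
         + count_leaves(tree, tree[b].right, nobjects);
}
```

Besides saving memory, this removes one way for the array to become inconsistent: a leaf can no longer point at the wrong object.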
PHYLIP: http://evolution.genetics.washington.edu/phylip.html
LVB: http://www.icmb.ed.ac.uk/lvb/sokal.html
"LVB version 1.0A 18 August 1997, with Extension 1 written by Daniel
Barker, May 1998" can parse trees from file, but if I were writing this
now, I would definitely re-use PHYLIP's source code. This is probably
worthwhile wherever one is working in C. I don't know if it would make
sense for SQL. (Also, check PHYLIP's re-use conditions. They're fairly
unrestrictive, but not GNU-like.) Could you just "wrap" the relevant bits
of PHYLIP?
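PHYLIP's tree files use the parenthesised Newick format. For comparison, writing the Branch array out in that format takes only a short recursive function. This is a minimal sketch, not PHYLIP's code: write_newick is a hypothetical name, branch lengths are omitted, and taxon names are assumed to contain no spaces or Newick punctuation (such names would need quoting):

```c
#include <assert.h>
#include <stdio.h>

#define UNSET (-1)
typedef int Branchno;
typedef int Objno;

typedef struct {
    Branchno parent;    /* parent branch number, UNSET in root */
    Branchno left;      /* child 1 number */
    Branchno right;     /* child 2 number */
    Objno object;       /* object number if leaf, otherwise UNSET */
} Branch;

/* Recursively write the subtree below branch b: a leaf is written as its
 * name, an internal branch as "(left,right)". */
static void write_subtree(FILE *out, const Branch tree[], Branchno b,
                          const char *const names[])
{
    assert(b != UNSET);
    if (tree[b].object != UNSET) {
        fputs(names[tree[b].object], out);
        return;
    }
    fputc('(', out);
    write_subtree(out, tree, tree[b].left, names);
    fputc(',', out);
    write_subtree(out, tree, tree[b].right, names);
    fputc(')', out);
}

/* Hypothetical helper: write the whole tree, ending with the ';'
 * that terminates every Newick tree. */
void write_newick(FILE *out, const Branch tree[], Branchno root,
                  const char *const names[])
{
    write_subtree(out, tree, root, names);
    fputs(";\n", out);
}
```

The tree is written rooted, exactly as stored. Unrooted trees are conventionally written with a three-way split at the base, which this sketch does not attempt.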
--
Daniel Barker.