[EMBOSS] Segmentation fault with multiple similarity matrices in fneighbor

Sat Jun 2 00:08:40 UTC 2007

Dear List,

Hazel Hartman Jenkins wrote:

[corrected]
> If I run the following command:
> fneighbor -datafile tinytest.dat -replicates y -outfile filefrom.fnb
> then everything works.
> 
> If, however, my tinytest.dat contains two similarity matrices (or, for
> that matter, the one hundred bootstrap replicates written by fdnadist by
> default), like this:
>     3
> 1187Aquife  0.000000  0.368385  0.404489
> BB213b06    0.368385  0.000000  0.151182
> BB269b06    0.404489  0.151182  0.000000
>     3
> 1187Aquife  0.000000  0.368385  0.404489
> BB213b06    0.368385  0.000000  0.151182
> BB269b06    0.404489  0.151182  0.000000
> 
> then fneighbor returns:
> <quote>
> Phylogenies from distance matrix by N-J or UPGMA method
> Segmentation fault
> <endquote>

fneighbor (and ffitch and fkitsch, which have the same bug) should
definitely support multiple input matrices, as the original Phylip
programs do. This matters because it is needed to compute bootstrap
values for trees built from distance-matrix data.

The desired behaviour is for fneighbor (and ffitch and fkitsch) to accept
input files containing multiple distance matrices and to produce one tree
per matrix, written in standard nested-parenthesis (Newick) notation,
which fconsense can then read.

Nor should reading simply stop at the end of the first distance matrix.
That would turn the crash into a silent failure, and a user familiar with
Phylip might not notice that the extra matrices had been dropped until
many processing steps later.
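To illustrate the desired reading behaviour, here is a minimal sketch in C. It assumes square matrices in the Phylip layout quoted above (a taxon count, then one row per taxon starting with a 10-character name field) and is not the EMBOSS or Phylip code; `DistMatrix`, `read_matrix`, `count_matrices` and `MAX_TAXA` are hypothetical names. The point is that the loop runs until end of file and reports malformed input, rather than stopping quietly after the first matrix.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TAXA 64
#define NAME_LEN 10   /* Phylip names occupy a fixed 10-character field */

/* One square distance matrix (hypothetical type, not an EMBOSS one). */
typedef struct {
    int n;
    char names[MAX_TAXA][NAME_LEN + 1];
    double d[MAX_TAXA][MAX_TAXA];
} DistMatrix;

/* Read the next matrix from 'in'.
 * Returns 1 on success, 0 at a clean end of file, -1 on malformed input. */
static int read_matrix(FILE *in, DistMatrix *m)
{
    int rc = fscanf(in, "%d", &m->n);
    if (rc == EOF)
        return feof(in) ? 0 : -1;        /* end of data, or a read error */
    if (rc != 1 || m->n < 1 || m->n > MAX_TAXA)
        return -1;                       /* not a valid taxon count */

    for (int i = 0; i < m->n; i++) {
        /* " %10c" skips the preceding newline, then takes exactly the
         * 10-character name field, spaces included. */
        if (fscanf(in, " %10c", m->names[i]) != 1)
            return -1;
        m->names[i][NAME_LEN] = '\0';
        /* Trim the padding spaces from the end of the name. */
        for (int k = NAME_LEN - 1; k >= 0 && m->names[i][k] == ' '; k--)
            m->names[i][k] = '\0';
        /* %lf skips any whitespace, so rows wrapped over lines also work. */
        for (int j = 0; j < m->n; j++)
            if (fscanf(in, "%lf", &m->d[i][j]) != 1)
                return -1;
    }
    return 1;
}

/* Process every matrix in the file. Returns how many were read, or -1 if
 * any of them is malformed (or memory runs out), so nothing is dropped
 * silently. */
static int count_matrices(FILE *in)
{
    DistMatrix *m = malloc(sizeof *m);
    int count = 0, rc;

    if (m == NULL)
        return -1;
    while ((rc = read_matrix(in, m)) == 1)
        count++;                 /* a real program would build a tree here */
    free(m);
    return rc < 0 ? -1 : count;
}
```

Fed the two-matrix tinytest.dat quoted above, `count_matrices` should report 2; a file truncated in the middle of a row gives -1 instead of a partial result.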

To explain in a little more detail why it should work that way, here is
how I have used this functionality.

The first step in making a tree with bootstrap values is to create multiple
pseudo-sequence datasets, each assembled by sampling at random (with
replacement) from the sites of the aligned sequences you want to make into
a tree. By default, both Seqboot (Phylip) and fseqboot (EMBASSY) produce
one hundred such datasets.
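A minimal C sketch of that resampling step follows, assuming the alignment is already held in memory as equal-length strings. The names `sample_columns` and `make_replicate` are hypothetical, and `rand()` merely stands in for whatever random number generator fseqboot actually uses. Every sequence takes the same resampled columns, so the sites stay aligned.

```c
#include <stdlib.h>

/* Draw nsites column indices uniformly at random, with replacement.
 * rand() % nsites has a slight modulo bias, acceptable for a sketch. */
static void sample_columns(int nsites, int *cols)
{
    for (int j = 0; j < nsites; j++)
        cols[j] = rand() % nsites;
}

/* Build one pseudo-replicate dataset. Every sequence takes the SAME
 * resampled columns, so the sites remain aligned across taxa.
 * Each out[s] must have room for nsites + 1 characters. */
static void make_replicate(const char *const *seqs, int nseq, int nsites,
                           const int *cols, char **out)
{
    for (int s = 0; s < nseq; s++) {
        for (int j = 0; j < nsites; j++)
            out[s][j] = seqs[s][cols[j]];
        out[s][nsites] = '\0';
    }
}
```

Calling `sample_columns` and then `make_replicate` one hundred times gives the hundred datasets. Some columns appear more than once in a replicate and others not at all, which is what makes the resulting trees differ slightly.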

The next step is to make one hundred slightly different trees, one from
each dataset. Some methods build trees directly from the sequence data, but
the methods implemented by Neighbor, Fitch, and Kitsch all build trees from
distance matrices, so first you have to make the hundred distance matrices.
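As a concrete example of building a tree from a distance matrix, this C sketch shows one agglomeration step of the neighbor-joining method of Saitou and Nei, which Neighbor's N-J option refers to. It is my own illustration, not Phylip's implementation, and `row_sum`, `nj_pick_pair` and `nj_branch_lengths` are hypothetical names. The matrix is assumed to be a row-major n x n array with n >= 3.

```c
#include <float.h>

/* Sum of row i of a row-major n x n distance matrix. */
static double row_sum(int n, const double *d, int i)
{
    double s = 0.0;
    for (int k = 0; k < n; k++)
        s += d[i * n + k];
    return s;
}

/* Choose the pair to join by minimising
 *   Q(i,j) = (n - 2) d(i,j) - r(i) - r(j),  r = row sums.
 * Requires n >= 3. Stores the pair in *bi < *bj and returns the minimum Q. */
static double nj_pick_pair(int n, const double *d, int *bi, int *bj)
{
    double best = DBL_MAX;
    *bi = *bj = -1;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double q = (n - 2) * d[i * n + j]
                     - row_sum(n, d, i) - row_sum(n, d, j);
            if (q < best) {
                best = q;
                *bi = i;
                *bj = j;
            }
        }
    return best;
}

/* Branch lengths from i and from j to the new node that joins them. */
static void nj_branch_lengths(int n, const double *d, int i, int j,
                              double *li, double *lj)
{
    *li = 0.5 * d[i * n + j]
        + (row_sum(n, d, i) - row_sum(n, d, j)) / (2.0 * (n - 2));
    *lj = d[i * n + j] - *li;
}
```

Repeating the step, each time replacing the joined pair by a new node u with distances d(u,k) = (d(i,k) + d(j,k) - d(i,j)) / 2, until only three nodes remain gives the unrooted tree.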

The distance matrices are calculated from the sequence data using DNAdist. 
In EMBASSY, fdnadist calculates one hundred distance matrices from the 
hundred pseudo-sequence datasets faultlessly.

Now comes the problem. In Phylip you can feed DNAdist's output, containing
all hundred distance matrices, directly into Neighbor (or Fitch or Kitsch)
and build your one hundred trees with a single command. EMBASSY currently
builds only one tree at a time, which is inconvenient.

The last step feeds the file containing the 100 trees into Consense.
Consense labels each possible subtree (each group of taxa on one side of a
branch) with the number (percentage) of replicate trees that include it.
You now have bootstrap values ready to tag onto a tree (which is calculated
separately from /all/ of the sequence data).
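The counting that Consense performs can be sketched in C by storing each split (the taxa on one side of a branch) as a bitmask. This is a simplified illustration, not Consense's code: `normalise_split` and `split_support` are hypothetical names, the sketch handles at most 32 taxa, and it assumes the splits of each replicate tree have already been extracted from the nested-parenthesis strings.

```c
#include <stdint.h>

/* Represent a split (the taxa on one side of a branch) as a bitmask,
 * bit t set for taxon t. Normalise so taxon 0 is never in the set;
 * then both sides of the same branch give the same value.
 * Handles at most 32 taxa. */
static uint32_t normalise_split(uint32_t side, int ntaxa)
{
    uint32_t all = (ntaxa >= 32) ? UINT32_MAX
                                 : (((uint32_t)1 << ntaxa) - 1u);
    side &= all;
    return (side & 1u) ? (all & ~side) : side;
}

/* Percentage of replicate trees containing 'split'. tree_splits[t] lists
 * the nsplits[t] splits of replicate tree t. */
static double split_support(uint32_t split, int ntaxa,
                            const uint32_t *const *tree_splits,
                            const int *nsplits, int ntrees)
{
    uint32_t target = normalise_split(split, ntaxa);
    int hits = 0;

    for (int t = 0; t < ntrees; t++)
        for (int s = 0; s < nsplits[t]; s++)
            if (normalise_split(tree_splits[t][s], ntaxa) == target) {
                hits++;          /* count each tree at most once */
                break;
            }
    return ntrees > 0 ? 100.0 * hits / ntrees : 0.0;
}
```

In a majority-rule consensus, the splits kept are those whose support is above 50%. Normalising on taxon 0 matters because a tree written with a different root or taxon order may describe the same branch by its opposite side.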

I'm afraid I don't know of anyone else using EMBOSS Phylip, but if I can
get it to work I'll pass my script along with my recommendation. I find
EMBOSS Phylip easier to script than the original Phylip.

Please e-mail me with any questions, or for specific Phylip/EMBASSY
scripts. I have some knowledge of C++ and am willing to help with the
coding, though I should warn you that I'm new to development.

Regards,
Hazel Jenkins
<hjenkins at uvic.ca>