[EMBOSS] Segmentation fault with multiple similarity matrices in fneighbor
Hazel Hartman Jenkins
hjenkins at uvic.ca
Sat Jun 2 00:08:40 UTC 2007
Dear List,
Hazel Hartman Jenkins wrote:
[corrected]
> If I run the following command:
> fneighbor -datafile tinytest.dat -replicates y -outfile filefrom.fnb
> then everything works.
>
> If, however, my tinytest.dat contains two similarity matrices (or, for
> that matter, the one hundred bootstrap replicates written by fdnadist by
> default), like this:
> 3
> 1187Aquife 0.000000 0.368385 0.404489
> BB213b06 0.368385 0.000000 0.151182
> BB269b06 0.404489 0.151182 0.000000
> 3
> 1187Aquife 0.000000 0.368385 0.404489
> BB213b06 0.368385 0.000000 0.151182
> BB269b06 0.404489 0.151182 0.000000
>
> then fneighbor returns:
> <quote>
> Phylogenies from distance matrix by N-J or UPGMA method
> Segmentation fault
> <endquote>
fneighbor (and ffitch and fkitsch, which have the same bug) should
definitely support multiple input matrices, as the original Phylip
programs do. This is highly desirable because it is needed to create
bootstrap values for trees built from distance matrix data.
The desired behaviour is for fneighbor (and ffitch and fkitsch) to accept
input files containing multiple distance matrices and produce multiple
trees from them, in standard nested-parenthesis notation, which can then
be read by fconsense.
Reading must not simply stop at the end of the first distance matrix.
That would turn the crash into a silent fault, and a user familiar with
Phylip might not notice that the extra matrices had been dropped until
many processing steps later.
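To illustrate the reading behaviour I mean, here is a minimal Python
sketch. It is not the EMBOSS or Phylip code. The function name
read_distance_matrices is made up for this example, and it assumes square
matrices, one row per line, and taxon names without embedded spaces.

```python
SAMPLE = """\
3
1187Aquife 0.000000 0.368385 0.404489
BB213b06 0.368385 0.000000 0.151182
BB269b06 0.404489 0.151182 0.000000
3
1187Aquife 0.000000 0.368385 0.404489
BB213b06 0.368385 0.000000 0.151182
BB269b06 0.404489 0.151182 0.000000
"""


def read_distance_matrices(text):
    """Return a list of (names, rows) tuples, one per matrix in the text.

    Assumes square-format matrices with each row on a single line
    (an assumption made for this sketch).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    matrices = []
    i = 0
    # Keep going until the input is exhausted, not just after one matrix.
    while i < len(lines):
        n = int(lines[i].split()[0])  # header line: number of taxa
        block = lines[i + 1 : i + 1 + n]
        if len(block) != n:
            raise ValueError("input ends in the middle of a matrix")
        names, rows = [], []
        for ln in block:
            fields = ln.split()
            names.append(fields[0])
            rows.append([float(x) for x in fields[1 : n + 1]])
        matrices.append((names, rows))
        i += n + 1
    return matrices


matrices = read_distance_matrices(SAMPLE)
print(len(matrices), "matrices read")
```

A count of the matrices actually read, as printed here, is the sort of
check that would expose a silent truncation early.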
I'll describe why it should work that way in a little more detail by
describing the way in which I've used the functionality.
The first step in making a tree with bootstrap values is to create multiple
pseudo-sequences assembled from random samples (with replacement) of the
genetic sequences you want to make into a tree. By default, both Seqboot
(Phylip) and fseqboot (EMBASSY) give one hundred pseudo-sequences.
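As a rough sketch of the resampling idea (not the Seqboot or fseqboot
implementation), the Python below draws alignment columns with replacement
to build each pseudo-dataset. The function name bootstrap_alignment and
the toy sequences are invented for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-dataset built by sampling columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    length = len(next(iter(alignment.values())))
    # Pick `length` column indices at random; repeats are allowed.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical data, for illustration only).
alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGAACGA",
    "seqC": "TCGTTCGT",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), "pseudo-datasets generated")
```

The same column indices are used for every sequence within a replicate,
so each resampled column is still a real column of the original alignment.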
The next step is to make one hundred slightly different trees. Some methods
build trees directly from the sequence data. The methods implemented by
Neighbor, Fitch, and Kitsch all build trees from distance matrices. So
first you have to make the hundred distance matrices.
The distance matrices are calculated from the sequence data using DNAdist.
In EMBASSY, fdnadist calculates one hundred distance matrices from the
hundred pseudo-sequence datasets faultlessly.
Now comes the problem. In Phylip you can feed the hundred-distance-matrices
output from DNAdist directly into Neighbor (or Fitch or Kitsch), and build
your one hundred trees in one command. EMBASSY currently will only build
one at a time; this is inconvenient.
The last step feeds the file containing 100 trees into Consense. Consense
labels each possible subtree (a group all on one branch) with the number
(percentage) of subsamples which include it. You now have bootstrap values
ready to tag onto a tree (which is calculated separately from /all/ of the
sequence data).
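The counting step can be sketched in Python as follows. This is a
simplification, not what Consense actually does internally: the trees are
given as nested tuples rather than parsed from nested-parenthesis files,
they are treated as rooted, and the names clades and clade_support are
invented for the example.

```python
from collections import Counter


def clades(tree):
    """Return (leaf_set, clades_found) for a tree given as nested tuples."""
    if isinstance(tree, str):  # a leaf is just its taxon name
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the group of taxa below this branch
    return leaves, found


def clade_support(trees):
    """Return the percentage of trees that contain each group."""
    counts = Counter()
    for tree in trees:
        _, found = clades(tree)
        counts.update(set(found))  # count each group once per tree
    return {group: 100.0 * n / len(trees) for group, n in counts.items()}


# Four toy trees (hypothetical) standing in for the bootstrap trees.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]

support = clade_support(trees)
print(support[frozenset(["A", "B"])])  # prints 75.0
```

Here the group (A, B) appears in three of the four toy trees, so it would
carry a bootstrap value of 75.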
I'm afraid I don't know of anyone else using EMBOSS Phylip, but if I can
get it to work I'll pass my script along with my recommendation. I find it
easier to script than Phylip.
Please e-mail me with any questions, or for specific Phylip/EMBASSY
scripts. I have some knowledge of C++, and I'm willing to help with the
coding; but I warn that I'm new to development.
Regards,
Hazel Jenkins
<hjenkins at uvic.ca>