[Biojava-dev] phylo code
Jim Balhoff
james.balhoff at duke.edu
Tue Aug 7 14:34:44 UTC 2007
Hi Richard and Thasso,
On Aug 7, 2007, at 3:48 AM, Richard Holland wrote:
> Thanks for your feedback Thasso.
>
> The fire/events thing is certainly a misnomer - guilty as charged (I
> wrote the code...) - but I suppose I wasn't expecting the naming to
> matter much. I'll bear that in mind for future code. We can't really
> change the existing interfaces now as they've been released and it is
> not nice to users for us to change public interfaces that might already
> be in use.
>
> The PHYLIP format handler was written by Jim Balhoff. Jim - do you have
> any responses to Thasso's comments about the output options?
I think it would be great to have import and export classes for
PHYLIP trees and distance matrices. The current code handles only
alignments. The other data would be in separate files, and so not
part of this parser.
> I like the sound of your PHYLIP short-name map. You could definitely go
> ahead and contribute an update which implemented that. (Don't forget to
> make your code clear the map between one file and the next!)
Yes, I think the map is a great idea. The first edition of the
PHYLIP parser was simple and strictly stuck to the format
specification. The map would be a great way to transparently use
longer names when running PHYLIP behind the scenes. If the user is
actually exporting a PHYLIP formatted alignment to disk, it might be
nice to have a few options for handling long names: the current
truncation method could be one option; another might be to simply
write the long name followed by a space before the sequence starts
(not strictly PHYLIP, but a simple alignment format recognized by
some programs); a third might be to raise an exception or otherwise
alert the caller if sequence names are too long.
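
To make the map and the export options concrete, here is a rough sketch. Everything in it is hypothetical: PhylipNameMap, LongNamePolicy and the "S000000001" alias scheme are names made up for illustration and are not part of the current BioJava PHYLIP code. The only fixed assumption is the strict PHYLIP rule that a name occupies exactly ten characters, padded with blanks.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not an existing BioJava class) for PHYLIP's 10-character name limit. */
final class PhylipNameMap {

    /** Export options for names longer than the limit. */
    enum LongNamePolicy { TRUNCATE, RELAXED, FAIL }

    static final int MAX_NAME_LENGTH = 10;

    private final Map<String, String> longToShort = new HashMap<String, String>();
    private final Map<String, String> shortToLong = new HashMap<String, String>();

    /** Returns a generated 10-character alias, reusing it if the name was seen before. */
    String shorten(String longName) {
        String alias = longToShort.get(longName);
        if (alias == null) {
            alias = String.format("S%09d", longToShort.size() + 1);
            longToShort.put(longName, alias);
            shortToLong.put(alias, longName);
        }
        return alias;
    }

    /** Translates an alias read back from PHYLIP output; unknown names pass through unchanged. */
    String restore(String alias) {
        String longName = shortToLong.get(alias);
        return longName != null ? longName : alias;
    }

    /** Forgets all aliases; meant to be called between one file and the next. */
    void clear() {
        longToShort.clear();
        shortToLong.clear();
    }

    /** Formats the name field of one alignment row under the chosen policy. */
    static String formatName(String name, LongNamePolicy policy) {
        if (policy == LongNamePolicy.RELAXED) {
            // Not strict PHYLIP: full name, then one space before the sequence.
            return name + " ";
        }
        if (name.length() > MAX_NAME_LENGTH) {
            if (policy == LongNamePolicy.FAIL) {
                throw new IllegalArgumentException(
                        "Name longer than " + MAX_NAME_LENGTH + " characters: " + name);
            }
            return name.substring(0, MAX_NAME_LENGTH); // TRUNCATE
        }
        // Strict PHYLIP: pad short names with blanks to exactly ten characters.
        return String.format("%-" + MAX_NAME_LENGTH + "s", name);
    }
}
```

The alias map would be used when PHYLIP runs behind the scenes (shorten on the way in, restore on the way out), while formatName covers the case where the user writes a file to disk and has to pick a policy explicitly.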
Another enhancement to the PHYLIP classes would be to let the
developer specify interleaved or sequential alignment format for
import and export (and, for both formats, the line length on export).
Right now I think there are some possible files which will not be
parsed correctly - probably a sequential style file with newlines
within the sequences (if a "sequential" alignment has no newlines, it
is equivalent to "interleaved"). Alternatively, instead of having the
developer specify interleaved or sequential, we could figure out how
to detect them reliably.
Here are the examples:
<http://evolution.genetics.washington.edu/phylip/doc/main.html#inputfiles>
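
As a rough illustration of the "let the developer specify the layout" idea, here is a minimal sketch. It is not the existing BioJava parser: PhylipLayoutSketch, Layout and parse are hypothetical names, and the sketch assumes strict ten-character names and a header line giving the number of taxa and sites. It does not attempt automatic detection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not the BioJava parser): PHYLIP reading with an explicit layout. */
final class PhylipLayoutSketch {

    enum Layout { INTERLEAVED, SEQUENTIAL }

    private static final int NAME_WIDTH = 10;

    /** Returns name -> sequence in file order; blank lines are ignored. */
    static Map<String, String> parse(String text, Layout layout) {
        List<String> lines = new ArrayList<String>();
        for (String line : text.split("\\r?\\n")) {
            if (line.trim().length() > 0) {
                lines.add(line);
            }
        }
        String[] header = lines.get(0).trim().split("\\s+");
        int taxa = Integer.parseInt(header[0]);
        int sites = Integer.parseInt(header[1]);

        List<String> names = new ArrayList<String>();
        List<StringBuilder> seqs = new ArrayList<StringBuilder>();
        int next = 1;
        if (layout == Layout.SEQUENTIAL) {
            // Each taxon is complete before the next begins, so keep reading
            // continuation lines until this taxon has all of its sites.
            for (int t = 0; t < taxa; t++) {
                String first = lines.get(next++);
                names.add(nameOf(first));
                StringBuilder seq = new StringBuilder(residues(afterName(first)));
                while (seq.length() < sites) {
                    seq.append(residues(lines.get(next++)));
                }
                seqs.add(seq);
            }
        } else {
            // First block: one named line per taxon.
            for (int t = 0; t < taxa; t++) {
                String first = lines.get(next++);
                names.add(nameOf(first));
                seqs.add(new StringBuilder(residues(afterName(first))));
            }
            // Later blocks: unnamed lines, cycling through the taxa in order.
            for (int t = 0; next < lines.size(); t = (t + 1) % taxa) {
                seqs.get(t).append(residues(lines.get(next++)));
            }
        }

        Map<String, String> alignment = new LinkedHashMap<String, String>();
        for (int t = 0; t < taxa; t++) {
            if (seqs.get(t).length() != sites) {
                throw new IllegalArgumentException(
                        "Expected " + sites + " sites for " + names.get(t)
                        + " but read " + seqs.get(t).length());
            }
            alignment.put(names.get(t), seqs.get(t).toString());
        }
        return alignment;
    }

    private static String nameOf(String line) {
        return line.substring(0, Math.min(NAME_WIDTH, line.length())).trim();
    }

    private static String afterName(String line) {
        return line.length() > NAME_WIDTH ? line.substring(NAME_WIDTH) : "";
    }

    private static String residues(String s) {
        return s.replaceAll("\\s+", "");
    }
}
```

With the final length check in place, reading a sequential file with newlines inside the sequences as INTERLEAVED fails with an exception instead of silently returning a scrambled alignment. That check might also be a starting point for detection (try one layout, fall back to the other), though that is only a guess.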
Best regards,
Jim
____________________________________________
James P. Balhoff, Ph.D.
National Evolutionary Synthesis Center
2024 West Main St., Suite A200
Durham, NC 27705
USA