[Bioperl-l] extending the PHYLIP format
Heikki Lehvaslaiho
heikki at sanbi.ac.za
Wed May 28 08:23:51 UTC 2008
I just learned that a number of phylogenetics packages (at least PAUP, PHYML
and MrBayes) now allow IDs longer than 10 characters in PHYLIP format. The
documentation is scarce, but the rules seem to be:
1. There can be spaces before the ID.
2. The ID can be up to 50 characters long.
3. The ID can contain any characters. If you use spaces within the ID, you
have to put the whole ID in single quotes ('). Single quotes can be used for
all IDs and are removed on parsing.
4. It is customary to have two spaces between the ID and the sequence.
This custom seems to have come into PHYLIP format from Nexus.
Note that this allows sequences in a file to start at different columns.
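
To make the four rules above concrete, here is a minimal sketch of how a single
sequence line could be read and written under them. It is written in Python
purely for illustration and is not BioPerl code; the names parse_relaxed_line,
format_relaxed_line and MAX_ID_LENGTH are hypothetical, and the behaviour is an
assumption based only on the rules as listed, not on any package's
documentation. It handles one line at a time and ignores the header line and
interleaved continuation blocks.

```python
import re

# Rule 2 as listed above: the ID can be up to 50 characters long (assumed limit).
MAX_ID_LENGTH = 50

# Optional leading spaces (rule 1), then either a single-quoted ID or a bare
# ID without whitespace (rule 3), then whitespace, then the sequence data.
_LINE = re.compile(r"^\s*(?:'([^']*)'|(\S+))\s+(\S.*)$")


def parse_relaxed_line(line):
    """Split one sequence line into (id, sequence), removing any quotes."""
    match = _LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"cannot parse line: {line!r}")
    quoted, bare, rest = match.groups()
    seq_id = quoted if quoted is not None else bare
    if len(seq_id) > MAX_ID_LENGTH:
        raise ValueError(f"ID longer than {MAX_ID_LENGTH} characters: {seq_id!r}")
    # Drop any spaces used to group the residues into blocks.
    sequence = "".join(rest.split())
    return seq_id, sequence


def format_relaxed_line(seq_id, sequence):
    """Write one line: quote the ID only if needed, then two spaces (rule 4)."""
    if len(seq_id) > MAX_ID_LENGTH:
        raise ValueError(f"ID longer than {MAX_ID_LENGTH} characters: {seq_id!r}")
    if "'" in seq_id:
        # The rules above do not say how to escape a quote inside an ID.
        raise ValueError("single quotes inside an ID are not handled in this sketch")
    if any(ch.isspace() for ch in seq_id):
        seq_id = f"'{seq_id}'"
    return f"{seq_id}  {sequence}"


if __name__ == "__main__":
    print(parse_relaxed_line("  'HIV-1 subtype C isolate'  ACGT ACGT"))
    print(format_relaxed_line("B.FR.83.HXB2_LAI_IIIB_BRU.K03455", "ACGTACGT"))
```

Because the ID is delimited by quotes or whitespace rather than by a fixed
column, the sequence may start at a different column on each line, which is the
consequence noted above.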
Can anyone shed more light on the matter?
I need to get this into BioPerl, as the names of the HIV sequences that I work
with are very long and cannot be sensibly truncated.
What would be the best way to do this?
1. Add more options to the already heavily hacked Bio::AlignIO::phylip.pm
2. Create a Bio::AlignIO::phyliplong.pm
Do those ugly hacks for supporting fixed-length long IDs really belong in the
vanilla phylip.pm file?
Opinions?
-Heikki
--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________