[Bioperl-l] AlignIO::* match_char, gap_char and missing_char etc
Nathan Haigh
n.haigh at sheffield.ac.uk
Thu May 12 06:10:23 EDT 2005
I've noticed some inconsistency in the way sequence alignments are read,
stored and printed when match_char, gap_char and missing_char are used.
Should sequences be stored exactly the way they are represented in the
file? Should there be default values for formats that support one or more of
match_char, gap_char and missing_char or should these only be set if they
are used in the alignment file? Should formats that don't support match_char
check for and do an unmatch during a write_aln? Should formats that use
specific characters for match_char, gap_char and missing_char check and do
map_char if required during a write_aln?
I was going to have a look through AlignIO::* and try to make the modules more
consistent in this regard. What I propose to do is:
1) Have default values for match_char, gap_char and missing_char for
those formats that only support a particular character
2) Have match_char, gap_char and missing_char set when the appropriate
command is found for setting these characters
3) Store the sequences exactly as they are in the alignment file
(except maybe for match_char)
4) During write_aln, checks are conducted to ensure the sequences are
compliant with the features (match_char, gap_char and missing_char)
supported by that format, and map_char, unmatch/match are done as required.
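
To make point 4 concrete, here is a rough, language-neutral sketch (in Python, not
BioPerl code) of how a writer might decide what to do before output. Everything in
it is an assumption made for illustration: the CharSet class, the FORMAT_DEFAULTS
table, the format names "format_a" and "format_b", and the functions plan_write
and apply_char_map are all hypothetical and do not exist in AlignIO.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CharSet:
    """Special characters in use; None means 'not used / not supported'."""
    gap: Optional[str] = "-"
    missing: Optional[str] = "?"
    match: Optional[str] = None


# Hypothetical per-format defaults (point 1). Names and values are
# placeholders, not statements about any real alignment format.
FORMAT_DEFAULTS: Dict[str, CharSet] = {
    "format_a": CharSet(gap="-", missing="?", match="."),
    "format_b": CharSet(gap="-", missing=None, match=None),
}


def plan_write(stored: CharSet, target: CharSet) -> Tuple[bool, Dict[str, str]]:
    """Return (needs_unmatch, char_map) for writing `stored` data as `target`."""
    # The target cannot express match characters, so residues must be restored.
    needs_unmatch = stored.match is not None and target.match is None

    char_map: Dict[str, str] = {}
    for name in ("gap", "missing", "match"):
        src = getattr(stored, name)
        dst = getattr(target, name)
        # Nothing to map if the character is unused, unchanged, or has no
        # equivalent in the target (that last case is left to the caller).
        if src is None or dst is None or src == dst:
            continue
        char_map[src] = dst
    return needs_unmatch, char_map


def apply_char_map(rows: List[str], char_map: Dict[str, str]) -> List[str]:
    """Apply all single-character substitutions in one pass."""
    table = str.maketrans(char_map)
    return [row.translate(table) for row in rows]
```

Doing all substitutions in a single translate pass avoids one mapping clobbering
the result of another, which could happen if the characters were swapped one at a
time.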
The one open question is whether unmatch should be called during read_aln,
so that sequences are stored with the actual residue characters instead of
the match_char. The argument for doing so is that many formats don't support
match_char, and the user can always call "match" on the SimpleAlign object
afterwards, which would bring some consistency to the use of this feature.
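
For anyone unsure what unmatch-on-read followed by match-on-demand would mean in
practice, here is a small stand-alone sketch (Python, not the SimpleAlign
implementation). It assumes the first row is the reference sequence, that all rows
have equal length, and that gap positions are never turned into match characters;
those are my assumptions for the example, and the function names simply mirror the
terms used above.

```python
from typing import List


def unmatch(rows: List[str], match_char: str = ".") -> List[str]:
    """Replace each match_char with the reference residue at that column."""
    if not rows:
        return []
    ref = rows[0]
    restored = [ref]
    for row in rows[1:]:
        restored.append(
            "".join(r if c == match_char else c for c, r in zip(row, ref))
        )
    return restored


def match(rows: List[str], match_char: str = ".", gap_char: str = "-") -> List[str]:
    """Replace residues identical to the reference with match_char."""
    if not rows:
        return []
    ref = rows[0]
    matched = [ref]
    for row in rows[1:]:
        matched.append(
            "".join(
                # Gaps are left alone (an assumption of this sketch).
                match_char if c == r and c != gap_char else c
                for c, r in zip(row, ref)
            )
        )
    return matched


# Stored form after reading: residues spelled out in full.
stored = unmatch(["ACGT-A", "..C.-A"])
# Compact form, produced only when the user asks for it.
compact = match(stored)
```

With the residues always stored in full, the compact form becomes purely a display
or output choice, and unmatch(match(rows)) gives back the stored rows.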
This will be my first foray into making bigger changes in Bioperl as a
developer! Yikes! So I'd like to know what people think as well as their
experiences with similar problems. I'm most familiar with nexus, clustal,
phylip and fasta so it would be nice to hear about comments/problems with
some of the other formats!
Cheers
Nath
----------------------------------
Nathan Haigh
PostDoctoral Research Associate
Department of Animal and Plant Sciences
University of Sheffield
Western Bank
Sheffield
S10 2TN
Tel: +44 (0)114 22 20112
Mob: +44 (0)7742 533 569
Fax: +44 (0)114 22 20002