[Biojava-l] adding X to the DNA alphabet

David Waring dwaring@u.washington.edu
Fri, 18 Jan 2002 14:34:05 -0800


I know that this discussion has come up before and it seems that people
generally agreed that it would be OK to add X to the DNA ambiguity symbol
list. I certainly need it in my work because I deal with sequence files
generated by other programs that use X.

In the old IO model it was easy enough to modify the AlphabetManager.xml
file, so I did, and have not worried about it for months. With the new
model it is not so easy. As best I can tell, only one ambiguity symbol can
exist for any given set of bases, so you cannot have both n and x act as
symbols for AGCT. If you add x as agct, n will no longer work. If you add x
as agc, v will no longer work (the last one in the XML file wins). I'm
guessing that there is a Map somewhere, though I have not found it.
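
The "last one wins" behavior described above is what you would see if the
ambiguity symbols were stored in a map keyed by the set of bases. The sketch
below only illustrates that guess. It is not BioJava code: the class
AmbiguityMapSketch and its register/lookup methods are hypothetical names,
and the real lookup has not been located.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of BioJava.
public class AmbiguityMapSketch {
    // Assumed structure: each set of bases maps to exactly one symbol character.
    private final Map<Set<Character>, Character> symbolForBases = new HashMap<>();

    // Registers a symbol for a set of bases.
    // Returns the symbol that was displaced, or null if the set was unused.
    public Character register(char symbol, Set<Character> bases) {
        return symbolForBases.put(bases, symbol);
    }

    // Returns the one symbol currently registered for this set of bases.
    public Character lookup(Set<Character> bases) {
        return symbolForBases.get(bases);
    }

    public static void main(String[] args) {
        AmbiguityMapSketch registry = new AmbiguityMapSketch();
        Set<Character> acgt = Set.of('a', 'c', 'g', 't');

        registry.register('n', acgt);
        // Same key, so the second registration overwrites the first.
        Character displaced = registry.register('x', acgt);

        System.out.println("displaced: " + displaced);
        System.out.println("current:   " + registry.lookup(acgt));
    }
}
```

With a structure like this, two characters can never share one set of bases,
which matches what happens when x is added as agct (n is lost) or as agc
(v is lost).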

I have temporarily gotten around it by just wiping out my 'b', since I
really don't worry much about ambiguity in my DNA but must be able to read
X. Does anyone see a proper solution that would let us use X?

David