[Biojava-l] RNATools bug
Sat, 6 Oct 2001 16:30:53 +0100
On Fri, Oct 05, 2001 at 08:06:23PM -0400, Cox, Greg wrote:
> When converting a DNA strand to an RNA strand, RNATools has a hardcoded T ->
> U and returns the symbol otherwise. This breaks if an ambiguous nucleotide
> is passed in, since they don't trip the T check. I looked in the alphabet
> XML file, and there are no ambiguous RNA symbols.
> The use case I'm facing is translating a DNA sequence. The translation in
> BioJava goes through an RNA sequence, so ambiguous residues foul it up.
> So, I propose one of the following solutions:
> * Introduce ambiguous RNA symbols that are analogous to the DNA symbols.
> * Introduce one ambiguous RNA symbol that all ambiguous DNA symbols map to.
> * Break the biological parallel and translate DNA directly to amino acids.
> If I don't hear from anyone, I'll do the third.
Can I put in a vote for option 1? I think that's what
was really intended by the current design, and it seems to
me to be the `least surprise' option. Should just mean adding
the relevant bits to AlphabetManager.xml and fixing the DNA->
The current handling of ambiguity symbols is far from wonderful.
I wrote a patchset a while back which handled these in a much
tidier way, and also addressed issues with the current
SymbolParser. I never had time to get this 100% finished,
but I can send a copy to anyone who's interested, or get
it checked in on a CVS branch. What's left is basically:
- Sync up with current source tree (should be quite easy)
- Tidy up parsing of cross-product symbols
- Testing :)
I'll try to get it finished off next week.
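
For anyone following along, here is a rough sketch of what option 1 amounts
to. It is a stand-alone illustration only and does not use the BioJava
classes: symbols are plain chars, and the class and method names
(AmbiguousTranscription, transcribe, rnaMatches) are made up for this mail.
It assumes the standard IUPAC nucleotide codes, so each ambiguous DNA symbol
gets an RNA counterpart that matches the same bases with T swapped for U.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; none of these names exist in BioJava.
final class AmbiguousTranscription {
    // IUPAC DNA codes mapped to the concrete bases each one matches.
    private static final Map<Character, String> DNA_CODES = new LinkedHashMap<>();
    static {
        DNA_CODES.put('A', "A");
        DNA_CODES.put('C', "C");
        DNA_CODES.put('G', "G");
        DNA_CODES.put('T', "T");
        DNA_CODES.put('R', "AG");
        DNA_CODES.put('Y', "CT");
        DNA_CODES.put('S', "CG");
        DNA_CODES.put('W', "AT");
        DNA_CODES.put('K', "GT");
        DNA_CODES.put('M', "AC");
        DNA_CODES.put('B', "CGT");
        DNA_CODES.put('D', "AGT");
        DNA_CODES.put('H', "ACT");
        DNA_CODES.put('V', "ACG");
        DNA_CODES.put('N', "ACGT");
    }

    private AmbiguousTranscription() {}

    // Bases matched by the RNA counterpart of a (possibly ambiguous) DNA symbol.
    static Set<Character> rnaMatches(char dnaSymbol) {
        String bases = DNA_CODES.get(Character.toUpperCase(dnaSymbol));
        if (bases == null) {
            throw new IllegalArgumentException("Not a DNA symbol: " + dnaSymbol);
        }
        Set<Character> rna = new TreeSet<>();
        for (char b : bases.toCharArray()) {
            rna.add(b == 'T' ? 'U' : b);
        }
        return rna;
    }

    // Transcribe symbol by symbol. Only T changes letter; the ambiguity
    // codes keep their letter but now stand for the U-containing sets.
    static String transcribe(String dna) {
        StringBuilder rna = new StringBuilder(dna.length());
        for (char c : dna.toCharArray()) {
            char upper = Character.toUpperCase(c);
            if (!DNA_CODES.containsKey(upper)) {
                throw new IllegalArgumentException("Not a DNA symbol: " + c);
            }
            rna.append(upper == 'T' ? 'U' : upper);
        }
        return rna.toString();
    }
}
```

With this, AmbiguousTranscription.transcribe("ATGYTN") gives "AUGYUN", and
rnaMatches('Y') gives the set {C, U}. The point is just that the RNA
alphabet needs its own ambiguity symbols for the result to be valid RNA;
how that maps onto AlphabetManager.xml is a separate question.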