[Biojava-l] RNATools bug

Thomas Down td2@sanger.ac.uk
Sat, 6 Oct 2001 16:30:53 +0100


On Fri, Oct 05, 2001 at 08:06:23PM -0400, Cox, Greg wrote:
> When converting a DNA strand to an RNA strand, RNATools has a hardcoded T -> U
> mapping and returns the symbol unchanged otherwise.  This breaks if an
> ambiguous nucleotide is passed in, since ambiguity symbols don't trip the T
> check.  I looked in the alphabet XML file, and there are no ambiguous RNA
> symbols.
> 
> The use case I'm facing is translating a DNA sequence.  The translation in
> BioJava goes through an RNA sequence, so ambiguous residues foul it up.
> 
> So, I propose one of the following solutions:
> 
> * Introduce ambiguous RNA symbols that are analogous to the DNA symbols.  
> 
> * Introduce one ambiguous RNA symbol that all ambiguous DNA symbols map to.
> 
> * Break the biological parallel and translate DNA directly to amino acids.
> 
> If I don't hear from anyone, I'll do the third.

Can I put in a vote for option 1?  I think that's what
was really intended by the current design, and it seems to
me to be the 'least surprise' option.  It should just mean adding
the relevant bits to AlphabetManager.xml and fixing the
DNA->RNA translator.
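A minimal sketch of what option 1 could look like, assuming each ambiguity code is modelled as the set of atomic bases it stands for (the IUPAC convention). The DNA->RNA step replaces T with U inside each set and then looks up the RNA code for the resulting set, so Y (C/T) becomes Y (C/U) instead of falling through unchanged. The class and method names (AmbiguousTranscriptionSketch, transcribe) are hypothetical plain Java, not BioJava's RNATools or AlphabetManager API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of BioJava.
public class AmbiguousTranscriptionSketch {
    // Each IUPAC DNA code -> the concrete bases it stands for.
    private static final Map<Character, String> DNA_MEMBERS = new HashMap<>();
    // Canonical (sorted) set of RNA bases -> the RNA IUPAC code for that set.
    private static final Map<String, Character> RNA_CODE_FOR_SET = new HashMap<>();

    static {
        String[][] iupac = {
            {"A", "A"}, {"C", "C"}, {"G", "G"}, {"T", "T"},
            {"R", "AG"}, {"Y", "CT"}, {"S", "CG"}, {"W", "AT"},
            {"K", "GT"}, {"M", "AC"}, {"B", "CGT"}, {"D", "AGT"},
            {"H", "ACT"}, {"V", "ACG"}, {"N", "ACGT"}
        };
        for (String[] entry : iupac) {
            char dnaCode = entry[0].charAt(0);
            DNA_MEMBERS.put(dnaCode, entry[1]);
            // The RNA counterpart keeps the same letter, except T itself becomes U.
            char rnaCode = (dnaCode == 'T') ? 'U' : dnaCode;
            RNA_CODE_FOR_SET.put(canonical(entry[1].replace('T', 'U')), rnaCode);
        }
    }

    // Sort the bases so that set membership, not letter order, decides the key.
    private static String canonical(String bases) {
        char[] chars = bases.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Transcribe one DNA symbol: map every member base T -> U, then look up
    // the RNA symbol that stands for exactly that set of bases.
    public static char transcribe(char dnaSymbol) {
        String members = DNA_MEMBERS.get(Character.toUpperCase(dnaSymbol));
        if (members == null) {
            throw new IllegalArgumentException("Not an IUPAC DNA symbol: " + dnaSymbol);
        }
        return RNA_CODE_FOR_SET.get(canonical(members.replace('T', 'U')));
    }

    // Transcribe a whole DNA string symbol by symbol.
    public static String transcribe(String dna) {
        StringBuilder rna = new StringBuilder(dna.length());
        for (char c : dna.toCharArray()) {
            rna.append(transcribe(c));
        }
        return rna.toString();
    }

    public static void main(String[] args) {
        System.out.println(transcribe("ATGNYR")); // prints AUGNYR
    }
}
```

Because every IUPAC DNA code has an RNA counterpart with the same letter (only T changes to U), the lookup always succeeds for valid input, and an unrecognised character throws instead of being passed through silently into the RNA alphabet.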

The current handling of ambiguity symbols is far from wonderful.
I wrote a patchset a while back which handled these in a much
tidier way, and also addressed issues with the current 
SymbolParser.  I never had time to get this 100% finished,
but I can send a copy to anyone who's interested, or get
it checked in on a CVS branch.  What's left is basically:

  - Sync up with current source tree (should be quite easy)

  - Tidy up parsing of cross-product symbols

  - Testing :)

I'll try to get it finished off next week.

    Thomas.