[Biojava-l] question about ambiguous symbols

Keith James kdj at sanger.ac.uk
Wed Feb 5 15:52:03 EST 2003


>>>>> "dion" == dion whitehead <dion.whitehead at uni-bielefeld.de> writes:

    dion> Hello, I am having a frustrating time attempting to
    dion> read in RNA sequences. They contain the 'N' symbol, which is
    dion> a standard ambiguity symbol, but the code trips up on this
    dion> every time, saying it's not a recognized symbol in the
    dion> alphabet. Do I have to specify it myself?

I got bitten by this too when porting some of the code. If you look
in biojava-live/resources/org/biojava/bio/symbol/AlphabetManager.xml
you will see that the default RNA alphabet contains only the tokens
agcu-~, i.e. no ambiguity symbols at all.
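To illustrate why parsing fails, here is a minimal sketch, assuming only what is described above (an alphabet whose tokens are exactly agcu-~ and which is matched case-insensitively). StrictRnaParser is a hypothetical class for illustration, not part of BioJava, and its error message is made up; it simply shows that any character outside the token set, including the ambiguity code 'N', is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT a BioJava class: mimics an alphabet whose only
// tokens are a, g, c, u, '-' and '~', as in the default RNA alphabet.
public class StrictRnaParser {
    // Tokens of the default RNA alphabet as listed in AlphabetManager.xml.
    private static final Set<Character> TOKENS = Set.of('a', 'g', 'c', 'u', '-', '~');

    // Converts a sequence string into a list of tokens, rejecting anything
    // outside the token set (assumption: tokens are case-insensitive).
    public static List<Character> parse(String seq) {
        List<Character> out = new ArrayList<>();
        for (char raw : seq.toCharArray()) {
            char c = Character.toLowerCase(raw);
            if (!TOKENS.contains(c)) {
                throw new IllegalArgumentException(
                    "Symbol '" + raw + "' is not a recognized token in this alphabet");
            }
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("ACGU")); // prints [a, c, g, u]
        try {
            parse("ACGNU"); // 'N' has no entry, so this throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```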

I haven't tested this, but you could hack your AlphabetManager.xml to
include

   <ambiguityMapping token="n">
     <symbolref name="guanine" />
     <symbolref name="adenine" />
     <symbolref name="cytosine" />
     <symbolref name="uracil" />
   </ambiguityMapping>

as in the DNA alphabet. I'm not sure this is the best solution; I'm
sure someone will say if it's not.
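Conceptually, the ambiguityMapping entry above declares that the token 'n' stands for the set {guanine, adenine, cytosine, uracil}. The sketch below models only that idea with a plain Java map; RnaAmbiguity and resolve are hypothetical names, not BioJava API, and the gap tokens '-' and '~' are left out for brevity. In BioJava itself the mapping is read from AlphabetManager.xml rather than coded by hand.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not BioJava code: models what an <ambiguityMapping>
// declares, i.e. which symbol names a single token may stand for.
public class RnaAmbiguity {
    // Unambiguous tokens map to exactly one symbol name.
    private static final Map<Character, Set<String>> MAPPING = Map.of(
        'a', Set.of("adenine"),
        'c', Set.of("cytosine"),
        'g', Set.of("guanine"),
        'u', Set.of("uracil"),
        // Equivalent of the <ambiguityMapping token="n"> entry above.
        'n', Set.of("adenine", "cytosine", "guanine", "uracil"));

    // Returns the alphabetically sorted set of symbol names for a token,
    // or throws if the token has no mapping (assumption: case-insensitive).
    public static Set<String> resolve(char token) {
        Set<String> names = MAPPING.get(Character.toLowerCase(token));
        if (names == null) {
            throw new IllegalArgumentException("Unknown token: " + token);
        }
        return new TreeSet<>(names);
    }

    public static void main(String[] args) {
        for (char t : "ACGNU".toCharArray()) {
            System.out.println(t + " -> " + resolve(t));
        }
        // The line for N prints: N -> [adenine, cytosine, guanine, uracil]
    }
}
```

With the 'n' entry present, a sequence such as ACGNU resolves without error; removing that entry reproduces the "not a recognized symbol" failure from the original question.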

Keith

-- 

- Keith James <kdj at sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -


