[Biojava-l] Symbols are 1 Char?

Thomas Down td2@sanger.ac.uk
Tue, 7 Nov 2000 12:06:13 +0000


On Mon, Nov 06, 2000 at 01:00:08PM -0800, Emig, Robin wrote:
>
> 	I am trying to create a translation program that is based off of a
> codon bias table. I am having a little trouble actually creating the class
> though because I thought I'd create it as follows
> 
> a Class with the following members
> SimpleDistribution (where the alphabet is DNA codons)
> Translation Table (where one alphabet is codons and the other is AA's)
> The problem is that the alphabets (built from symbols) are only 1 char
> elements, i.e. I can't represent ATG as a symbol. Am I missing something? Is
> there a way to have a symbol be multiple chars? Even the interface defines
> it as a char.

Hi...

BioJava Symbol objects certainly aren't tied to representing
a single `char'.  There is a convenience method, getToken(),
which returns a char, but there is no requirement that this
char be anything meaningful.  (Having checked the documentation,
the javadoc for getToken() could do with some clarification...)

The easy way to represent codons is to use a cross-product
alphabet.  This is an ordered list of `child' alphabets, and
contains symbols which are ordered lists of symbols from
these child alphabets.  So you can do something like:

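  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import org.biojava.bio.seq.DNATools;
  import org.biojava.bio.symbol.*;

  // Generate the alphabet DNA x DNA x DNA

  CrossProductAlphabet codonAlphabet = AlphabetManager.
          getCrossProductAlphabet(Collections.nCopies(3, DNATools.getDNA()));

  // Obtain a specific symbol from the codon alphabet
  // (getSymbol throws IllegalSymbolException if the list is not valid)

  List baseList = new ArrayList();
  baseList.add(DNATools.a());
  baseList.add(DNATools.t());
  baseList.add(DNATools.g());
  Symbol startCodon = codonAlphabet.getSymbol(baseList);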
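For anyone who wants to see what the cross-product idea amounts to without BioJava on the classpath, here is a rough plain-Java model.  ToyCrossProduct and its methods are made up for illustration; they are not BioJava classes, and real BioJava symbols are objects rather than lists of chars.  It enumerates every ordered combination of child symbols and then looks one up by its list of parts, much like getSymbol(baseList) does above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java model of a cross-product alphabet, NOT BioJava code.
// Each child alphabet is a list of symbols (here single DNA bases), and each
// cross-product symbol is an ordered list holding one symbol from each child.
final class ToyCrossProduct<S> {
    private final List<List<S>> symbols = new ArrayList<>();

    ToyCrossProduct(List<List<S>> children) {
        build(children, 0, new ArrayList<>());
    }

    // Recursively enumerate every ordered combination of child symbols.
    private void build(List<List<S>> children, int depth, List<S> prefix) {
        if (depth == children.size()) {
            symbols.add(Collections.unmodifiableList(new ArrayList<>(prefix)));
            return;
        }
        for (S s : children.get(depth)) {
            prefix.add(s);
            build(children, depth + 1, prefix);
            prefix.remove(prefix.size() - 1);
        }
    }

    int size() { return symbols.size(); }

    // Look up the cross-product symbol for an ordered list of child symbols.
    List<S> getSymbol(List<S> parts) {
        int i = symbols.indexOf(parts);
        if (i < 0) throw new IllegalArgumentException("Not in alphabet: " + parts);
        return symbols.get(i);
    }

    public static void main(String[] args) {
        List<Character> dna = List.of('a', 'c', 'g', 't');
        // DNA x DNA x DNA, mirroring Collections.nCopies(3, ...) in the BioJava snippet.
        ToyCrossProduct<Character> codons =
            new ToyCrossProduct<>(Collections.nCopies(3, dna));
        System.out.println(codons.size());                             // 64
        System.out.println(codons.getSymbol(List.of('a', 't', 'g')));  // [a, t, g]
    }
}
```

The point is simply that a codon is an ordered triple of base symbols, so the alphabet has 4 x 4 x 4 = 64 members and no single character is needed to identify one.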


You can do all the normal tricks with a cross-product alphabet,
including constructing a distribution over it and using that
distribution to store your codon bias table.
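As a rough sketch of what such a codon bias table holds, assuming the bias is estimated by counting in-frame codons from a coding sequence, the following uses a plain TreeMap from codon string to relative frequency.  CodonBias and fromCodingSequence are hypothetical names; this is not BioJava's SimpleDistribution API.  In BioJava you would keep these weights in a distribution over the codon alphabet instead.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT BioJava's SimpleDistribution: a codon bias table
// stored as a map from codon to its relative frequency in a coding sequence.
final class CodonBias {
    // Count non-overlapping codons in reading frame 0 and normalise so the
    // frequencies sum to 1. Trailing bases that don't form a codon are ignored.
    static Map<String, Double> fromCodingSequence(String cds) {
        String seq = cds.toLowerCase();
        Map<String, Integer> counts = new TreeMap<>();
        int total = 0;
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            counts.merge(seq.substring(i, i + 3), 1, Integer::sum);
            total++;
        }
        Map<String, Double> freqs = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / (double) total);
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Double> bias = fromCodingSequence("ATGGCTGCCGCTTAA");
        System.out.println(bias); // {atg=0.2, gcc=0.2, gct=0.4, taa=0.2}
    }
}
```

In the example, GCT appears twice among five codons, so it gets weight 0.4 while its synonym GCC gets 0.2; that kind of synonymous-codon preference is what a bias table captures.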

If you call the `getToken' method on symbols in the codon alphabet,
you'll get a unique (but not meaningful) char.  On the other hand,
getName() will return a sensible string representation of the
ordered list.
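To make the token/name distinction concrete, here is a hypothetical sketch.  CodonTokens is invented for illustration and is not how BioJava actually allocates tokens: each of the 64 codons gets a char handed out from a counter, so the tokens are unique but carry no meaning, while the name is spelled out from the ordered bases.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, NOT BioJava internals: each codon receives a
// char token from a running counter (unique but arbitrary), while its name
// is built from the ordered bases and is therefore readable.
final class CodonTokens {
    static final class Codon {
        final char token;  // opaque handle, only guaranteed to be unique
        final String name; // meaningful representation, e.g. "atg"
        Codon(char token, String name) { this.token = token; this.name = name; }
    }

    static List<Codon> enumerate() {
        char[] bases = {'a', 'c', 'g', 't'};
        List<Codon> codons = new ArrayList<>();
        char next = '\uE000'; // start of the Unicode Private Use Area
        for (char b1 : bases)
            for (char b2 : bases)
                for (char b3 : bases)
                    codons.add(new Codon(next++, "" + b1 + b2 + b3));
        return codons;
    }

    public static void main(String[] args) {
        List<Codon> codons = enumerate();
        Set<Character> tokens = new HashSet<>();
        for (Codon c : codons) tokens.add(c.token);
        System.out.println(codons.size() + " codons, " + tokens.size() + " distinct tokens");
        // With a=0, c=1, g=2, t=3, "atg" sits at index 0*16 + 3*4 + 2 = 14.
        Codon atg = codons.get(14);
        System.out.println(atg.name + " -> U+" + Integer.toHexString(atg.token).toUpperCase());
    }
}
```

Running it should print "64 codons, 64 distinct tokens" followed by "atg -> U+E00E": the token tells you nothing about the codon, whereas the name does.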

Hope this helps,

   Thomas.
-- 
One of the advantages of being disorderly is that one is
constantly making exciting discoveries.
                                       -- A. A. Milne