[Biojava-l] Ambiguity codes.

Matthew Pocock mrp@sanger.ac.uk
Wed, 21 Jun 2000 19:48:14 +0100


Eric Blossom wrote:

> Shouldn't this be a problem only for importers (parsers) and exporters of
> data? Shouldn't data models in biojava be able to define amino acids and
> nucleic acids including ambiguous ones independent of alphabet?

There are three issues which have traditionally been munged together into one
mess: 1) how ambiguity codes are represented in files, 2) how we store an
ambiguity code inside BioJava, and 3) how we get between these two representations.

Problem 1 is out of our hands (unless you are lucky enough to be making the
files). Problem 2 is entirely in our hands, and I think it is solved by the
introduction of AmbiguitySymbol and the (slight) extension of the Alphabet
definition. Problem 3 is the job of the SymbolParser objects, which I think
we can kludge for the common cases of the DNA and Protein alphabets by priming
them with the obvious mappings (n -> match any DNA residue, etc.). I guess we may
also need a similar stringifier object that lets us write out sequences with '.',
'-' or any other character of our choice to represent gaps.
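To make the split between problems 2 and 3 concrete, here is a minimal,
hypothetical sketch in plain Java. It is not the BioJava API: the names
AmbiguitySketch, Base, parse and format are invented for illustration. It
assumes an ambiguity code is stored internally as the set of concrete residues
it may stand for (so 'n' is {A, C, G, T} and a gap is the empty set). A parser
primed with the obvious character mappings handles reading. A writer that takes
the gap character as a parameter plays the "stringifier" role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the BioJava classes.
public class AmbiguitySketch {

    // The four concrete DNA residues.
    enum Base { A, C, G, T }

    // File character -> internal symbol (set of residues it can match).
    private static final Map<Character, Set<Base>> PARSE = new HashMap<>();
    // Internal symbol -> preferred output character (gaps handled separately).
    private static final Map<Set<Base>, Character> FORMAT = new HashMap<>();

    private static void define(char code, Set<Base> bases) {
        Set<Base> symbol = Collections.unmodifiableSet(bases);
        PARSE.put(code, symbol);
        FORMAT.put(symbol, code);
    }

    static {
        define('a', EnumSet.of(Base.A));
        define('c', EnumSet.of(Base.C));
        define('g', EnumSet.of(Base.G));
        define('t', EnumSet.of(Base.T));
        define('r', EnumSet.of(Base.A, Base.G)); // IUPAC purine
        define('y', EnumSet.of(Base.C, Base.T)); // IUPAC pyrimidine
        define('n', EnumSet.allOf(Base.class));  // matches any DNA residue
        // A gap matches no residue; accept both '-' and '.' on input.
        Set<Base> gap = Collections.unmodifiableSet(EnumSet.noneOf(Base.class));
        PARSE.put('-', gap);
        PARSE.put('.', gap);
    }

    // Problem 3, inbound: file text -> internal symbols.
    static List<Set<Base>> parse(String text) {
        List<Set<Base>> out = new ArrayList<>();
        for (char ch : text.toLowerCase().toCharArray()) {
            Set<Base> symbol = PARSE.get(ch);
            if (symbol == null) {
                throw new IllegalArgumentException("Unknown symbol: " + ch);
            }
            out.add(symbol);
        }
        return out;
    }

    // Problem 3, outbound: internal symbols -> text, with a caller-chosen gap char.
    static String format(List<Set<Base>> seq, char gapChar) {
        StringBuilder sb = new StringBuilder();
        for (Set<Base> symbol : seq) {
            sb.append(symbol.isEmpty() ? gapChar : FORMAT.get(symbol));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Set<Base>> seq = parse("ACGN-RY");
        System.out.println(seq.get(3));          // [A, C, G, T]
        System.out.println(format(seq, '.'));    // acgn.ry
    }
}
```

Because the internal symbol is just "which residues could this be", the model
does not care whether the file used 'n', '-' or '.'; only the parse and format
tables know about file characters.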


> Eric Blossom              mailto:Eric@BlossomAssociates.Com
> Blossom Associates West   http://www.BlossomAssociates.Com/
>                           510 841-3338

Joon: You're out of your tree
Sam:  It wasn't my tree
                                                 (Benny & Joon)