[Biojava-dev] More problems with gaps

Thomas Down td2 at sanger.ac.uk
Fri May 2 12:29:06 EDT 2003


On Mon, Apr 28, 2003 at 01:12:29PM +0100, Lachlan Coin wrote:
> 
> I've been tracking down a bug to do with the use of gaps in cross product
> alphabets.
> The problem is that a gap symbol is not atomic, and hence when I use a cross product
> alphabet, and I use the getSymbol(List syms) method of this cross product
> alphabet, if any of the symbols in syms are gaps, the method returns a
> SimpleSymbol with name "[]" (it should return a BasisSymbol).

There are two different types of gap symbol in biojava: a global
gap symbol, which doesn't really fit into any alphabet and
is available via:

    Symbol globalGapSymbol = AlphabetManager.getGapSymbol();

Then there are gap symbols which actually span particular
alphabet-spaces.  The easiest way to get hold of one of these
is to call getGapSymbol() on the alphabet with which you're
actually working.
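Here is a small, self-contained Java model of the two kinds of gap. It is a sketch, not BioJava code. GapKinds, Sym and Alpha are hypothetical names invented for illustration. The membership rule, where an alphabet contains only its own residues and its own gap, is a simplifying assumption of the model. It does not describe how BioJava itself decides membership.

```java
import java.util.*;

/** Hypothetical toy model (not BioJava): one global gap plus one gap per alphabet. */
final class GapKinds {
    /** A named symbol; symbols are compared by identity (==), never by name. */
    static final class Sym {
        final String name;
        Sym(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** The global gap: a single shared object that no alphabet owns. */
    static final Sym GLOBAL_GAP = new Sym("[]");

    /** An alphabet that owns its residues and a gap symbol of its own. */
    static final class Alpha {
        final String name;
        final Set<Sym> residues = new LinkedHashSet<>();
        private final Sym gap;

        Alpha(String name, String... residueNames) {
            this.name = name;
            for (String r : residueNames) residues.add(new Sym(r));
            this.gap = new Sym("gap[" + name + "]");
        }

        /** Toy analogue of asking the alphabet you are working with for its gap. */
        Sym getGapSymbol() { return gap; }

        /** Assumed toy rule: only this alphabet's residues and its own gap belong. */
        boolean contains(Sym s) { return s == gap || residues.contains(s); }
    }
}
```

With a protein and a DNA alphabet built this way, protein.contains(protein.getGapSymbol()) is true. protein.contains(GapKinds.GLOBAL_GAP) and protein.contains(dna.getGapSymbol()) are both false. This is the sense in which the global gap "doesn't really fit into any alphabet".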

The following code works as expected:

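    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.biojava.bio.seq.ProteinTools;
    import org.biojava.bio.symbol.*;

    public class ProteinGapExample {
        public static void main(String[] args) throws Exception {
            Alphabet proteinAlpha = ProteinTools.getAlphabet();
            Symbol aProteinSymbol = (Symbol) ((FiniteAlphabet) proteinAlpha).iterator().next();
            // Use the protein alphabet's own gap, not AlphabetManager.getGapSymbol()
            Symbol aProteinGap = proteinAlpha.getGapSymbol();
            Alphabet proteinSquared = AlphabetManager.getCrossProductAlphabet(Collections.nCopies(2, proteinAlpha));

            List s = new ArrayList();
            s.add(aProteinGap);
            s.add(aProteinSymbol);
            Symbol proteinVsGapSymbol = proteinSquared.getSymbol(s);
            System.out.println(proteinVsGapSymbol.getName());

            // The result is a BasisSymbol whose components are the same objects passed in
            List l = ((BasisSymbol) proteinVsGapSymbol).getSymbols();
            System.out.println(s.get(0) == l.get(0));
            System.out.println(s.get(1) == l.get(1));
        }
    }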

I'll go back and fix your test case in a minute.


The practical distinction between the two types of gap is subtle.
The gaps which are inserted into a sequence alignment to represent
indels are the alphabet-specific gaps.  Those are almost always
what you want in practice.
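The sketch below treats each column of a pairwise alignment as a pair drawn from the cross product of the two row alphabets. It is meant to show why the alphabet-specific gap is the right thing to put in an alignment. It is hypothetical plain Java, not BioJava. AlignmentToy, Sym, Alpha, column() and columns() are invented names. The check that each component is either a residue or that row's own gap is an assumption of the model.

```java
import java.util.*;

/** Hypothetical toy model (not BioJava): alignment columns as cross-product pairs. */
final class AlignmentToy {
    /** A named symbol; compared by identity (==), never by name. */
    static final class Sym {
        final String name;
        Sym(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** A global gap that belongs to no alphabet. */
    static final Sym GLOBAL_GAP = new Sym("[]");

    /** An alphabet that owns one symbol per residue letter plus its own gap. */
    static final class Alpha {
        final Map<Character, Sym> residues = new HashMap<>();
        final Sym gap;

        Alpha(String name, String letters) {
            for (char c : letters.toCharArray()) residues.put(c, new Sym(String.valueOf(c)));
            gap = new Sym("-(" + name + ")");
        }

        /** Assumed toy rule: own residues and own gap only (identity comparison). */
        boolean contains(Sym s) { return s == gap || residues.containsValue(s); }

        /** Map an aligned character to a symbol; '-' marks an indel. */
        Sym symbolFor(char c) {
            if (c == '-') return gap;
            Sym s = residues.get(c);
            if (s == null) throw new IllegalArgumentException("not in alphabet: " + c);
            return s;
        }
    }

    /** One column: each component must belong to its own row's alphabet. */
    static List<Sym> column(Alpha a, Sym x, Alpha b, Sym y) {
        if (!a.contains(x) || !b.contains(y))
            throw new IllegalArgumentException("component outside its alphabet: " + x + ", " + y);
        return List.of(x, y);
    }

    /** Turn two equal-length gapped rows into a list of cross-product columns. */
    static List<List<Sym>> columns(Alpha a, String row1, Alpha b, String row2) {
        if (row1.length() != row2.length())
            throw new IllegalArgumentException("rows differ in length");
        List<List<Sym>> cols = new ArrayList<>();
        for (int i = 0; i < row1.length(); i++)
            cols.add(column(a, a.symbolFor(row1.charAt(i)), b, b.symbolFor(row2.charAt(i))));
        return cols;
    }
}
```

Take a protein alphabet built from the twenty amino-acid letters and align "AC-D" against "A-CD". The result is four columns, and both indel columns hold that alphabet's own gap object. If you pass GLOBAL_GAP to column() instead, the toy rejects it with an IllegalArgumentException.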

We definitely need some better javadoc here.  Matthew's
written about the gap symbols in his PhD thesis, and it
might be worth re-using some text from that.

    Thomas.

