[Biojava-dev]
SimpleGappedSymbolList problem, weird "String seqString()" results.
Kalle Näslund
kalle.naslund at genpat.uu.se
Wed Feb 19 15:23:16 EST 2003
Hi!
I noticed that if you insert leading or trailing gaps and then call
seqString(), you get "n" instead of "-". To illustrate, the following
set of gap operations on a SimpleGappedSymbolList:
Alphabet dna = DNATools.getDNA();
SymbolTokenization dnaParser = dna.getTokenization( "token" );
SymbolList symList1 = new SimpleSymbolList( dnaParser, "TTCCTTCCGGGTCGTC" );
GappedSymbolList gl1 = new SimpleGappedSymbolList( symList1 );
System.out.println( gl1.seqString() );
gl1.addGapsInSource( 1, 4 );
System.out.println( gl1.seqString() );
gl1.addGapsInSource( 10, 2 );
System.out.println( gl1.seqString() );
gl1.addGapsInSource( 17, 4 );
System.out.println( gl1.seqString() );
gives this result :
ttccttccgggtcgtc
nnnnttccttccgggtcgtc
nnnnttccttccg--ggtcgtc
nnnnttccttccg--ggtcgtcnnnn
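For comparison, the output one would presumably expect uses "-" for every gap. The following is a minimal sketch in plain Java, with no BioJava involved: the class GapSketch and its methods are hypothetical, and the source-coordinate semantics (gaps go in front of the given 1-based source position, with length + 1 meaning "after the end") are inferred only from the output shown above.

```java
// Toy model, NOT BioJava: gaps are tracked in source coordinates.
class GapSketch {
    private final String source;
    // gapsBefore[i] = number of gaps in front of source position i (1-based).
    // Index source.length() + 1 holds the trailing gaps.
    private final int[] gapsBefore;

    GapSketch(String source) {
        this.source = source;
        this.gapsBefore = new int[source.length() + 2];
    }

    // Assumed semantics: insert `length` gaps in front of source position `pos`.
    void addGapsInSource(int pos, int length) {
        if (pos < 1 || pos > source.length() + 1) {
            throw new IndexOutOfBoundsException("pos out of range: " + pos);
        }
        gapsBefore[pos] += length;
    }

    // Render the gapped view, writing every gap as '-'.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= source.length(); i++) {
            appendGaps(sb, gapsBefore[i]);
            sb.append(source.charAt(i - 1));
        }
        appendGaps(sb, gapsBefore[source.length() + 1]);
        return sb.toString();
    }

    private static void appendGaps(StringBuilder sb, int count) {
        for (int k = 0; k < count; k++) {
            sb.append('-');
        }
    }

    public static void main(String[] args) {
        GapSketch g = new GapSketch("ttccttccgggtcgtc");
        g.addGapsInSource(1, 4);
        g.addGapsInSource(10, 2);
        g.addGapsInSource(17, 4);
        System.out.println(g.render()); // ----ttccttccg--ggtcgtc----
    }
}
```

The three calls mirror the ones in the example above; only the leading and trailing runs differ from the actual BioJava output.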
I haven't managed to fully understand why this happens, but the start of
the story goes like this:
1) SimpleGappedSymbolList's symbolAt method returns different gap
symbols depending on whether the gap is an "internal" gap or a
leading/trailing gap. The relevant piece of code in the symbolAt method is:
if( (indx < firstNonGap()) || (indx > lastNonGap()) ) {
    return Alphabet.EMPTY_ALPHABET.getGapSymbol();
}
else {
    return getAlphabet().getGapSymbol();
}
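To make the leading/trailing versus internal distinction concrete, here is a small stand-alone sketch. It does not use BioJava: GapKindSketch and classify are hypothetical names, and it works on a plain string with 0-based indices, whereas the BioJava code above uses its own indexing. It marks each gap position as an edge gap (E) or an internal gap (I) using the same first-non-gap / last-non-gap test.

```java
// Toy illustration, NOT BioJava: label gaps as edge (E) or internal (I).
class GapKindSketch {
    static String classify(String gapped) {
        // Locate the first and last non-gap positions (0-based here).
        int first = -1;
        int last = -1;
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) != '-') {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) != '-') {
                out.append('.');            // ordinary symbol
            } else if (first < 0 || i < first || i > last) {
                out.append('E');            // leading or trailing gap
            } else {
                out.append('I');            // internal gap
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(classify("----ttccttccg--ggtcgtc----"));
        // EEEE.........II.......EEEE
    }
}
```

In the symbolAt code above, the E positions are the ones that receive the gap symbol from Alphabet.EMPTY_ALPHABET, and the I positions receive the one from getAlphabet().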
2) When one calls seqString on a SimpleGappedSymbolList, it simply uses
the method it inherited from AbstractSymbolList, which looks like this:
public String seqString() {
    try {
        SymbolTokenization toke = getAlphabet().getTokenization("token");
        return toke.tokenizeSymbolList(this);
    }
    catch (BioException ex) {
        throw new BioRuntimeException(ex, "Couldn't tokenize sequence");
    }
}
So what happens is that all symbols get fed to the SymbolTokenization
object that you get from whatever default alphabet a DNA
SimpleGappedSequence uses. If you feed the gap symbol you get from
Alphabet.EMPTY_ALPHABET.getGapSymbol() to this SymbolTokenization, it
returns an "n" and not a "-".
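One way such a result could arise is sketched below. This is a toy model under stated assumptions, not BioJava's implementation: TokenFallbackSketch, dnaTable and tokenize are hypothetical, and the idea that the tokenization falls back to "n" for a symbol it does not recognise is an assumption made for illustration. The post itself only establishes the observed "n".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT BioJava: two distinct gap objects and a lookup-based tokenizer.
class TokenFallbackSketch {
    // Stand-ins for the two results of getGapSymbol() discussed above.
    static final Object ALPHABET_GAP = new Object();
    static final Object EMPTY_ALPHABET_GAP = new Object();

    // Hypothetical token table: it knows only the alphabet's own gap symbol.
    static Map<Object, Character> dnaTable() {
        Map<Object, Character> table = new HashMap<>();
        table.put(ALPHABET_GAP, '-');
        return table;
    }

    // Assumption: any symbol missing from the table is written as 'n'.
    static char tokenize(Map<Object, Character> table, Object symbol) {
        Character token = table.get(symbol);
        return token != null ? token : 'n';
    }

    public static void main(String[] args) {
        Map<Object, Character> table = dnaTable();
        System.out.println(tokenize(table, ALPHABET_GAP));       // -
        System.out.println(tokenize(table, EMPTY_ALPHABET_GAP)); // n
    }
}
```

If something like this is going on, then either the tokenization would need to recognise the empty alphabet's gap symbol, or symbolAt would need to return the same gap symbol for all gap positions.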
At this point my limited knowledge of the black arts of Alphabets in
biojava stopped me from writing the end of the story, and I was hoping
that someone else might end it for me =)
Regards, Kalle