[Biojava-dev] SimpleGappedSymbolList problem, weird "String seqString()" results.
Matthew Pocock
matthew_pocock at yahoo.co.uk
Thu Feb 20 02:34:34 EST 2003
Eugh. Well spotted. I'll take a look tomorrow.
Matthew
Kalle Näslund wrote:
> Hi!
>
> I noticed that if you insert leading or trailing gaps and then call
> seqString(), you get "n" instead of "-". To illustrate, the following
> set of gap operations on a SimpleGappedSymbolList:
>
>
>
> Alphabet dna = DNATools.getDNA();
> SymbolTokenization dnaParser = dna.getTokenization( "token" );
>
> SymbolList symList1 = new SimpleSymbolList( dnaParser, "TTCCTTCCGGGTCGTC" );
> GappedSymbolList gl1 = new SimpleGappedSymbolList( symList1 );
>
> System.out.println( gl1.seqString() );
> gl1.addGapsInSource( 1, 4 );   // four leading gaps
> System.out.println( gl1.seqString() );
> gl1.addGapsInSource( 10, 2 );  // two internal gaps
> System.out.println( gl1.seqString() );
> gl1.addGapsInSource( 17, 4 );  // four trailing gaps
> System.out.println( gl1.seqString() );
>
> gives this result:
>
> ttccttccgggtcgtc
> nnnnttccttccgggtcgtc
> nnnnttccttccg--ggtcgtc
> nnnnttccttccg--ggtcgtcnnnn
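
For comparison, here is a minimal sketch of the output one would presumably expect if every gap were printed as "-". It is a toy model, not BioJava code: the class GapCoordinateSketch and everything inside it are hypothetical, and the reading of addGapsInSource( pos, length ) as "place length gaps in front of source position pos" is an assumption inferred only from the output quoted above.

```java
import java.util.TreeMap;

// Toy model only, NOT BioJava code. All names here are hypothetical.
class GapCoordinateSketch {
    private final String source;
    // 1-based source position -> number of gaps placed in front of it.
    // Position source.length() + 1 stands for "after the last symbol".
    private final TreeMap<Integer, Integer> gapsBefore = new TreeMap<>();

    public GapCoordinateSketch(String source) {
        this.source = source;
    }

    // Assumed semantics: put `length` gaps in front of source position `pos`.
    public void addGapsInSource(int pos, int length) {
        if (pos < 1 || pos > source.length() + 1) {
            throw new IndexOutOfBoundsException("pos out of range: " + pos);
        }
        gapsBefore.merge(pos, length, Integer::sum);
    }

    // Render the gapped view, printing every gap as '-'.
    public String seqString() {
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= source.length() + 1; pos++) {
            int gaps = gapsBefore.getOrDefault(pos, 0);
            for (int i = 0; i < gaps; i++) {
                sb.append('-');
            }
            if (pos <= source.length()) {
                sb.append(source.charAt(pos - 1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        GapCoordinateSketch gl1 = new GapCoordinateSketch("ttccttccgggtcgtc");
        System.out.println(gl1.seqString()); // ttccttccgggtcgtc
        gl1.addGapsInSource(1, 4);
        System.out.println(gl1.seqString()); // ----ttccttccgggtcgtc
        gl1.addGapsInSource(10, 2);
        System.out.println(gl1.seqString()); // ----ttccttccg--ggtcgtc
        gl1.addGapsInSource(17, 4);
        System.out.println(gl1.seqString()); // ----ttccttccg--ggtcgtc----
    }
}
```

The gap positions match the reported output exactly; only the leading and trailing characters differ ("-" here, "n" in the report).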
>
>
> I haven't managed to fully understand why this happens, but the start
> of the story goes like this:
>
> 1) SimpleGappedSymbolList's symbolAt method returns different gap
> symbols depending on whether the gap is an "internal" gap or a
> leading/trailing gap. The relevant piece of code in the symbolAt method
> is:
>
>     if( (indx < firstNonGap()) || (indx > lastNonGap()) ) {
>         // leading or trailing gap
>         return Alphabet.EMPTY_ALPHABET.getGapSymbol();
>     }
>     else {
>         // internal gap
>         return getAlphabet().getGapSymbol();
>     }
>
> 2) When you call seqString() on a SimpleGappedSymbolList, it simply
> uses the method it inherited from AbstractSymbolList, which looks like
> this:
>
>     public String seqString() {
>         try {
>             SymbolTokenization toke =
>                 getAlphabet().getTokenization("token");
>             return toke.tokenizeSymbolList(this);
>         }
>         catch (BioException ex) {
>             throw new BioRuntimeException(ex, "Couldn't tokenize sequence");
>         }
>     }
>
> So what happens is that all symbols get fed to the SymbolTokenization
> object obtained from whatever default alphabet a DNA
> SimpleGappedSequence uses. If you feed the gap symbol you get from
> Alphabet.EMPTY_ALPHABET.getGapSymbol() to this SymbolTokenization, it
> returns an "n" and not a "-".
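
The two steps above can be sketched in a small stand-alone model. This is not BioJava code and does not claim to explain what the real tokenization does internally: the class GapTokenSketch, the two placeholder gap objects, and in particular the fallback to 'n' for any symbol the tokenizer does not recognise are all assumptions, chosen only to reproduce the behaviour described in the message.

```java
// Toy model only, NOT BioJava code. All names here are hypothetical.
class GapTokenSketch {
    // Stand-in for getAlphabet().getGapSymbol()
    static final Object ALPHABET_GAP = new Object();
    // Stand-in for Alphabet.EMPTY_ALPHABET.getGapSymbol()
    static final Object EMPTY_ALPHABET_GAP = new Object();

    // 1-based index of the first non-gap character, or length + 1 if none.
    static int firstNonGap(String view) {
        for (int i = 0; i < view.length(); i++) {
            if (view.charAt(i) != '-') return i + 1;
        }
        return view.length() + 1;
    }

    // 1-based index of the last non-gap character, or 0 if none.
    static int lastNonGap(String view) {
        for (int i = view.length() - 1; i >= 0; i--) {
            if (view.charAt(i) != '-') return i + 1;
        }
        return 0;
    }

    // Mirrors the quoted symbolAt logic: leading/trailing gaps get one
    // gap object, internal gaps get another. `view` marks gaps with '-'.
    static Object symbolAt(String view, int indx) {
        char c = view.charAt(indx - 1);
        if (c != '-') return Character.valueOf(c);
        if (indx < firstNonGap(view) || indx > lastNonGap(view)) {
            return EMPTY_ALPHABET_GAP;
        }
        return ALPHABET_GAP;
    }

    // ASSUMPTION: the tokenizer knows only its own alphabet's gap and
    // falls back to 'n' for anything else.
    static char tokenize(Object symbol) {
        if (symbol == ALPHABET_GAP) return '-';
        if (symbol instanceof Character) return (Character) symbol;
        return 'n';
    }

    static String seqString(String view) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= view.length(); i++) {
            sb.append(tokenize(symbolAt(view, i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints nnnnttccttccg--ggtcgtcnnnn
        System.out.println(seqString("----ttccttccg--ggtcgtc----"));
    }
}
```

In this model, making symbolAt return the same gap object in both branches is enough for every gap to print as "-". Whether that is the right fix for the real class is a separate question.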
>
> At this point my limited knowledge of the black arts of Alphabets in
> BioJava stopped me from writing the end of the story, and I was hoping
> that someone else might end it for me =)
>
> regards Kalle
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk