[Biojava-l] SimpleGappedSymbolList from a String

Matthew Pocock matthew_pocock at yahoo.co.uk
Fri May 14 06:25:16 EDT 2004


Don Naki wrote:

>Hi all,
>I have a couple of 'novice' questions...
>
>I can't seem to figure out how to create a SimpleGappedSymbolList from a String. I want to parse "-AQSD--VP-" and create a SimpleGappedSymbolList from it.
>ProteinTools has methods to return a SymbolList, Sequence, and GappedSequence from a String, but not a GappedSymbolList. I understand GappedSequence extends GappedSymbolList, but I want just the GappedSymbolList. Alternatively, is there a way to get a GappedSymbolList from a GappedSequence?
>  
>
We could add a utility method to do this. Why do you /have/ to have a 
GappedSymbolList that is not a GappedSequence? Is there a specific 
memory constraint?
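
For what it's worth, here is a rough sketch of the first half of what such a utility might do: split the gapped string into its ungapped residues and the positions of the gaps. It is plain Java and does not call BioJava at all; the class and method names (GappedStringSplitter, split) are made up for illustration. The idea would be to build an ordinary SymbolList from the residues, wrap it in a SimpleGappedSymbolList, and then re-insert the gaps at the recorded positions; that second half is not shown, and the 1-based view coordinates are an assumption about what the gap-insertion calls expect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
final class GappedStringSplitter {

    /** Ungapped residues plus the 1-based positions (in the gapped view) of each gap. */
    static final class Split {
        final String residues;
        final List<Integer> gapPositions;

        Split(String residues, List<Integer> gapPositions) {
            this.residues = residues;
            this.gapPositions = gapPositions;
        }
    }

    /** Separates gap characters from residues, remembering where the gaps were. */
    static Split split(String gapped, char gapChar) {
        StringBuilder residues = new StringBuilder();
        List<Integer> gaps = new ArrayList<>();
        for (int i = 0; i < gapped.length(); i++) {
            char c = gapped.charAt(i);
            if (c == gapChar) {
                gaps.add(i + 1); // assumed 1-based, counted in the gapped view
            } else {
                residues.append(c);
            }
        }
        return new Split(residues.toString(), gaps);
    }

    public static void main(String[] args) {
        Split s = split("-AQSD--VP-", '-');
        System.out.println(s.residues + " " + s.gapPositions);
    }
}
```

Inserting the gaps in ascending order of view position should leave the earlier ones where they were put, which is why the positions are recorded in the gapped coordinates rather than the ungapped ones.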

>A second question is that ProteinTools.createGappedProteinSequence("-AQSD--VP-").seqString() results in the String "XAQSD--VPX". The first and last '-' characters are now represented by 'X'. Is this a special kind of gap symbol? If so, how can I distinguish between '-' and 'X' Symbols?
>  
>
This is a tokenization bug - the leading/trailing gaps are not being 
recognised by the tokenizer, and are being replaced by X instead. It's 
probably in CharacterTokenization - it needs a special case for 
AlphabetManager.getGapSymbol() - could someone look at this?
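
To make the suggested fix concrete, here is a toy model of a character tokenizer with the gap special case in place. It is not BioJava code and I have not checked it against CharacterTokenization; GapAwareTokenizer, parse and roundTrip are hypothetical names, and symbols are modelled as plain strings. The point is only that the gap character is tested before the lookup table, so it can never fall through to the "unknown" symbol X.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a character tokenizer; not the BioJava implementation.
final class GapAwareTokenizer {
    static final char GAP_CHAR = '-';
    static final String GAP = "gap";   // stands in for the shared gap symbol
    static final String UNKNOWN = "X"; // stands in for the ambiguity symbol

    private final Map<Character, String> table = new HashMap<>();

    /** Registers one symbol per residue character. */
    GapAwareTokenizer(String residues) {
        for (char c : residues.toCharArray()) {
            table.put(c, String.valueOf(c));
        }
    }

    /** Maps a character to a symbol, checking for the gap before anything else. */
    String parse(char c) {
        if (c == GAP_CHAR) {
            return GAP; // special case: never fall through to UNKNOWN
        }
        String symbol = table.get(c);
        return symbol != null ? symbol : UNKNOWN;
    }

    /** Tokenizes the text and writes it back out, as seqString() would. */
    String roundTrip(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String symbol = parse(c);
            out.append(GAP.equals(symbol) ? String.valueOf(GAP_CHAR) : symbol);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        GapAwareTokenizer protein = new GapAwareTokenizer("ACDEFGHIKLMNPQRSTVWY");
        System.out.println(protein.roundTrip("-AQSD--VP-"));
    }
}
```

With the special case, the string from the original question round-trips unchanged, leading and trailing gaps included; without it, any gap that misses the table would come back as X.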

>Thanks in advance,
>Don
>  
>
Matthew

