[Biojava-l] String for Symbol

Emig, Robin Robin.Emig@maxygen.com
Thu, 14 Dec 2000 13:05:21 -0800


-----Original Message-----
From: Thomas Down [mailto:td2@sanger.ac.uk]
Sent: Thursday, December 14, 2000 12:21 PM
To: Emig, Robin
Cc: 'Matthew Pocock'; 'biojava-l@biojava.org'
Subject: Re: [Biojava-l] String for Symbol


On Thu, Dec 14, 2000 at 09:55:07AM -0800, Emig, Robin wrote:
> 	I would like to add a method for BasisSymbol
> 
> String getTokensString()
> or 
> String getTokens()
> 
> which would return a string consisting of all the tokens for that symbol.
> This would be useful for actual cross-product symbols such as [act]. Right
> now getToken() returns a unique but incomprehensible char, and in order
> to get the string of tokens you have to iterate through a list from
> getSymbols(). It was suggested that an additional method, String
> getToken(), be implemented, but I thought it would be better if the two
> methods were less ambiguous.
> 
> Any comments....
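
To make the proposal above concrete, here is a rough sketch of what
getTokens() could look like on a compound symbol. The types Sym, AtomicSym
and CompoundSym are hypothetical stand-ins written for this example, not
the real BioJava interfaces, and the '?' placeholder token is likewise an
assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a symbol type; not the real BioJava interface.
interface Sym {
    char getToken();
}

// A simple symbol such as 'a', 'c' or 't' with one natural character.
final class AtomicSym implements Sym {
    private final char token;

    AtomicSym(char token) {
        this.token = token;
    }

    public char getToken() {
        return token;
    }
}

// A compound symbol such as the codon [act], built from component symbols.
final class CompoundSym implements Sym {
    private final List<Sym> parts;

    CompoundSym(Sym... parts) {
        this.parts = Arrays.asList(parts);
    }

    // No single character describes a compound symbol well;
    // '?' is an assumed placeholder for this sketch.
    public char getToken() {
        return '?';
    }

    public List<Sym> getSymbols() {
        return parts;
    }

    // The proposed method: concatenate the tokens of the components so
    // that callers do not have to iterate over getSymbols() themselves.
    public String getTokens() {
        StringBuilder sb = new StringBuilder();
        for (Sym s : parts) {
            sb.append(s.getToken());
        }
        return sb.toString();
    }
}
```

With this sketch, a CompoundSym built from 'a', 'c' and 't' gives "act"
from getTokens(), while getToken() still returns the single placeholder
char.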

I can see the point of this, and would be inclined to agree
(although why only BasisSymbol and not Symbol) except...
---Actually it would be better as part of the BasisSymbol interface, I agree

I've been kind-of suspicious of the idea of tokens attached
to Symbols.  They're useful for the simple protein and DNA
alphabets, but otherwise rather problematic.
------ I can see a huge use for them in simple cross-product alphabets (i.e.
codons) and in the DNA and protein alphabets.
  As a result,
I'd like to remove the tokens from the symbols themselves, and
instead have `Stringifier' objects that are responsible for
encapsulating code to convert Symbols and SymbolLists into
textual representation -- in other words the exact opposite
of SymbolParser objects.
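
A minimal sketch of that parser/stringifier symmetry, using made-up types
rather than the real BioJava classes: Nucleotide, NucleotideParser and
NucleotideStringifier are all hypothetical names for this example, and the
lowercase one-letter output is just an assumed convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical four-letter alphabet standing in for real Symbol objects.
enum Nucleotide { A, C, G, T }

// Text -> symbols: the direction a parser object covers.
final class NucleotideParser {
    List<Nucleotide> parse(String text) {
        List<Nucleotide> symbols = new ArrayList<>();
        for (char c : text.toCharArray()) {
            // valueOf throws IllegalArgumentException for any character
            // that is not one of a, c, g, t (in either case).
            symbols.add(Nucleotide.valueOf(String.valueOf(Character.toUpperCase(c))));
        }
        return symbols;
    }
}

// Symbols -> text: the opposite direction. The symbols themselves carry
// no token; the textual form lives entirely in this object.
final class NucleotideStringifier {
    String stringify(List<Nucleotide> symbols) {
        StringBuilder sb = new StringBuilder();
        for (Nucleotide n : symbols) {
            sb.append(Character.toLowerCase(n.name().charAt(0)));
        }
        return sb.toString();
    }
}
```

Parsing "gattaca" and stringifying the result gives back "gattaca"; a
different stringifier could render the same symbols in another notation
without touching the symbol classes.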

----- I like this idea of a Stringifier object, but for more complex
symbol objects. If it would work just like the SymbolParser objects, it
might in fact make things simpler to just use the same object. How about this...
For the 1.1 release:
  add a String getTokens() method to the BasisSymbol interface
After the 1.1 release:
  keep the getTokens() and getToken() methods
  add a tag for a default Stringifier/Parser in the XML doc
  when getTokens() is called, it uses that stringifier object to
  render itself
This is kind of kludgy, but it also lets BioJava stay simple for simple
things.
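
Roughly, the post-1.1 arrangement could look like the sketch below. All of
the names here (SymbolTextFormat, ConcatFormat, DelegatingSymbol) are
invented for illustration, and the step that would read the default
stringifier from the XML doc is left out; the format object is simply
passed to the constructor.

```java
// Hypothetical stringifier interface: turns component tokens into text.
interface SymbolTextFormat {
    String format(char[] componentTokens);
}

// An assumed default format that just concatenates the component tokens.
final class ConcatFormat implements SymbolTextFormat {
    public String format(char[] componentTokens) {
        return new String(componentTokens);
    }
}

// A symbol that keeps the convenient getTokens() method but hands the
// actual text conversion to whichever format it was configured with.
final class DelegatingSymbol {
    private final SymbolTextFormat defaultFormat;
    private final char[] componentTokens;

    DelegatingSymbol(SymbolTextFormat defaultFormat, char... componentTokens) {
        this.defaultFormat = defaultFormat;
        // Copy so later changes to the caller's array do not leak in.
        this.componentTokens = componentTokens.clone();
    }

    String getTokens() {
        return defaultFormat.format(componentTokens);
    }
}
```

Simple callers would only ever see getTokens(), while anyone needing a
different textual form would supply another SymbolTextFormat.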

We'd still want to leave the getName() method on symbols,
since it's very useful for debugging. (Note: I've
fixed this so it now always gives a sensible answer for
ambiguous and cross-product symbols.)

I was planning to leave this change 'til after the 1.1
release, although I guess it could be brought forward.
But I guess this could be reconsidered if you need more
sophisticated stringification behaviour now.

Happy hacking,

  Thomas.
-- 
``If I was going to carry a large axe on my back to a diplomatic
function I think I'd want it glittery too.''
           -- Terry Pratchett