[Biojava-l] String for Symbol

Matthew Pocock mrp@sanger.ac.uk
Fri, 15 Dec 2000 14:35:18 +0000


Hi Robin,

This is all my fault (to some extent). During THE GREAT SYMBOL SHAKEUP, I
accidentally stripped out the clever getName methods that printed out useful
information like what the [agc] triplet was composed from. Thomas has now put
this back in (thanks). As Thomas said, there is the possibility of moving this
whole token-for-symbol mess out into a separate interface that would look a lot
like the current parser objects. This way, a single alphabet could have multiple
external representations. At that point, we could drop the getToken method
altogether (getName should stay; it is useful during debugging at the very
least).
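
To make the idea concrete, here is a rough sketch of what such an interface
might look like. Everything in it is hypothetical: Sym, Tokenization,
TableTokenization and TokenizationSketch are made-up names for illustration,
not existing BioJava classes, and the real design may well end up different.
The point is only that one set of symbols can carry several external
representations once the tokens live outside the symbols.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a symbol; NOT the real BioJava Symbol interface.
interface Sym {
    String getName();
}

final class NamedSym implements Sym {
    private final String name;
    NamedSym(String name) { this.name = name; }
    @Override public String getName() { return name; }
}

// Hypothetical tokenization: one external representation of an alphabet.
// It maps both ways, so it covers parsing as well as printing.
interface Tokenization {
    Sym parseToken(String token);
    String tokenFor(Sym sym);
}

// Table-backed implementation, enough for small alphabets.
class TableTokenization implements Tokenization {
    private final Map<String, Sym> symByToken = new HashMap<>();
    private final Map<Sym, String> tokenBySym = new HashMap<>();

    TableTokenization add(Sym sym, String token) {
        symByToken.put(token, sym);
        tokenBySym.put(sym, token);
        return this;
    }

    @Override
    public Sym parseToken(String token) {
        Sym sym = symByToken.get(token);
        if (sym == null) {
            throw new IllegalArgumentException("Unknown token: " + token);
        }
        return sym;
    }

    @Override
    public String tokenFor(Sym sym) {
        String token = tokenBySym.get(sym);
        if (token == null) {
            throw new IllegalArgumentException("No token for: " + sym.getName());
        }
        return token;
    }
}

class TokenizationSketch {
    // Two symbols of a toy protein alphabet; the symbols carry no tokens.
    static final Sym ALA = new NamedSym("alanine");
    static final Sym GLY = new NamedSym("glycine");

    static Tokenization oneLetter() {
        return new TableTokenization().add(ALA, "A").add(GLY, "G");
    }

    static Tokenization threeLetter() {
        return new TableTokenization().add(ALA, "Ala").add(GLY, "Gly");
    }

    public static void main(String[] args) {
        // Parse with one representation, print with the other.
        Sym s = oneLetter().parseToken("G");
        System.out.println(s.getName() + " -> " + threeLetter().tokenFor(s));
        // prints "glycine -> Gly"
    }
}
```

Because the symbols themselves hold no token, adding a third representation
would only mean building another table, with no change to the alphabet.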

Any move towards deprecating getToken or adding the tokenizing interface is not
something we should tackle before 1.1 is out of the door. Does the new behaviour of
the getName method give you the information that you require?

Matthew

"Emig, Robin" wrote:

> -----Original Message-----
> From: Thomas Down [mailto:td2@sanger.ac.uk]
> Sent: Thursday, December 14, 2000 12:21 PM
> To: Emig, Robin
> Cc: 'Matthew Pocock'; 'biojava-l@biojava.org'
> Subject: Re: [Biojava-l] String for Symbol
>
> On Thu, Dec 14, 2000 at 09:55:07AM -0800, Emig, Robin wrote:
> >       I would like to add a method for BasisSymbol
> >
> > String getTokensString()
> > or
> > String getTokens()
> >
> > which would return a string consisting of all the tokens for that symbol.
> > This would be useful for actual cross-product symbols such as [act]. Right
> > now getToken returns a unique but incomprehensible char,
> > and in order to get the string of tokens you have to iterate through a
> > list from getSymbols(). It was suggested that an additional method, String
> > getToken(), be implemented, but I thought it would be better if the two
> > methods were less ambiguous.
> >
> > Any comments....
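
For reference, the iteration described above could be wrapped up roughly as
follows. This is only a sketch under stated assumptions: TokenSymbol,
AtomicSym, CrossSym and TokensSketch are simplified, hypothetical stand-ins
and not the real BasisSymbol API; in particular, the '?' returned by
CrossSym.getToken() is just a placeholder for the opaque char mentioned above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for BasisSymbol; NOT the real BioJava interface.
interface TokenSymbol {
    char getToken();
    List<TokenSymbol> getSymbols();
}

// An ordinary symbol such as 'a' or 'c'; it is its own only component.
final class AtomicSym implements TokenSymbol {
    private final char token;
    AtomicSym(char token) { this.token = token; }
    @Override public char getToken() { return token; }
    @Override public List<TokenSymbol> getSymbols() {
        return Collections.<TokenSymbol>singletonList(this);
    }
}

// A cross-product symbol such as [act], built from component symbols.
final class CrossSym implements TokenSymbol {
    private final List<TokenSymbol> parts;
    CrossSym(TokenSymbol... parts) { this.parts = Arrays.asList(parts); }
    // Placeholder for the unique but unreadable char (an assumption).
    @Override public char getToken() { return '?'; }
    @Override public List<TokenSymbol> getSymbols() { return parts; }
}

final class TokensSketch {
    // The proposed getTokens(): concatenate the tokens of the components.
    static String getTokens(TokenSymbol sym) {
        StringBuilder sb = new StringBuilder();
        for (TokenSymbol part : sym.getSymbols()) {
            sb.append(part.getToken());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TokenSymbol codon = new CrossSym(
            new AtomicSym('a'), new AtomicSym('c'), new AtomicSym('t'));
        System.out.println(getTokens(codon)); // prints "act"
    }
}
```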
>
> I can see the point of this, and would be inclined to agree
> (although why only BasisSymbol and not Symbol) except...
> ---Actually it would be better as part of the BasisSymbol interface, I agree
>
> I've been kind-of suspicious of the idea of tokens attached
> to Symbols.  They're useful for the simple protein and DNA
> alphabets, but otherwise rather problematic.
> ------ I can see a huge use for them in simple cross-product alphabets (i.e.
> codons) and in the DNA and protein alphabets.
>   As a result,
> I'd like to remove the tokens from the symbols themselves, and
> instead have `Stringifier' objects that are responsible for
> encapsulating code to convert Symbols and SymbolLists into
> textual representation -- in other words the exact opposite
> of SymbolParser objects.
>
> ----- I like this idea of a Stringifier object, but for more complex
> symbol objects. If it would work just like the SymbolParser objects, it might
> in fact make things simpler to just use the same object. How about this...
> For the 1.1 release:
>   - add a String getTokens() method to the BasisSymbol interface
> After the 1.1 release:
>   - keep the getTokens() and getToken() methods
>   - add a tag for a default Stringifier/Parser in the XML doc
>   - when getTokens() is called, it uses that stringifier object to
>     convert itself to a string
> This is kind of kludgy, but it also allows BioJava to be simpler for simple
> things.
>
> We'd still want to leave the getName() method on symbols,
> since they're very useful for debugging. (note, I've
> fixed this so it now always gives a sensible answer for
> ambiguous and cross-product symbols).
>
> I was planning to leave this change 'til after the 1.1
> release, although I guess it could be brought forward.
> But I guess this could be reconsidered if you need more
> sophisticated stringification behaviour now.
>
> Happy hacking,
>
>   Thomas.
> --
> ``If I was going to carry a large axe on my back to a diplomatic
> function I think I'd want it glittery too.''
>            -- Terry Pratchett
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l