[Biojava-l] Anotatable Symbol

Matthew Pocock mrp@sanger.ac.uk
Wed, 01 Nov 2000 11:19:02 +0000


Hi Mike.

There are several ways to do this without breaking anything we have at the
moment. Firstly, you could add a method to ProteinTools:

double getResidueMass(Symbol s) throws IllegalSymbolException

You could store the mass information in a format similar to
resources/org/biojava/bio/seq/TranslationTables.xml (which is loaded by
RNATools). The problem with this is that you would have many
getResidueMassByBla methods. Alternatively, you could write a new interface
like this:

public interface SymbolProperty {
  FiniteAlphabet getAlphabet();
  double getValue(Symbol s) throws IllegalSymbolException;
}

You could then have ProteinTools provide several well-known versions - mass,
charge, size etc. and load the data from a SymbolProperty.xml resource. It
also leaves the door open to things like DNA physical properties.
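
A minimal sketch of the interface option, assuming simplified stand-ins for
the BioJava types so that it compiles on its own. Symbol and
IllegalSymbolException below are hypothetical placeholders rather than the
real BioJava classes, getAlphabet() is left out, and TableSymbolProperty is
an invented name. The serine masses are the ones quoted from Mike's message
below; in practice the table would be filled from the XML resource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for BioJava's IllegalSymbolException.
class IllegalSymbolException extends Exception {
    IllegalSymbolException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a BioJava Symbol; it only carries a name.
final class Symbol {
    private final String name;

    Symbol(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// The proposed interface, without getAlphabet() to keep the sketch standalone.
interface SymbolProperty {
    double getValue(Symbol s) throws IllegalSymbolException;
}

// One table-backed implementation; each physical property gets its own instance.
final class TableSymbolProperty implements SymbolProperty {
    private final String propertyName;
    private final Map<Symbol, Double> values = new HashMap<>();

    TableSymbolProperty(String propertyName) {
        this.propertyName = propertyName;
    }

    void setValue(Symbol s, double value) {
        values.put(s, value);
    }

    @Override
    public double getValue(Symbol s) throws IllegalSymbolException {
        Double value = values.get(s);
        if (value == null) {
            throw new IllegalSymbolException(
                "No " + propertyName + " value for symbol " + s.getName());
        }
        return value;
    }
}

class SymbolPropertyDemo {
    public static void main(String[] args) throws IllegalSymbolException {
        Symbol ser = new Symbol("SER");

        TableSymbolProperty monoMass = new TableSymbolProperty("mono-mass");
        monoMass.setValue(ser, 87.03203);

        TableSymbolProperty avgMass = new TableSymbolProperty("avg-mass");
        avgMass.setValue(ser, 87.0782);

        System.out.println("SER mono-mass: " + monoMass.getValue(ser));
        System.out.println("SER avg-mass:  " + avgMass.getValue(ser));
    }
}
```

Adding a new property (charge, size and so on) is then just another
TableSymbolProperty instance, with no new getResidueXxx method needed.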

Another way to do this is to add the data to AlphabetManager.xml directly.
You would have to modify the DTD so that the description element could have
<key type="java.lang.String">mass<value
type="java.lang.Double">90.3</value></key> style children, and then extend
the symbolFromXML code to handle this. The description elements should
probably move to being <key type="java.lang.String"><value
type="java.lang.String">The description goes in here</value></key>.

My money is on the interface option, as it lets you plug in new physical
properties without having to have access to AlphabetManager.xml, including
parameterising algorithms at run-time - TranslationTables ended up being
great for this. The down-side for heavily computational algorithms is that
you will have to perform some type of search within the implementations to
find the value associated with a symbol. The issue of how to optimally
implement this search is nicely solved with the AlphabetIndex interface
(just in), so it may not be that bad in practice. I have a feeling that the
overhead of finding a particular key within an annotation bundle will be
higher than the cost of looking up a double based upon the amino-acid, as
hash-codes have to be calculated, and lots of functions and members are
fetched to traverse the hash table.
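
To illustrate why a direct lookup can be cheap, here is a rough sketch that
stores one double per residue in a plain array. ResidueMassTable and its
letter-to-slot mapping are hypothetical and only stand in for whatever index
the real code would use; this is not the AlphabetIndex API. The serine value
is again the one from Mike's message.

```java
import java.util.Arrays;

// Hypothetical array-backed table: one slot per letter A-Z, NaN marks "no value".
final class ResidueMassTable {
    private final double[] masses = new double[26];

    ResidueMassTable() {
        Arrays.fill(masses, Double.NaN);
    }

    void set(char residue, double mass) {
        masses[slot(residue)] = mass;
    }

    double get(char residue) {
        double mass = masses[slot(residue)];
        if (Double.isNaN(mass)) {
            throw new IllegalArgumentException("No mass stored for residue " + residue);
        }
        return mass;
    }

    // Maps a one-letter code to its array slot; no hashing is involved.
    private static int slot(char residue) {
        char upper = Character.toUpperCase(residue);
        if (upper < 'A' || upper > 'Z') {
            throw new IllegalArgumentException("Not a one-letter residue code: " + residue);
        }
        return upper - 'A';
    }
}

class ResidueMassTableDemo {
    public static void main(String[] args) {
        ResidueMassTable monoMass = new ResidueMassTable();
        monoMass.set('S', 87.03203);
        System.out.println("S mono-mass: " + monoMass.get('S'));
    }
}
```

Each lookup here is a bounds check and an array read, whereas a keyed
annotation lookup has to hash the key and walk the table first.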

What do other people think?

Mike Jones wrote:

> I am starting to work on a package for biojava that can be used for MS
> experimental data. Initially for proteins. So I need a way to annotate
> amino acids with their atomic mass. I would appreciate the help of those
> who have done such things. Can I just modify AlphabetManager.xml,
> say by adding a new alphabet?
>
> I would rather not rewrite each symbol, but if I did, this is how it
> would look:
> <alphabet name="RESIDUE_MASS" parent="PROTEIN">
>     <symbol name="s">
>             <short>S</short>
>             <long>SER</long>
>             <mono-mass>87.03203</mono-mass>
>             <avg-mass>87.0782</avg-mass>
>     </symbol>
>
> ...
>
> To do this though I imagine I would have to modify
> AlphabetManager.symbolFromXML.
>
> Please let me know if I am missing something or if anybody has any
> ideas.