[Bioperl-l] Molecular weight calculations

Peter Schattner schattner@alum.mit.edu
Sat, 13 Jan 2001 13:41:42 -0800


I've recently been revisiting the DNA and protein molecular weight
calculations in SeqStats.pm and have a few related questions I would
like to pose to the more biochemically oriented folks on the list.

In nucleic acid weight calculations:

1.  Should SeqStats use the charged or the neutral molecular weight of
the sugar-phosphate backbone? Given that these groups are charged at
physiological pH, it seems reasonable to me, and to the one biochemist
I spoke with, to use the charged values. However, at least one
commercial package (VectorNTI) uses neutral weights, so I am unsure.
(The difference is roughly 0.5% to 1%.)

2. For the initial (5') and final (3') sugar phosphate, should SeqStats
add an extra OH and an extra H, respectively? Again, adding the weight
of the additional water seems reasonable to me, but the calculation is
not always performed that way. (The difference here is about 18 Da,
which is negligible except when computing the molecular weights of very
short oligos.)
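To make questions 1 and 2 concrete, here is a minimal sketch of a nucleic acid weight calculation with both choices exposed as switches. It is not the SeqStats.pm code. The names `dna_weight`, `DNA_RESIDUE_MASS`, `charged` and `add_water` are hypothetical. The residue masses are commonly tabulated average masses of neutral deoxynucleotide residues (dNMP minus one water). The charged option rests on a simplifying assumption: each phosphate loses exactly one proton.

```python
# Hypothetical sketch for illustration; not the SeqStats.pm implementation.
# Assumed average masses (Da) of neutral (free-acid) deoxynucleotide
# residues inside a chain, i.e. each dNMP minus one water.
DNA_RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

WATER = 18.02     # average mass of H2O (Da)
PROTON = 1.00728  # mass of H+ (Da)


def dna_weight(seq, charged=False, add_water=True):
    """Average molecular weight of single-stranded DNA,
    assuming a 5' phosphate and a 3' OH."""
    seq = seq.upper()
    mass = sum(DNA_RESIDUE_MASS[base] for base in seq)
    if add_water:
        # Question 2: the extra OH at the 5' end plus the extra H at the
        # 3' end add up to one water molecule.
        mass += WATER
    if charged:
        # Question 1 (assumption): each phosphate gives up one proton;
        # the second ionizable proton of the terminal 5' phosphate is ignored.
        mass -= PROTON * len(seq)
    return mass


# Compare a short and a longer oligo: the water term matters less as length grows.
for oligo in ("ACGT", "ACGT" * 25):
    neutral = dna_weight(oligo)
    shift = neutral - dna_weight(oligo, charged=True)
    print(len(oligo), round(neutral, 2),
          "water: %.2f%%" % (100 * WATER / neutral),
          "charge shift: %.2f%%" % (100 * shift / neutral))
```

Under these assumptions, the charged correction grows with sequence length. The water correction is a fixed ~18 Da, so its relative share shrinks as the oligo gets longer.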

In protein weight calculations:

3. Should SeqStats use the charged or the neutral molecular weights of
the acidic and basic amino acid residues (e.g., aspartate, glutamate,
histidine, arginine, lysine) in its computations? Given that these amino
acids are charged at physiological pH, it seems reasonable to use the
charged values. However, VectorNTI again uses neutral weights, so I am
unsure. (The difference is roughly 0.5% to 1% times the fraction of
amino acids in the protein that are acidic or basic.)
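A similar sketch for question 3 follows. Again, this is not the SeqStats.pm code, and the names `protein_weight`, `RESIDUE_MASS`, `ACIDIC` and `BASIC` are hypothetical. The residue masses are standard average residue masses (neutral amino acid minus one water). The charged option assumes Asp/Glu each lose one proton and His/Lys/Arg each gain one. His is included because the question lists it, even though its side chain is only partly protonated near neutral pH.

```python
# Hypothetical sketch for illustration; not the SeqStats.pm implementation.
# Average residue masses (Da): neutral amino acid minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015    # average mass of H2O (Da)
PROTON = 1.00728  # mass of H+ (Da)

ACIDIC = set("DE")   # assumed to lose one proton at physiological pH
BASIC = set("HKR")   # assumed to gain one proton (His included, as in question 3)


def protein_weight(seq, charged=False):
    """Average molecular weight of a linear polypeptide."""
    seq = seq.upper()
    # Sum of residue masses plus one water for the free N- and C-termini.
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    if charged:
        # Termini are ignored: NH3+ gains and COO- loses a proton,
        # so their mass changes cancel.
        n_acidic = sum(aa in ACIDIC for aa in seq)
        n_basic = sum(aa in BASIC for aa in seq)
        mass += PROTON * (n_basic - n_acidic)
    return mass


seq = "MKDEHRLLG"  # arbitrary example sequence
print(round(protein_weight(seq), 2), round(protein_weight(seq, charged=True), 2))
```

In this sketch, the acidic and basic corrections have opposite signs, so in a protein with similar numbers of each they largely offset one another.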

Although the difference in calculated weights is small, my
understanding is that, with mass spectrometry becoming increasingly
important for protein and nucleic acid analysis, more precise
molecular weights might be useful (but if that's not really true, I'd
like to know that too).

It's easy enough to implement the calculation in any of these ways; I
just want to do it in the way that seems most useful.

Thanks for the help.

Peter

(The only downside of all this is that my revisiting of these
calculations was triggered by Keith James discovering a bug in the
molecular weight calculations in the current (0.6) version of
SeqStats.pm, which causes it to return inaccurate values :-(.
Everything is fixed for the (hopefully soon) 0.7 release, but in the
meantime the molecular weight routines of SeqStats should be avoided.
The other methods of SeqStats.pm are fine.)