[Biojava-l] ExPASy pI calculation algorithm

George Waldon gwaldon at geneinfinity.org
Sat Apr 2 00:11:02 UTC 2011


Hello,

Sorry if this comes a bit late; we had to solve some email issues.
Thanks again to Andreas for doing it.

This is part of the email exchange I had with Christine Hoogland and
Gregoire Rossier a few years ago regarding the algorithm used by
"Compute pI/Mw" on the ExPASy server. The code they gave me is
included at the end of this email; I used it to update bj1 (BioJava 1).
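Read as a formula (a summary of the attached code, not a statement from ExPASy), the routine looks for the pH at which the net charge Z vanishes:

```latex
Z(\mathrm{pH}) =
    \frac{10^{-\mathrm{pH}}}{10^{-pK_N} + 10^{-\mathrm{pH}}}
  + \sum_{a \in \{R,H,K\}} n_a \,\frac{10^{-\mathrm{pH}}}{10^{-pK_a} + 10^{-\mathrm{pH}}}
  - \frac{10^{-pK_C}}{10^{-pK_C} + 10^{-\mathrm{pH}}}
  - \sum_{a \in \{D,E,C,Y\}} n_a \,\frac{10^{-pK_a}}{10^{-pK_a} + 10^{-\mathrm{pH}}},
\qquad \mathrm{pI} = \mathrm{pH} \text{ such that } Z(\mathrm{pH}) = 0
```

Here n_a is the number of residues a in the sequence, pK_N is the Nt value of the N-terminal residue, pK_C is the Ct value of the C-terminal residue, and pK_a is the side-chain (Sm) value of residue a. Because Z decreases as the pH rises, the code can bisect the interval [0, 14] until it is narrower than 0.0001 pH units (or 2000 iterations have passed).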

Good luck to all GSoC candidates,

George


On Tue, May 22, 2007 at 9:26 AM, Christine Hoogland via RT  
<tools at expasy.org> wrote:

     Dear George,

     Please find enclosed the algorithm we are using on ExPASy.

     I hope this helps.

     Best regards
     Christine

     >
     > The pK values used for "Compute pI/Mw" can be found in
     >
     > # Bjellqvist, B., Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F.,
     > Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing
     > positions of polypeptides in immobilized pH gradients can be predicted
     > from their amino acid sequences. Electrophoresis 1993, 14, 1023-1031.
     >
     > MEDLINE: 8125050
     >
     > # Bjellqvist, B., Basse, B., Olsen, E. & Celis, J.E. Reference points
     > for comparisons of two-dimensional maps of proteins from different
     > human cell types defined in a pH scale where isoelectric points
     > correlate with polypeptide compositions. Electrophoresis 1994, 15,
     > 529-539.
     >
     > MEDLINE: 8055880
     >
     > The pK values were defined by examining polypeptide migration between
     > pH 4.5 and 7.3 in an immobilised pH gradient gel environment with
     > 9.2 M and 9.8 M urea at 15 °C or 25 °C. Prediction of pI for highly
     > basic proteins has yet to be studied, and it is possible that current
     > Compute pI/Mw predictions are not adequate for this purpose.
     >
     > I hope this helps.
     >
     >
     > Best regards
     > Gregoire Rossier
     >
     >

     --------------------------------------------------------
     Christine Hoogland
     Swiss Institute of Bioinformatics
     CMU - 1, rue Michel Servet      Tel. (+41 22) 379 58 28
     CH - 1211 Geneva 4 Switzerland  Fax  (+41 22) 379 58 58
     Christine.Hoogland at isb-sib.ch   http://www.expasy.org/
     --------------------------------------------------------

     //  VERSION      :   1.6
     //  DATE         :   1/25/95
     //  Copyright 1993 by Swiss Institute of Bioinformatics. All rights reserved.

     //
     // Table of pK values:
     //  Note: the current algorithm does not use the last two columns. Each
     //  row corresponds to a one-letter code, from A (Ala) to Z. J, O and U
     //  do not stand for amino acids here; they are present only so that the
     //  table covers the complete alphabet.
     //
     //     Ct    Nt   Sm     Sc     Sn
     //  (Ct: C-terminal pK, Nt: N-terminal pK, Sm: side-chain pK)
     //

     static double cPk[26][5] = {
     3.55, 7.59, 0.   , 0.   , 0.    , // A
     3.55, 7.50, 0.   , 0.   , 0.    , // B
     3.55, 7.50, 9.00 , 9.00 , 9.00  , // C
     4.55, 7.50, 4.05 , 4.05 , 4.05  , // D
     4.75, 7.70, 4.45 , 4.45 , 4.45  , // E
     3.55, 7.50, 0.   , 0.   , 0.    , // F
     3.55, 7.50, 0.   , 0.   , 0.    , // G
     3.55, 7.50, 5.98 , 5.98 , 5.98  , // H
     3.55, 7.50, 0.   , 0.   , 0.    , // I
     0.00, 0.00, 0.   , 0.   , 0.    , // J
     3.55, 7.50, 10.00, 10.00, 10.00 , // K
     3.55, 7.50, 0.   , 0.   , 0.    , // L
     3.55, 7.00, 0.   , 0.   , 0.    , // M
     3.55, 7.50, 0.   , 0.   , 0.    , // N
     0.00, 0.00, 0.   , 0.   , 0.    , // O
     3.55, 8.36, 0.   , 0.   , 0.    , // P
     3.55, 7.50, 0.   , 0.   , 0.    , // Q
     3.55, 7.50, 12.0 , 12.0 , 12.0  , // R
     3.55, 6.93, 0.   , 0.   , 0.    , // S
     3.55, 6.82, 0.   , 0.   , 0.    , // T
     0.00, 0.00, 0.   , 0.   , 0.    , // U
     3.55, 7.44, 0.   , 0.   , 0.    , // V
     3.55, 7.50, 0.   , 0.   , 0.    , // W
     3.55, 7.50, 0.   , 0.   , 0.    , // X
     3.55, 7.50, 10.00, 10.00, 10.00 , // Y
     3.55, 7.50, 0.   , 0.   , 0.    }; // Z

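     #include <math.h>  /* pow() */

     #define PH_MIN 0 /* minimum pH value */
     #define PH_MAX 14 /* maximum pH value */
     #define MAXLOOP 2000 /* maximum number of iterations */
     #define EPSI 0.0001 /* desired precision */

     // Row indices into cPk[] and comp[] for the ionizable side chains.
     enum { C = 'C' - 'A', D = 'D' - 'A', E = 'E' - 'A', H = 'H' - 'A',
            K = 'K' - 'A', R = 'R' - 'A', Y = 'Y' - 'A' };

     // Returns the estimated pI of `sequence`, which must contain at least
     // one residue and only uppercase one-letter codes ('A'..'Z').
     double computePI(const char *sequence, int sequenceLength)
     {
       int comp[26] = {0};
       int i, nTermResidue, cTermResidue;
       double phMin, phMax, phMid = PH_MIN, charge;
       double cter, nter, carg, chis, clys, casp, cglu, ccys, ctyr;

       //
       // Compute the amino-acid composition.
       //
       for (i = 0; i < sequenceLength; i++)
         comp[sequence[i] - 'A']++;

       //
       // Look up N-terminal and C-terminal residue.
       //
       nTermResidue = sequence[0] - 'A';
       cTermResidue = sequence[sequenceLength - 1] - 'A';

       phMin = PH_MIN;
       phMax = PH_MAX;

       //
       // Bisection: the net charge falls as the pH rises, so keep the half
       // of [phMin, phMax] in which it changes sign.
       //
       for (i = 0, charge = 1.0; i < MAXLOOP && (phMax - phMin) > EPSI; i++)
         {
           phMid = phMin + (phMax - phMin) / 2;

           // Charged fractions of the C-terminus (-) and N-terminus (+).
           cter = pow(10, -cPk[cTermResidue][0]) /
                  (pow(10, -cPk[cTermResidue][0]) + pow(10, -phMid));
           nter = pow(10, -phMid) /
                  (pow(10, -cPk[nTermResidue][1]) + pow(10, -phMid));

           // Protonated (positive) side chains: Arg, His, Lys.
           carg = comp[R] * pow(10, -phMid) /
                  (pow(10, -cPk[R][2]) + pow(10, -phMid));
           chis = comp[H] * pow(10, -phMid) /
                  (pow(10, -cPk[H][2]) + pow(10, -phMid));
           clys = comp[K] * pow(10, -phMid) /
                  (pow(10, -cPk[K][2]) + pow(10, -phMid));

           // Deprotonated (negative) side chains: Asp, Glu, Cys, Tyr.
           casp = comp[D] * pow(10, -cPk[D][2]) /
                  (pow(10, -cPk[D][2]) + pow(10, -phMid));
           cglu = comp[E] * pow(10, -cPk[E][2]) /
                  (pow(10, -cPk[E][2]) + pow(10, -phMid));
           ccys = comp[C] * pow(10, -cPk[C][2]) /
                  (pow(10, -cPk[C][2]) + pow(10, -phMid));
           ctyr = comp[Y] * pow(10, -cPk[Y][2]) /
                  (pow(10, -cPk[Y][2]) + pow(10, -phMid));

           charge = carg + clys + chis + nter -
                    (casp + cglu + ctyr + ccys + cter);

           if (charge > 0.0)
              phMin = phMid;   // still net positive: the pI lies above phMid
           else
              phMax = phMid;
         }

       return phMid;
     }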