[Bioperl-l] What is the gap penalty function for BLOSUM62?

Empirical determination of effective gap penalties for sequence comparison
J.T. Reese and W.R. Pearson
Bioinformatics, Vol. 18, no. 11, 2002, pages 1500-1507


MOTIVATION: No general theory guides the selection of gap penalties for
local sequence alignment. We empirically determined the most effective gap
penalties for protein sequence similarity searches with substitution
matrices over a range of target evolutionary distances from 20 to 200
Point Accepted Mutations (PAMs). RESULTS: We embedded real and simulated
homologs of protein sequences into a database and searched the database to
determine the gap penalties that produced the best statistical
significance for the distant homologs. The most effective penalty for the
first residue in a gap (q+r) changes as a function of evolutionary
distance, while the gap extension penalty for additional residues (r) does
not. For these data, the optimal gap penalties for a given matrix scaled
in 1/3 bit units (e.g. BLOSUM50, PAM200) are q=25-0.1 * (target PAM
distance), r=5. Our results provide an empirical basis for selection of
gap penalties and demonstrate how optimal gap penalties behave as a
function of the target evolutionary distance of the substitution matrix.
These gap penalties can improve expectation values by at least one order
of magnitude when searching with short sequences, and improve the
alignment of proteins containing short sequences repeated in tandem.
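
As a rough illustration of the formula in the abstract, here is a small
Python sketch. The function names are made up for this example, and it
assumes the affine form implied by the abstract, where the first residue in
a gap costs q+r and each additional residue costs r, so a gap of k residues
costs q + r*k. The values are in 1/3 bit units, so this sketch does not
apply directly to a matrix on a different scale without rescaling.

```python
def reese_pearson_gap_penalties(target_pam):
    """Return (q, r) in 1/3-bit units for a matrix's target PAM distance.

    Uses q = 25 - 0.1 * PAM and r = 5 from the abstract. The range check
    mirrors the 20-200 PAM range studied there; outside it the formula
    is untested.
    """
    if not 20 <= target_pam <= 200:
        raise ValueError("formula was determined for 20 to 200 PAMs")
    q = 25 - target_pam / 10  # same as 25 - 0.1 * PAM
    r = 5
    return q, r


def gap_cost(q, r, k):
    """Cost of a gap of k residues: q + r for the first, r for each further one."""
    return q + r * k


if __name__ == "__main__":
    for pam in (20, 120, 200):
        q, r = reese_pearson_gap_penalties(pam)
        print(pam, q, r, gap_cost(q, r, 1))
```

For a PAM200-like matrix this gives q=5, r=5, so the first gapped residue
costs 10 and each further residue costs 5.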

On Thu, 17 Jul 2003, Yee Man Chan wrote:

> Hi,
> 	I found conflicting gap penalty functions used with the BLOSUM62
> matrix:
> ssearch34: g(k) = 7+k
> Henikoff & Henikoff paper: g(k) = 8+4*k
> Gapped BLAST paper: g(k) = 10+k
> Ewan's pSW module: g(k) = 12+2*k
> where k is the length of the gap (the number of gapped residues).
> 	Which one is the correct one? It seems to me that all of them use
> exactly the same BLOSUM62 matrix.
> Thanks
> Yee Man
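
To see how far apart the four functions quoted above are, the sketch below
evaluates each one for short gaps. The open and extend values are copied from
the list in the question; the dictionary and function names are only for
illustration, and the sketch takes k to be the gap length in residues. It
assumes nothing about whether the four programs use the same matrix scale.

```python
# (open, extend) pairs for g(k) = open + extend * k, as quoted in the question.
GAP_SCHEMES = {
    "ssearch34": (7, 1),
    "Henikoff & Henikoff": (8, 4),
    "Gapped BLAST": (10, 1),
    "pSW": (12, 2),
}


def affine_gap_cost(gap_open, gap_extend, k):
    """Affine gap cost for a gap of k residues."""
    return gap_open + gap_extend * k


def gap_cost_table(max_k=5):
    """Map each scheme name to its costs for gap lengths 1..max_k."""
    return {
        name: [affine_gap_cost(o, e, k) for k in range(1, max_k + 1)]
        for name, (o, e) in GAP_SCHEMES.items()
    }


if __name__ == "__main__":
    for name, costs in gap_cost_table().items():
        print(f"{name:22s}", costs)
```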
