[Bioperl-l] What is the gap penalty function for BLOSUM62?

Yee Man Chan ymc at paxil.stanford.edu
Fri Jul 18 10:23:50 EDT 2003


Thanks Aaron for the paper. I think I will use 7+k then.

Yee Man
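
For anyone comparing the four schemes quoted below, here is a minimal
Python sketch that tabulates each penalty as g(k) = open + extend*k.
It assumes k is the gap length in residues, which is my reading of the
formulas rather than something the cited sources confirm. The names
affine_gap_penalty, SCHEMES and penalty_table are made up for this
illustration and are not part of Bioperl, ssearch or BLAST.

```python
def affine_gap_penalty(gap_open, gap_extend, k):
    """Penalty for one gap of k residues: gap_open + gap_extend * k."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return gap_open + gap_extend * k


# (gap_open, gap_extend) pairs copied from the formulas quoted in this thread.
SCHEMES = {
    "ssearch34": (7, 1),
    "Henikoff & Henikoff": (8, 4),
    "Gapped BLAST": (10, 1),
    "pSW": (12, 2),
}


def penalty_table(lengths=(1, 2, 5, 10)):
    """Map each scheme name to its penalties for the given gap lengths."""
    return {
        name: [affine_gap_penalty(q, r, k) for k in lengths]
        for name, (q, r) in SCHEMES.items()
    }


if __name__ == "__main__":
    for name, row in penalty_table().items():
        print(f"{name:20s} {row}")
```

The table only shows how quickly each scheme grows with gap length; it
says nothing about which one gives the best search sensitivity.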

On Thu, 17 Jul 2003, Aaron J Mackey wrote:

> 
> See:
> 
> http://bioinformatics.oupjournals.org/cgi/content/abstract/18/11/1500
> 
> Empirical determination of effective gap penalties for sequence comparison
> J.T. Reese and W.R. Pearson
> Bioinformatics Vol. 18 no. 11 2002 Pages 1500-1507
> 
> Abstract:
> 
> MOTIVATION: No general theory guides the selection of gap penalties for
> local sequence alignment. We empirically determined the most effective gap
> penalties for protein sequence similarity searches with substitution
> matrices over a range of target evolutionary distances from 20 to 200
> Point Accepted Mutations (PAMs). RESULTS: We embedded real and simulated
> homologs of protein sequences into a database and searched the database to
> determine the gap penalties that produced the best statistical
> significance for the distant homologs. The most effective penalty for the
> first residue in a gap (q+r) changes as a function of evolutionary
> distance, while the gap extension penalty for additional residues (r) does
> not. For these data, the optimal gap penalties for a given matrix scaled
> in 1/3 bit units (e.g. BLOSUM50, PAM200) are q=25-0.1 * (target PAM
> distance), r=5. Our results provide an empirical basis for selection of
> gap penalties and demonstrate how optimal gap penalties behave as a
> function of the target evolutionary distance of the substitution matrix.
> These gap penalties can improve expectation values by at least one order
> of magnitude when searching with short sequences, and improve the
> alignment of proteins containing short sequences repeated in tandem.
> 
> 
> On Thu, 17 Jul 2003, Yee Man Chan wrote:
> 
> >
> > Hi,
> >
> > 	I have found conflicting gap penalty functions used with the
> > BLOSUM62 matrix:
> >
> > ssearch34: g(k) = 7+k
> > Henikoff & Henikoff paper: g(k) = 8+4*k
> > Gapped BLAST paper: g(k) = 10+k
> > Ewan's pSW module: g(k) = 12+2*k
> >
> > where k is the length of the gap (the number of gapped residues).
> >
> > 	Which one is the correct one? It seems to me that all of them
> > use exactly the same BLOSUM62 matrix.
> >
> > Thanks
> > Yee Man
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> -- 
>  Aaron J Mackey
>  Pearson Laboratory
>  University of Virginia
>  (434) 924-2821
>  amackey at virginia.edu
> 
> 
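
The rule of thumb in the abstract above (q = 25 - 0.1 * target PAM
distance, r = 5, for matrices scaled in 1/3 bit units) can be sketched
in Python as follows. The function names are hypothetical, and refusing
distances outside 20 to 200 PAM is my own choice, based only on the
range the abstract says was studied. The abstract states the rule for
1/3-bit matrices, so it may not carry over unchanged to a matrix
scaled in other units.

```python
def reese_pearson_penalties(target_pam):
    """Return (q, r) in 1/3 bit units for a target PAM distance."""
    if not 20 <= target_pam <= 200:
        # Assumption: only the range studied in the abstract is accepted.
        raise ValueError("target PAM distance should be within 20..200")
    q = 25 - target_pam / 10  # same as 25 - 0.1 * target_pam
    r = 5
    return q, r


def gap_cost(target_pam, k):
    """Cost of a gap of k residues: q + r*k, so the first residue costs q + r."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    q, r = reese_pearson_penalties(target_pam)
    return q + r * k


if __name__ == "__main__":
    for pam in (20, 120, 200):
        q, r = reese_pearson_penalties(pam)
        print(f"PAM{pam}: q={q}, r={r}, first-residue penalty={gap_cost(pam, 1)}")
```

Note that q shrinks as the target distance grows while r stays fixed,
which matches the abstract's statement that only the first-residue
penalty changes with evolutionary distance.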


