[Bioperl-l] What is the gap penalty function for BLOSUM62?
Yee Man Chan
ymc at paxil.stanford.edu
Fri Jul 18 10:23:50 EDT 2003
Thanks Aaron for the paper. I think I will use 7+k then.
Yee Man
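
For reference, the four conventions in my original message quoted below all have the affine form g(k) = a + b*k. Here is a minimal Python sketch for comparing them side by side. The function and dictionary names are mine, purely for illustration, and the (a, b) pairs are taken at face value from the list below rather than checked against each program's source or defaults.

```python
# Affine gap penalties g(k) = a + b*k, with the (a, b) pairs copied from
# the list in the quoted message. Names here are illustrative only.
GAP_SCHEMES = {
    "ssearch34": (7, 1),
    "Henikoff & Henikoff": (8, 4),
    "Gapped BLAST": (10, 1),
    "pSW": (12, 2),
}


def gap_penalty(a, b, k):
    """Return the total penalty a + b*k for a gap of k residues (k >= 1)."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return a + b * k


def penalty_table(lengths=(1, 2, 5, 10)):
    """Map each scheme name to its penalties at the given gap lengths."""
    return {
        name: [gap_penalty(a, b, k) for k in lengths]
        for name, (a, b) in GAP_SCHEMES.items()
    }


if __name__ == "__main__":
    for name, penalties in penalty_table().items():
        print(f"{name:20s} {penalties}")
```

The schemes differ most on long gaps: at k = 10, 7+k costs 17 while 8+4*k costs 48, so the same BLOSUM62 scores can produce quite different alignments.
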
On Thu, 17 Jul 2003, Aaron J Mackey wrote:
>
> See:
>
> http://bioinformatics.oupjournals.org/cgi/content/abstract/18/11/1500
>
> Empirical determination of effective gap penalties for sequence comparison
> J.T. Reese and W.R. Pearson
> Bioinformatics Vol. 18 no. 11 2002 Pages 1500-1507
>
> Abstract:
>
> MOTIVATION: No general theory guides the selection of gap penalties for
> local sequence alignment. We empirically determined the most effective gap
> penalties for protein sequence similarity searches with substitution
> matrices over a range of target evolutionary distances from 20 to 200
> Point Accepted Mutations (PAMs). RESULTS: We embedded real and simulated
> homologs of protein sequences into a database and searched the database to
> determine the gap penalties that produced the best statistical
> significance for the distant homologs. The most effective penalty for the
> first residue in a gap (q+r) changes as a function of evolutionary
> distance, while the gap extension penalty for additional residues (r) does
> not. For these data, the optimal gap penalties for a given matrix scaled
> in 1/3 bit units (e.g. BLOSUM50, PAM200) are q=25-0.1 * (target PAM
> distance), r=5. Our results provide an empirical basis for selection of
> gap penalties and demonstrate how optimal gap penalties behave as a
> function of the target evolutionary distance of the substitution matrix.
> These gap penalties can improve expectation values by at least one order
> of magnitude when searching with short sequences, and improve the
> alignment of proteins containing short sequences repeated in tandem.
>
>
> On Thu, 17 Jul 2003, Yee Man Chan wrote:
>
> >
> > Hi,
> >
> > I have found conflicting gap penalty functions in use with the
> > BLOSUM62 matrix:
> >
> > ssearch34: g(k) = 7+k
> > Henikoff & Henikoff paper: g(k) = 8+4*k
> > Gapped BLAST paper: g(k) = 10+k
> > Ewan's pSW module: g(k) = 12+2*k
> >
> > where k is the length of the gap (the number of gapped residues).
> >
> > Which one is the correct one? It seems to me that all of them use
> > exactly the same BLOSUM62 matrix.
> >
> > Thanks
> > Yee Man
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Aaron J Mackey
> Pearson Laboratory
> University of Virginia
> (434) 924-2821
> amackey at virginia.edu
>
>
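
The rule in the abstract quoted above (q = 25 - 0.1 * target PAM distance, r = 5, for matrices in 1/3-bit units) can be written out as a small sketch. The function names are hypothetical. The rescaling helper is an assumption of mine and not something the abstract states: it only converts a penalty between matrix scales, on the understanding that BLOSUM62 is usually distributed in 1/2-bit units. Which target PAM distance to use for BLOSUM62 is left to the caller.

```python
def reese_pearson_penalties(target_pam):
    """Return (q, r) in 1/3-bit units per the quoted abstract.

    q = 25 - 0.1 * target_pam, r = 5; the first residue in a gap costs
    q + r and each additional residue costs r. The study covered target
    distances of 20 to 200 PAMs, so other values are rejected here.
    """
    if not 20 <= target_pam <= 200:
        raise ValueError("the abstract covers 20 to 200 PAMs only")
    q = 25 - target_pam / 10
    r = 5
    return q, r


def rescale(penalty, from_bits=1 / 3, to_bits=1 / 2):
    """Convert a penalty between matrix scales (assumption, see lead-in)."""
    return penalty * from_bits / to_bits


if __name__ == "__main__":
    for pam in (20, 120, 200):
        q, r = reese_pearson_penalties(pam)
        print(f"PAM {pam}: q={q}, r={r}, first gap residue costs {q + r}")
```

Because the result is real-valued, it would need rounding before use in a program that accepts only integer penalties.
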