[Bioperl-l] E-value of a combined alignment?
Ian Korf
ik1 at sanger.ac.uk
Wed Sep 3 07:46:02 EDT 2003
There are several publications on the combined statistical significance
of local alignment scores. The statistics implemented in BLAST are not
exactly the same as those in the publications, though. You can get a
pretty decent approximation by subtracting log(KMN) for each gap (K is
the Karlin-Altschul constant; M and N are the query and database
lengths), but this isn't the proper formula.
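
The approximation above can be read in more than one way. The sketch
below shows one possible reading, not the formula BLAST uses: add up the
HSP scores in nats, subtract ln(K*M*N) once per gap between consecutive
HSPs, and feed the result to the ordinary single-HSP formula
E = K*M*N*exp(-x). The function names are hypothetical, the lambda and K
values are the BLOSUM62 numbers from the quoted message, and the lengths
and scores in the usage lines are made up.

```python
import math

LAMBDA = 0.318  # BLOSUM62 ungapped lambda, as quoted in this thread
K = 0.135       # BLOSUM62 ungapped K, as quoted in this thread


def single_evalue(score, m, n, lam=LAMBDA, k=K):
    """Karlin-Altschul E-value of one HSP: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


def approx_combined_evalue(scores, m, n, lam=LAMBDA, k=K):
    """Rough combined E-value for a group of HSPs (hypothetical helper).

    Assumed reading of the approximation: sum the raw scores in nats,
    subtract ln(K*m*n) once per gap between consecutive HSPs, then apply
    the single-HSP formula to the adjusted score.
    """
    if not scores:
        raise ValueError("need at least one HSP score")
    gaps = len(scores) - 1
    nats = lam * sum(scores) - gaps * math.log(k * m * n)
    return k * m * n * math.exp(-nats)


if __name__ == "__main__":
    m, n = 1_000, 3_000_000_000  # made-up query and database lengths
    hsps = [120, 95, 80]         # made-up raw scores of three exon HSPs
    print(approx_combined_evalue(hsps, m, n))
```

Under this reading the combined value is algebraically the product of
the individual E-values, which is one way to see why it is only an
approximation.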
WU-BLAST is much better for combined statistics than NCBI-BLAST because
it shows the actual groups with the -links parameter and allows you to
limit the number of groups with the -topcomboN and -topcomboE
parameters. It also lets you fine-tune the groupings a bit with -olmax
and -olfmax. If the sequences aren't too diverged, you might be better
off keeping X low though.
-Ian
On Wednesday, September 3, 2003, at 12:06 AM, Yee Man Chan wrote:
>
> Hi folks,
>
> I am aligning mRNAs against the human genome using ungapped tblastx. I
> got a bunch of HSPs with different e-values. I can see that some of
> them should be in the same group because they are exons of one gene.
> But then what is the e-value of all these HSPs combined?
>
> I know the formulas for the e-value and bit score with BLOSUM62:
>
> Let S' be the bit score, S the raw score, e the e-value, m the length
> of the HSP, and n the length of the database.
>
> S' = (0.318 * S - ln(0.135)) / ln(2)
>
> e = m * n / (2^(S'))
>
> I am guessing the formula for the combined e-value of
> non-overlapping HSPs to be:
>
> S'' = (0.318 * sum_of_S - ln(0.135)) / ln(2)
>
> e' = sum_of_m * n / (2^(S''))
>
> Is this correct? Or do you know the right way to calculate it?
>
> Thanks in advance.
> Yee Man
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
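
The single-HSP formulas in the quoted message can be checked
numerically. The sketch below is only an illustration: the function
names are hypothetical, and lambda = 0.318 and K = 0.135 are the values
given in the message. It shows that the bit-score form
e = m * n / 2^S' is the same quantity as K*m*n*exp(-lambda*S). Note
that the message takes m to be the HSP length, whereas the usual
Karlin-Altschul formulation uses the (effective) query length.

```python
import math

LAMBDA = 0.318  # value quoted in the message for BLOSUM62
K = 0.135       # value quoted in the message for BLOSUM62


def bit_score(raw_score, lam=LAMBDA, k=K):
    """S' = (lambda * S - ln K) / ln 2, as written in the message."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, m, n):
    """e = m * n / 2^S', as written in the message."""
    return m * n / 2 ** bits


if __name__ == "__main__":
    s, m, n = 100, 1_000, 1_000_000  # made-up score and lengths
    via_bits = evalue_from_bits(bit_score(s), m, n)
    via_nats = K * m * n * math.exp(-LAMBDA * s)
    print(via_bits, via_nats)  # the two forms should agree
```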