[Bioperl-l] E-value of a combined alignment?
Ian Korf
ik1 at sanger.ac.uk
Wed Sep 3 13:33:48 EDT 2003
> I believe that this is actually the behavior of NCBI's BLASTP. All of the
> HSP's in a hit get the same evalue, which is about what you would get if
> you summed the bit scores of the HSPs and then calculated a final evalue.
This is definitely not what BLASTP or any other BLAST does. If it were,
you could sum the scores of highly insignificant HSPs (e.g. those with
an E-value of 1.0) and come up with a very good E-value. Instead, each
HSP pays a log(KMN) penalty, which subtracts the background expected
alignment score [every search has a score with an E-value of 1.0, and
in the limit of large sequences that score is log(KMN)]. Combining
alignments is also not so straightforward if you want the HSPs to be
consistent (e.g. the N-termini match and the C-termini match, rather
than the N-terminus matching the C-terminus). To check this, one must
compare every HSP against every other for order and overlap. Since this
is a quadratic operation, it doesn't scale well to large sequences. A
high single-HSP cutoff helps offset the cost, as does gapped alignment,
which produces fewer HSPs. The cutoff value is hard-coded in NCBI-BLAST
but adjustable in WU-BLAST (parameters S2 and gapS2).
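
To make the arithmetic concrete, here is a minimal Python sketch of the single-HSP Karlin-Altschul formula E = K*M*N*exp(-lambda*S). The parameter values and sequence lengths are assumed for illustration only (they do not come from this thread), and the sketch is not BLAST's actual sum statistics. It only shows why summing raw scores while paying log(KMN) once would be far too generous.

```python
import math

# Assumed, illustrative values (not taken from the thread or from any
# particular BLAST run).
LAMBDA = 0.267      # scale parameter, nats per raw score unit
K = 0.041           # Karlin-Altschul K
M = 300             # query length (hypothetical)
N = 1_000_000       # database length (hypothetical)


def evalue(raw_score, lam=LAMBDA, k=K, m=M, n=N):
    """Single-HSP E-value: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def normalized_score(raw_score, lam=LAMBDA, k=K, m=M, n=N):
    """Score in nats after subtracting the log(KMN) background."""
    return lam * raw_score - math.log(k * m * n)


# Raw score at which a single HSP has E = 1.0: lambda * S = log(KMN).
s_e1 = math.log(K * M * N) / LAMBDA

e_single = evalue(s_e1)          # one insignificant HSP, E is about 1.0

# Naive combination: add two raw scores, subtract log(KMN) only once.
e_naive = evalue(2 * s_e1)       # equals 1 / (K*M*N), absurdly good

# Per-HSP penalty: each HSP is normalized first, so two E = 1.0 HSPs
# contribute roughly zero in total rather than a large positive score.
sum_normalized = normalized_score(s_e1) + normalized_score(s_e1)

print(f"single-HSP E-value:       {e_single:.3g}")
print(f"naive summed E-value:     {e_naive:.3g}")
print(f"sum of normalized scores: {sum_normalized:.3g}")
```

With the penalty applied per HSP, two hits that are each at the noise level stay at the noise level when combined.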
> If "p" scores were really probabilities, we could combine them using the
> formulas for either dependent or independent events. Has anyone tried
> this?
There's loads of literature on this topic already. The papers are
mostly theoretical though, and do not really concern themselves with
the practicalities of real biological sequences. Finite sequence
lengths pose some problems. For example, the log(KMN) expected score is
a little too high for finite sequences, so BLAST uses some heuristics
to bring it down. The code in NCBI-BLAST that does this is a little
frightening, and I've no idea what WU-BLAST does, though it seems to
take length into account in some manner.
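
As a rough illustration of the finite-length issue, the sketch below applies one commonly described edge-effect correction: subtract the expected HSP length, log(KMN)/H, from both sequence lengths before computing log(KMN). This is an assumption made for illustration and is not claimed to be what NCBI-BLAST or WU-BLAST actually do. The values of K and H, the lengths, and the function names are all hypothetical.

```python
import math

# Assumed, illustrative values (not from the thread).
K = 0.041           # Karlin-Altschul K
H = 0.14            # relative entropy, nats per aligned position
M = 300             # query length (hypothetical)
N = 1_000_000       # database length (hypothetical)


def expected_score_nats(m, n, k=K):
    """Background score in nats at which E = 1.0, i.e. log(K*m*n)."""
    return math.log(k * m * n)


def effective_lengths(m, n, k=K, h=H):
    """Shorten both lengths by the expected HSP length log(K*m*n) / H.

    An alignment cannot start too close to the end of a sequence, so
    the usable search space is smaller than m * n. Lengths are clamped
    at 1 to keep the logarithm defined for very short sequences.
    """
    expected_hsp_len = math.log(k * m * n) / h
    return max(m - expected_hsp_len, 1.0), max(n - expected_hsp_len, 1.0)


m_eff, n_eff = effective_lengths(M, N)
uncorrected = expected_score_nats(M, N)
corrected = expected_score_nats(m_eff, n_eff)

print(f"effective lengths:    {m_eff:.1f} x {n_eff:.1f}")
print(f"log(KMN) uncorrected: {uncorrected:.3f}")
print(f"log(KMN) corrected:   {corrected:.3f}")
```

The correction matters most for short queries, where the expected HSP length is a large fraction of the sequence.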
This stuff (and a whole lot more) is discussed in the O'Reilly BLAST
book (sorry for the shameless plug).
-Ian
>
> -Chris Dwan
> CCGB, University of Minnesota.