[Bioperl-l] e- and h-values in psi-blast

Thu, 16 May 2002 22:32:10 +0530 (IST)

hi,

I have a few questions on psi-blast, not bioperl. I presume that people on
this list would have a lot of experience with psi-blast, which is why I am
posting here. I have read the psi-blast tutorial, but I still have a few
questions. I want to know the difference between the e-value and the
h-value specified for a psi-blast run.

As I understand from the psi-blast tutorial: 

1. The e-value specifies which hits are displayed in the output: all the
hits displayed have e-values lower than or equal to the specified e-value.

2. The h-value, on the other hand, specifies which hits are used to build
the profile for the next iteration of psi-blast. So, if the h-value
(e.g. 1) is higher than the e-value (0.01), even though hits which have
e-values greater than 0.01 aren't displayed in the output, they are
included for building the profile as long as the e-values are less than or
equal to 1. 
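
To make this concrete, here is a small Python sketch of how I picture the
two thresholds acting on the hits from one round. This only encodes my own
understanding, which may be wrong. The hit names and e-values are made up,
and split_hits is a hypothetical helper, not part of psi-blast or bioperl.

```python
def split_hits(hits, e_value, h_value):
    """Split one round of hits according to my reading of the two thresholds.

    hits: list of (hit_id, evalue) tuples (made-up data, not real output).
    Returns (displayed, in_profile) as lists of hit ids.
    """
    # Hits at or below the e-value threshold are shown in the output.
    displayed = [hit_id for hit_id, ev in hits if ev <= e_value]
    # Hits at or below the h-value threshold feed the next round's profile.
    in_profile = [hit_id for hit_id, ev in hits if ev <= h_value]
    return displayed, in_profile


# Hypothetical hits from a single round.
hits = [("seqA", 1e-30), ("seqB", 0.005), ("seqC", 0.5), ("seqD", 3.0)]
displayed, in_profile = split_hits(hits, e_value=0.01, h_value=1)
print(displayed)   # ids shown in the output
print(in_profile)  # ids used to build the profile
```

With these made-up numbers, seqC would not be displayed but would still
go into the profile, which is the case I am asking about.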

Imagine a hypothetical situation of 2 psi-blast runs against the same
database:

1. e = 0.01, h = 0.01, j = 15. 
2. e = 0.001, h = 0.01, j = 15. 

Now, if I wrote a program to extract only those hits which had e-values
lower than _0.0001_ (please note this is lower than the thresholds used
in either run) from round 15 of each output, would the numbers I get
be equal, or would they be different? I think they should be equal.
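
The comparison I have in mind looks roughly like the Python sketch below.
The final-round hit lists are invented for illustration (I have not run
this on real output), and hits_below is a hypothetical helper. It assumes
both runs end up with the same profile because h is the same in both.

```python
def hits_below(final_round_hits, cutoff):
    """Return the set of hit ids whose e-value is below the cutoff."""
    return {hit_id for hit_id, ev in final_round_hits if ev < cutoff}


# Made-up final-round hits for the two runs (not real psi-blast output).
run1 = [("seqA", 1e-30), ("seqB", 5e-5), ("seqC", 0.004)]  # e = 0.01
run2 = [("seqA", 1e-30), ("seqB", 5e-5)]                   # e = 0.001

CUTOFF = 1e-4  # stricter than the e-value used in either run
same = hits_below(run1, CUTOFF) == hits_below(run2, CUTOFF)
print(same)
```

If my understanding is right, the display threshold only trims the tail of
the list, so the two sets below the stricter cutoff should be identical.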

This is how I understand these parameters, and I would really appreciate
it if somebody could tell me whether I have got it right.

One last question - what are the acceptable (i.e. published) e- and
h-values for searching against medium-sized genome databases (~8000
sequences)?

Thank you in advance for any help, and also for allowing a post not
directly related to bioperl.

Sandeep

-- 
use Perl;