[Bioperl-l] what constitutes a good match?

Jeffrey Chang jchang@SMI.Stanford.EDU
Mon, 9 Apr 2001 08:57:17 -0700 (PDT)

Hi Tania,

When evaluating the significance of your e-values, you also need to keep
in mind:

1.  The number of BLAST queries you're running.  The more you run, the
more likely you are to get a false positive.  For example, if you run 1000
queries with an e-value cutoff of 1E-3, you should expect about one false
positive across the whole run by chance alone.

2.  How willing you are to accept false positives.  Depending on your
application, you may be willing to accept a little noise, in which case
you can relax the cutoff somewhat (that is, accept higher e-values).
Otherwise, make it stricter to keep out the false positives, at the risk
of losing some good hits too.
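
To make the arithmetic behind both points concrete, here is a rough sketch
in Python.  It is not part of BLAST or Bioperl, and the function names are
made up for illustration.  It assumes that queries are independent, that
chance hits follow a Poisson distribution, and that the e-value cutoff is
the expected number of chance hits per query; treat the numbers as
ballpark estimates only.

```python
import math


def expected_false_positives(num_queries, evalue_cutoff):
    """Expected number of chance hits over the whole run.

    Assumes each query independently contributes `evalue_cutoff`
    chance hits on average (hypothetical helper, not a BLAST API).
    """
    return num_queries * evalue_cutoff


def prob_at_least_one_false_positive(num_queries, evalue_cutoff):
    """Chance of seeing one or more false positives in the run.

    Uses the Poisson approximation P(>=1) = 1 - exp(-expected).
    """
    expected = expected_false_positives(num_queries, evalue_cutoff)
    return 1.0 - math.exp(-expected)


def per_query_cutoff(num_queries, tolerated_false_positives):
    """Per-query e-value cutoff that keeps the expected number of false
    positives over the whole run at `tolerated_false_positives`
    (a Bonferroni-style correction)."""
    return tolerated_false_positives / num_queries


if __name__ == "__main__":
    n, cutoff = 1000, 1e-3
    print("expected false positives: %.2f"
          % expected_false_positives(n, cutoff))
    print("P(at least one false positive): %.2f"
          % prob_at_least_one_false_positive(n, cutoff))
    # How strict each query must be if you tolerate 0.05 false
    # positives over the whole run.
    print("per-query cutoff: %.1e" % per_query_cutoff(n, 0.05))
```

For 1000 queries at a cutoff of 1E-3 this gives about one expected false
positive and roughly a 63% chance of seeing at least one.  The last
function goes the other way: decide how much noise you can live with over
the whole run, then divide by the number of queries to get the cutoff.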

A good source of sequence searching information is Steven Brenner's
Sequence and Structure Searching Site:

I don't know if this answers your question...  :)


On Mon, 9 Apr 2001, Tania Oh wrote:

> Hi All,
> I do apologise first if the question I ask is considered off topic.
> I have some contig sequences which I want to match against the ensembl
> protein database to see if I have any genes on my sequences. The question
> is, when I do a BLAST against the ensembl.pep DB, what would constitute a
> good match? If one were to use only the pscore (e.g. 1e-60 for a
> relatively good match) to sieve out potential genes on contigs, this would
> cause the problem of missing out on fragments of genes with very high
> pscores (e.g. 0.5-3).
> How have people gotten around this problem? Any ideas on what constitutes a
> good match?
> thanks in advance,
> tania
