[Bioperl-l] Blast Output and frac_aligned_query
Aaron J. Mackey
amackey at pcbi.upenn.edu
Tue Jul 20 06:44:03 EDT 2004
On Jul 20, 2004, at 4:50 AM, James Wasmuth wrote:
> Thanks Aaron, time is a slight issue as I'm carrying out several
> million comparisons but I'll concede accuracy is the more important
> feature...
Right, another common fallacy with bl2seq: several million pairwise
comparisons always sounds like a lot, until one realizes that a single
search of the "nr" database is of the same magnitude. Sure, BLAST will
finish this amount of work in less than 10 minutes, but do we really
mind waiting an hour or two to get better alignments? You're going to
spend far more time on the analysis, so why not make it easier on
yourself in the long run? Then you won't have to worry about niggling
questions like "Hmm, I wonder if BLAST actually aligned all of the
homologous regions, or only those disjoint, slowly-evolving fragments it
could easily find?" (this is particularly relevant when using BLAST to
align DNA to either DNA or protein).
As an aside, this is exactly the kind of batch processing targeted by
various task distribution clients (e.g. "disperse"). With a modicum of
processing power (say 4-8 modern CPUs), we routinely batch process
millions of pairwise alignments with SSEARCH, PRSS, and/or LALIGN.
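To make the batching idea concrete, here is a minimal Python sketch of how
a long list of pairwise jobs might be cut into fixed-size batches before
being handed to workers. It is not how "disperse" or any particular task
distribution client works; the `batches` helper, the job tuples, and the
batch size are all hypothetical, chosen only for illustration.

```python
from itertools import islice

def batches(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical job list: (query_id, subject_id) pairs awaiting alignment.
jobs = [("q%d" % i, "s%d" % i) for i in range(10)]

# Each batch could be handed to one worker or CPU (assumption: workers are
# independent and every pair costs roughly the same).
chunks = list(batches(jobs, 4))
for number, chunk in enumerate(chunks):
    print(number, len(chunk))
```

Because the helper consumes an iterator lazily, the full list of several
million pairs never has to be held in memory at once.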
Additionally, for the common case of an "all-vs-all" matrix of pairwise
alignments, SSEARCH has the "-I" option, which evaluates only the
lower triangle of the matrix (thus providing the A vs. B alignment but
not B vs. A; the two are guaranteed to have identical alignments and
scores, but probably different E() values and bit scores. But you were
already using PRSS or PRFX to confirm pairwise significances, right?).
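As an illustration of what "lower triangle only" means for the amount of
work, the following Python sketch enumerates each unordered pair exactly
once. It says nothing about how SSEARCH implements the option; the
sequence names are made up, and leaving out self-comparisons (the
diagonal) is an assumption of this sketch.

```python
from itertools import combinations

def lower_triangle_pairs(ids):
    """Yield each unordered pair once, as (row, column) with row after column."""
    for earlier, later in combinations(ids, 2):
        yield later, earlier

# Hypothetical sequence identifiers.
ids = ["seqA", "seqB", "seqC", "seqD"]
pairs = list(lower_triangle_pairs(ids))

n = len(ids)
# n*(n-1)/2 comparisons instead of the n*n of a full all-vs-all matrix.
assert len(pairs) == n * (n - 1) // 2
for row, column in pairs:
    print(row, "vs", column)
```

For n sequences this is n(n-1)/2 alignments, a little under half of the
full matrix.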
And to add just a bit more icing to the cake, SSEARCH runs efficiently
under both PVM and MPI parallel environments, so the 10- to 100-fold
"slow-down" associated with SW can be nicely ameliorated with 8 to 32
cluster nodes (unless your database is very big, more than 32 nodes
will typically not be any more efficient). For those with multi-CPU
machines, you can also build threaded SSEARCH for single workstation
use.
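For a back-of-the-envelope feel for those numbers, this small Python
sketch combines the figures mentioned above (a BLAST run of about 10
minutes, a 10- to 100-fold slowdown, 8 to 32 nodes). It assumes ideal
linear speedup across nodes, which real clusters will not quite reach, so
treat the results as optimistic lower bounds.

```python
def wall_clock_minutes(baseline_minutes, slowdown, nodes):
    """Estimated run time, assuming work divides perfectly across nodes."""
    return baseline_minutes * slowdown / nodes

BLAST_MINUTES = 10  # rough figure quoted above for the BLAST run

for slowdown in (10, 100):
    for nodes in (8, 32):
        minutes = wall_clock_minutes(BLAST_MINUTES, slowdown, nodes)
        print("slowdown %3dx on %2d nodes: %6.1f minutes" % (slowdown, nodes, minutes))
```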
This public service message brought to you by the fine makers of:
FASTA, the original search algorithm
Add grains of salt to taste. And thanks, James, for being my scapegoat
of the day.
-Aaron
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania email: amackey at pcbi.upenn.edu
415 S. University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697