[Bioperl-l] StandAloneBlast.t testing

Todd Richmond todd@andrew2.stanford.edu
Mon, 14 May 2001 09:10:22 -0700


I finally got around to installing standalone blast and ran into a few
issues. (For those who are interested, the NCBI toolkit now compiles
"out-of-the-box" on MacOS X.)

I dumped the toolkit into /usr/local/ncbi with separate subdirectories for
bin, lib, and data. There are really no instructions in the toolkit for
installing everything once you are done, but that seemed logical. I added
the bin directory to the path and set up the .ncbirc configuration file
pointing to the data directory. I checked it with some blast searches of my
own and everything worked perfectly. Then I went to run StandAloneBlast.t.

First, there was the issue of setting $BLASTDIR and $BLASTDATADIR.

StandAloneBlast.pm says:

# You will need to enable Blast to find the Blast program. This can be done
# in (at least) three ways:
#  1. Modify your $PATH variable to include your Blast directory as in (for Linux):
#    export PATH=$PATH:/home/peter/blast   or
#  2. define an environmental variable blastDIR:
#    export BLASTDIR=/home/peter/blast   or
#  3. include a definition of an environmental variable BLASTDIR in every script

# If local BLAST databases are not stored in the standard
# /data directory, the variable BLASTDATADIR will need to be set explicitly
     $DATADIR =  $ENV{'BLASTDATADIR'} ||
       Bio::Root::IO->catfile($BLASTDIR,'data');
}

Option 1 doesn't work for me, even though $PATH clearly includes blastall. Is that
just a MacOS X thing? I couldn't get the tests to run correctly until I set
$BLASTDIR and $BLASTDATADIR, even though a standard command-line blast
worked fine. (Same problem with clustalw - can't find the application even
if it's on the path).
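
For anyone hitting the same thing, here is a rough sketch of the environment
setup that got the tests running for me. It assumes the /usr/local/ncbi layout
described above; NCBI_ROOT is just a hypothetical helper variable for this
example, not something the toolkit or Bioperl reads.

```shell
# Assumed install root from this message; NCBI_ROOT is a hypothetical helper name
NCBI_ROOT="${NCBI_ROOT:-/usr/local/ncbi}"

# Variables that StandAloneBlast.pm looks up in the environment
export BLASTDIR="$NCBI_ROOT/bin"
export BLASTDATADIR="$NCBI_ROOT/data"

# Keep the binaries on PATH as well, for ordinary command-line use
export PATH="$PATH:$BLASTDIR"

# Report whether the shell itself can resolve blastall
if command -v blastall >/dev/null 2>&1; then
    echo "blastall found at: $(command -v blastall)"
else
    echo "blastall not found on PATH"
fi
```

If the last check reports that blastall is found but the test still fails
without BLASTDIR, that would suggest the module is not consulting $PATH the
way the shell does.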

For the BLAST databases, the standard /data directory is supposed to be the
one that you're required to specify in the .ncbirc file, right? Shouldn't
you be able to leave that blank, and have blastall look in its own default
directory?

Second, what's the deal with the test databases? Do they come by default if
you download a binary distribution? They certainly didn't come with the NCBI
toolkit. Is it just assumed that everyone will have ecoli.nt and swissprot
installed? There are no instructions in StandAloneBlast.t that tell the user
where to get the test databases if they aren't installed, so I had to hunt
around in ftp://ncbi.nlm.nih.gov/blast/ to find them. Isn't it a bit much to
ask people to download and format a ~45 MB database for one test? Is there
some reason that the whole database needs to be searched, or can we put
together a small preformatted database and test sequence in the data folder
for this kind of test?
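
To make the idea concrete, something like the following could build such a
small database. This is only a sketch: the directory name, file name, and
sequence are made-up placeholders, and it assumes the formatdb program from the
toolkit is on the path (it is skipped otherwise).

```shell
# Hypothetical scratch directory and file names, for illustration only
TESTDIR="${TESTDIR:-./blast_test_data}"
mkdir -p "$TESTDIR"

# Placeholder nucleotide sequence, not real test data
cat > "$TESTDIR/small.fa" <<'EOF'
>test1 placeholder sequence
ACGTACGTTAGCTAGCTAGGATCCGATCGATCGTACGATCGTAGCTAGCTAGCATCG
EOF

# Format as a nucleotide database (-p F) if formatdb is installed
if command -v formatdb >/dev/null 2>&1; then
    (cd "$TESTDIR" && formatdb -i small.fa -p F)
else
    echo "formatdb not found; skipping database formatting"
fi
```

A database this size would take up almost no space in the data folder, and the
test could then point BLASTDATADIR at it instead of at a full download.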

Enough rambling - once I set all of the environment variables and downloaded
the databases, both the StandAloneBlast and Clustalw tests pass on MacOS X.
Now on to Tcoffee...

-- 
Todd Richmond                    http://cellwall.stanford.edu/todd
Carnegie Institution             email: todd@andrew2.stanford.edu
Department of Plant Biology      fax: 1-650-325-6857
260 Panama Street                phone: 1-650-325-1521 x431
Stanford, CA 94305