databases strategies

Robert Milius milius at
Wed Aug 9 20:02:54 UTC 2000


Please excuse a perhaps naive question. 

For those institutions that want to download and maintain their own
databases (e.g. Swiss-Prot, PIR, GenBank, EMBL, etc.) and want to keep the
duplication of whole databases to a minimum, what kind of directory
structure do you use (in Unix)?

For example, if I want to give our users local access to NCBI's BLAST
and WU's BLAST, I'll need to run formatdb and pressdb/setdb to provide
different formats of the same databases. Both look at the BLASTDB
environment variable to locate the databases, which means that
they all have to be in the same directory.

For example, I can download the files from 
into /usr/local/db/blast, uncompress them, run formatdb and pressdb/setdb
on them, and set the BLASTDB environment variable to point to that
directory. Doing this, I can get BLAST to run fine.
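
As a rough sketch of the steps above for one protein database: the file
name "swissprot" is only a hypothetical example, and the tools are run
only if they are installed and the file exists. formatdb builds the NCBI
BLAST indices and setdb the WU-BLAST ones (a nucleotide database would use
"formatdb -p F" and pressdb instead).

```shell
#!/bin/sh
# Sketch only: "swissprot" is a hypothetical file name for illustration.
set -u

# Directory that both BLAST flavours search; the default matches the path above.
BLASTDB="${BLASTDB:-/usr/local/db/blast}"
export BLASTDB

db="$BLASTDB/swissprot"   # downloaded and uncompressed FASTA file

if [ -f "$db" ]; then
    # NCBI BLAST: index a protein database (-p T).
    if command -v formatdb >/dev/null 2>&1; then
        formatdb -i "$db" -p T
    fi
    # WU-BLAST: setdb handles protein databases.
    if command -v setdb >/dev/null 2>&1; then
        setdb "$db"
    fi
else
    echo "no database file at $db; nothing to format"
fi
```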

Now I want to be able to use EMBOSS on the same databases using 
the dbiblast utility. This program creates 4 files: acnum.hit, acnum.trg,
entrynam.idx, and division.lkp. The problem is that if I run
dbiblast for each db in the directory, each run overwrites the index
files that were just created by the previous one.

I suppose I can create a separate directory for each db, and symlink 
the files into a common BLASTDB directory. Is there a more elegant
solution I'm missing? 
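
To make the symlink idea concrete, here is a minimal sketch. The
directory names and the database list (swissprot, pir) are hypothetical,
and the layout root defaults to a scratch directory so the sketch can be
tried without touching /usr/local/db. Each database gets its own
directory, and only the files named after that database are linked into
the common BLASTDB directory, so any per-database index files stay where
they are.

```shell
#!/bin/sh
# Sketch only: directory names and the database list are hypothetical.
set -eu

# Root of the layout (must be an absolute path so the symlinks resolve).
# Defaults to a scratch directory; set DB_ROOT to use a real location.
DB_ROOT="${DB_ROOT:-$(mktemp -d)}"

# Common directory that the BLASTDB environment variable points to.
BLASTDB="$DB_ROOT/blast"
export BLASTDB
mkdir -p "$BLASTDB"

for db in swissprot pir; do
    # One directory per database, so index files with fixed names
    # (acnum.hit, entrynam.idx, ...) cannot collide between databases.
    mkdir -p "$DB_ROOT/$db"

    # Stand-in for the downloaded, uncompressed flat file.
    [ -e "$DB_ROOT/$db/$db" ] || : > "$DB_ROOT/$db/$db"

    # Link the flat file and anything else named after the database
    # (e.g. files written by the BLAST formatters) into the common directory.
    for f in "$DB_ROOT/$db/$db"*; do
        ln -sf "$f" "$BLASTDB/$(basename "$f")"
    done
done

ls -l "$BLASTDB"
```

The EMBOSS index files would then be built inside each per-database
directory, leaving the common directory to hold nothing but links.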

I notice that the emboss.default appears to have a great deal of 
flexibility. Is it possible to have the files created by
dbiblast in one folder while the data is in another? I have looked at
the docs in
but must admit it isn't all that clear to me. I tried playing with
the "file:" entry, hoping to point it to different databases, but
haven't had much luck with it.

Any insight would be appreciated.

btw, thanks to everyone who is involved in producing this
wonderful package!!


Robert P. Milius, Ph.D.                            612-626-2771 (office)
Basic Sciences Computing Laboratory                612-625-4433 (fax)
University of Minnesota Supercomputing Institute   milius at
for Digital Simulation and Advanced Computation
