[EMBOSS] Indexing databases and their updates/new releases
Ran Rubinstein
ranrub at md.huji.ac.il
Wed Oct 8 11:22:48 UTC 2003
I am currently tackling similar problems. The solution I've worked out for
now is to use SRS (www.lionbio.co.uk) as a database: it's free for academic
institutions, has very strong indexing tools that can use parallel
processing, and EMBOSS can be configured to work through it. It also has a
very professional web interface that can launch EMBOSS tools. The downside
is that it's a bit complicated to install and administer. I'll be glad to
hear alternative suggestions, though.
The server I'm running SRS and EMBOSS on is a 4-processor Origin200 (270 MHz)
with 1 GB RAM. Indexing the latest GenBank release with SRS, excluding the
EST, GSS and HTG divisions, took approximately 24 hours running in parallel
on all 4 processors.
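For illustration only: one simple way to spread an indexing job over several processors is to split the flat files into groups of roughly equal total size and then index each group separately. The sketch below is a hypothetical Python helper. `partition_by_size` is not part of SRS or EMBOSS, and I don't know how SRS actually schedules its parallel indexing. It uses the greedy "longest processing time first" rule: each file, largest first, goes to whichever processor currently has the least data assigned.

```python
import heapq
from typing import Dict, List


def partition_by_size(file_sizes: Dict[str, int], workers: int) -> List[List[str]]:
    """Hypothetical helper: split files into `workers` groups of similar total size.

    Greedy LPT rule: take files largest first and give each one to the
    worker whose assigned total is currently smallest.
    """
    # Min-heap of (bytes assigned so far, worker index).
    heap = [(0, i) for i in range(workers)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(workers)]
    # Largest files first; ties broken by name so the result is deterministic.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return groups


if __name__ == "__main__":
    # Made-up file names and sizes (in MB), purely for illustration.
    sizes = {"gbpri1.seq": 250, "gbpri2.seq": 240, "gbrod1.seq": 200,
             "gbbct1.seq": 120, "gbvrl1.seq": 90}
    for i, group in enumerate(partition_by_size(sizes, 2)):
        print(i, group, sum(sizes[f] for f in group))
```

With two workers the example above gives totals of 460 and 440. Each group could then be handed to a separate indexing run, assuming the indexer can build and later merge partial indexes. That is not something I've checked for dbiflat.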
--
Ran Rubinstein
Dept. of Molecular Biology
Faculty of Medicine, Hebrew University, Ein Karem
Tel +972-2-6757906 Fax +972-2-6758992
-----Original Message-----
From: owner-emboss at hgmp.mrc.ac.uk [mailto:owner-emboss at hgmp.mrc.ac.uk] On
Behalf Of Nancy Yu
Sent: Wednesday, October 08, 2003 12:14 PM
To: emboss at embnet.org
Subject: [EMBOSS] Indexing databases and their updates/new releases
Hello,
I have a bunch of questions about the indexing of the databases. First
of all, what kind of computers are people using to run EMBOSS? I am
running on an Athlon MP 2000+ dual-processor machine with 1 GB RAM (Red Hat
Linux 9.0). Running dbiflat on EMBL est*.dat has taken forever (about 5 days
and still not done). Are people using 64-bit systems, cluster
systems, or other high-end computing systems? Is EMBOSS designed to run
on these technologies?
I'm still confused about the dbiflat indexing process. I know it
produces 4 files: acnum.hit, acnum.trg, division.lkp and entrynam.idx. As
I read somewhere in the mail archive, division.lkp stores the location
of the database directory. Doesn't this mean that if we move our *.dat
files to a different directory, we would have to re-index? Hence,
every time we download a new database, a new release, or an update, we
will have to re-index everything? Also, if dbiflat is interrupted halfway
through indexing, is it possible to continue where it left off?
From my experience, it seems like the whole process starts over again.
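As a stopgap for deciding when a re-index is actually needed, a small script can compare timestamps. It reports the index as stale if any of the four index files is missing or older than the newest data file. This is a hypothetical Python sketch: `index_is_stale` is not an EMBOSS function, and the directory layout is an assumption. It only looks at modification times. It cannot tell whether the directory recorded in division.lkp still matches, because that file's format isn't described here.

```python
from pathlib import Path

# The four files dbiflat produces, as listed above.
INDEX_FILES = ["acnum.hit", "acnum.trg", "division.lkp", "entrynam.idx"]


def index_is_stale(data_dir: str, index_dir: str, pattern: str = "*.dat") -> bool:
    """Hypothetical check: True if any index file is missing or older
    than the newest data file matching `pattern` in `data_dir`."""
    index_paths = [Path(index_dir) / name for name in INDEX_FILES]
    # A missing index file always means the index has to be (re)built.
    if any(not p.exists() for p in index_paths):
        return True
    data_files = list(Path(data_dir).glob(pattern))
    if not data_files:
        return False  # nothing to index
    newest_data = max(f.stat().st_mtime for f in data_files)
    oldest_index = min(p.stat().st_mtime for p in index_paths)
    # Any data file modified after the oldest index file means a re-index.
    return newest_data > oldest_index
```

Run from cron after each download, this would at least skip the multi-day dbiflat run when nothing has changed. It doesn't help with resuming an interrupted run.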
Just wondering, are the index files distributed with databases like EMBL
(e.g. division.ndx and the other *.ndx files) useful at all for EMBOSS, or
are they meant for other programs? Can I somehow use these index files,
i.e. is there a fast way of indexing a database that I missed, or am I
too clueless to know what I'm talking about?
My main concern is that at the speed it takes to index a new release of
large databases like EMBL or Genbank, it would be difficult for me to
try to keep my local databases up-to-date.
Thanks in advance for any comments and explanations :)
Best Regards,
Nancy Yu