[EMBOSS] Indexing databases and their updates/new releases
nancyyu at imb.sinica.edu.tw
Wed Oct 8 10:13:35 UTC 2003
I have a bunch of questions about indexing databases. First of all,
what kind of computers are people using to run EMBOSS? I am running on
a dual-processor Athlon MP 2000+ with 1GB RAM (Red Hat Linux 9).
Running dbiflat on EMBL est*.dat has taken about 5 days so far and is
still not finished. Are people using 64-bit systems, clusters, or other
high-end computing systems? Is EMBOSS designed to run on these
technologies?
I'm still confused about the dbiflat indexing process. I know it
produces 4 files: acnum.hit, acnum.trg, division.lkp, and entrynam.idx.
As I read somewhere in the mail archive, division.lkp stores the
location of the database directory. Doesn't this mean that if we move
our *.dat files to a different directory, we would have to re-index?
And does it follow that every time we download a new database, a new
release, or an update, we will have to re-index everything? Also, if
dbiflat is interrupted halfway through indexing, is it possible to
continue where it left off? From my experience, it seems like the whole
process starts over again.
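
To make the question about moved files concrete, here is a minimal
Python sketch of a flat-file index that records a file path and byte
offset for each entry. It is only an illustration of the general idea:
it is not dbiflat's actual algorithm or on-disk format, and the names
build_index and fetch are hypothetical. The only thing it assumes about
the data is that each entry starts with an "ID" line and ends with
a "//" line.

```python
def build_index(paths):
    """Map each entry name to (file path, byte offset of its ID line).

    Illustrative only: this is NOT the format dbiflat writes.
    """
    index = {}
    for path in paths:
        with open(path, "rb") as handle:
            offset = 0
            for line in handle:
                if line.startswith(b"ID "):
                    # The entry name is the second token on the ID line.
                    name = line.split()[1].decode("ascii").rstrip(";")
                    index[name] = (path, offset)
                offset += len(line)
    return index


def fetch(index, name):
    """Return the text of one entry by seeking to its recorded offset."""
    path, offset = index[name]
    with open(path, "rb") as handle:
        handle.seek(offset)
        lines = []
        for line in handle:
            lines.append(line.decode("ascii"))
            if line.startswith(b"//"):  # end-of-entry marker
                break
    return "".join(lines)
```

In a scheme like this, a lookup fails once a data file is moved,
because the stored path no longer exists, and the stored offsets become
wrong whenever a file's contents change. Whether dbiflat's index files
behave the same way is exactly what I am asking.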
Just wondering, are the index files shipped with databases like EMBL
(e.g. division.ndx and the other *.ndx files) useful at all for EMBOSS,
or are they meant for other programs? Can I somehow use these index
files, i.e. is there a fast way of indexing a database that I missed, or
am I too clueless to know what I'm talking about?
My main concern is that, given how long it takes to index a new release
of a large database like EMBL or GenBank, it would be difficult for me
to keep my local databases up to date.
Thanks in advance for any comments and explanations :)