[EMBOSS] Indexing databases and their updates/new releases

Nancy Yu nancyyu at imb.sinica.edu.tw
Wed Oct 8 10:13:35 UTC 2003


I have a number of questions about indexing databases.  First of all,
what kind of computers are people using to run EMBOSS?  I am running on
a dual-processor Athlon MP 2000+ with 1GB RAM (Red Hat Linux 9).
Running dbiflat on the EMBL est*.dat files has taken about 5 days so far
and is still not finished.  Are people using 64-bit systems, clusters,
or other high-end computing systems?  Is EMBOSS designed to run on such
systems?

I'm still confused about the dbiflat indexing process.  I know it
produces 4 files: acnum.hit, acnum.trg, division.lkp, and entrynam.idx.
As I read somewhere in the mail archive, division.lkp stores the
location of the database directory.  Doesn't this mean that if we move
our *.dat files to a different directory, we would have to re-index?
And hence that every time we download a new database, a new release, or
an update, we will have to re-index everything?  Also, if dbiflat is
interrupted halfway through indexing, is it possible to continue where
it left off?  From my experience, it seems like the whole process starts
over again.

Just wondering, are the index files included with databases like EMBL
(e.g. division.ndx and the other *.ndx files) of any use to EMBOSS, or
are they meant for other programs?  Can I somehow use these index files,
i.e. is there a faster way of indexing a database that I missed, or am I
too clueless to know what I'm talking about?

My main concern is that, given how long it takes to index a new release
of a large database like EMBL or GenBank, it will be difficult for me to
keep my local databases up to date.

Thanx in advance for any comments and explanations :)

Best Regards,
Nancy Yu
