EMBOSS - Indexing breaks on large databases

Len F. Zaifman leonardz at bioinfo.sickkids.on.ca
Wed Feb 7 21:32:35 UTC 2001

I have installed emboss 1.9.1 on an O2000. It installed nicely once I
gave up on installing it shared.

The issue came up in indexing genbank files. Most divisions indexed fine
with dbiflat. However, when I try to index est, or all of genbank, the
indexing breaks due to sort running out of memory.

I run

	dbiflat -idformat GB -directory /data/genbank \
		-indexdirectory /tools/emboss1.9.1/data/indices/est \
		-dbname GenBankEst -filenames gbest*.seq -date 06/02/01 \
		-sortoptions '-T /tmp_disk/scratch4/applicat/est -k1,1'

	dbiflat -idformat GB -directory /data/genbank \
		-indexdirectory /tools/emboss1.9.1/data/indices/genbank \
		-dbname GenBank -filenames *.seq -date 06/02/01 \
		-sortoptions '-T /tmp_disk/scratch4/applicat/genbank -k1,1'

and get

	UX:sort: ERROR: Out of memory before merge: Not enough space

sort is run with -T /scratch4   -k1,1   , where scratch4 has a 10 GB
I checked the environment and it is using the system sort (/bin/sort).
There were no syslog errors.
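
For anyone wanting to repeat that environment check, a minimal sketch
follows. The function name report_sort_env is made up for illustration;
it only uses the standard "command -v" and "ulimit -a" shell builtins,
and the limits it prints are those of the current shell, which a dbiflat
started from that shell would be expected to inherit.

```shell
#!/bin/sh
# report_sort_env (hypothetical helper name): show which sort binary is
# found first on PATH and the resource limits of the current shell.
report_sort_env() {
    # command -v prints the path the shell would execute for "sort"
    echo "sort resolves to: $(command -v sort)"
    echo "resource limits for this shell:"
    # ulimit -a lists all limits (memory, data segment, open files, ...)
    ulimit -a
}
```

Calling report_sort_env from the same shell that launches dbiflat shows
whether a memory-related limit is lower than expected.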

All other, smaller divisions seemed to work. I have a scheduled reboot
where I am going to set the maximum resident set size to 1 GB (it is
currently 1/2 GB). However, is there a more clever way of doing this
(i.e. if I did this on my workstation I would be limited to 1/8 GB or
swap like crazy)?
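
One general way to keep sort's memory use bounded is to sort the data in
pieces and then merge the pieces. The sketch below illustrates that idea
with the standard split, sort and "sort -m" commands. It is not how
dbiflat itself calls sort, and it has not been tried on IRIX; the
function name chunk_sort, its arguments, and the default chunk size are
all assumptions made for the example.

```shell
#!/bin/sh
# chunk_sort INPUT OUTPUT TMPDIR [LINES_PER_CHUNK]
# Hypothetical helper: sort INPUT on its first field without ever
# sorting more than LINES_PER_CHUNK lines in a single sort run.
chunk_sort() {
    input=$1
    output=$2
    tmpdir=$3
    lines=${4:-1000000}          # assumed default; tune to available memory

    workdir="$tmpdir/chunksort.$$"
    mkdir -p "$workdir" || return 1

    # 1. Cut the input into pieces of at most $lines lines each.
    split -l "$lines" "$input" "$workdir/chunk." || return 1

    # 2. Sort each piece on its own; each run only sees one piece.
    for piece in "$workdir"/chunk.*; do
        sort -T "$tmpdir" -k1,1 "$piece" > "$piece.sorted" || return 1
        rm -f "$piece"
    done

    # 3. Merge the already-sorted pieces. -m merges without re-sorting.
    #    With very many pieces, the open-file limit may become an issue.
    sort -m -T "$tmpdir" -k1,1 "$workdir"/chunk.*.sorted > "$output" || return 1

    rm -rf "$workdir"
}
```

For example, chunk_sort big.txt big.sorted /tmp_disk/scratch4 500000
would sort a hypothetical file big.txt in pieces of 500000 lines.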


I configure using:
	./configure --prefix=/tools/emboss1.9.1  --disable-shared --with-x

	on an O2K running IRIX 6.5.10 and the MIPSpro compilers

Any ideas??

As a side note: when I tried indexing all of genbank I got almost 60000
sequences generating the following warning notice:

   This is a warning: Duplicate ID skipped: 'XXXXXXXX'

Is this an indication that the initial data needs to be cleaned up
first, or a non-issue?
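
To see which IDs are affected, the duplicated entry names can be listed
straight from the flat files. The sketch below assumes that the ID in
the warning corresponds to the entry name on the GenBank LOCUS line (the
second field of lines starting with "LOCUS"); that mapping is an
assumption, and the function name list_duplicate_ids is made up for
illustration.

```shell
#!/bin/sh
# list_duplicate_ids FILE...
# Hypothetical helper: print every LOCUS name that occurs more than once
# across the given GenBank flat files.
list_duplicate_ids() {
    # Take the second field of each line that starts with "LOCUS ",
    # then keep only the names that appear more than once.
    awk '/^LOCUS / { print $2 }' "$@" | sort | uniq -d
}
```

Something like "list_duplicate_ids /data/genbank/*.seq | wc -l" would
then give a count to compare with the number of warnings. The list of
names is far smaller than the full files, but it is still sorted in one
run here.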

