[EMBnet ADMIN] EMBOSS - Indexing breaks on large databases

David Martin david.martin at biotek.uio.no
Thu Feb 8 09:32:12 UTC 2001


On Wed, 7 Feb 2001, Len F. Zaifman wrote:

> I have installed emboss 1.9.1 on an O2000. It installed nicely once I
> gave up on installing it shared.

Which compiler were you using? I note that you have the MIPS compiler.
Have you tried gcc? On my o200, which shouldn't be so different, it
seems to work just fine on EMBL.

..d

>
> The issue came up in indexing genbank files. Most divisions indexed fine
> with dbiflat. However, when I try to index est, or all of genbank, the
> indexing breaks due to sort running out of memory:
>
> Explicitly, I run
>
> dbiflat -idformat GB -directory /data/genbank -indexdirectory
> /tools/emboss1.9.1/data/indices/est -dbname GenBankEst -filenames
> gbest*.seq  -date 06/02/01 -sortoptions '-T
> /tmp_disk/scratch4/applicat/est -k1,1'
>
> and
>
> dbiflat -idformat GB -directory /data/genbank -indexdirectory
> /tools/emboss1.9.1/data/indices/genbank -dbname GenBank -filenames
> *.seq  -date 06/02/01 -sortoptions '-T
> /tmp_disk/scratch4/applicat/genbank -k1,1'
>
> and get
>
> 	UX:sort: ERROR: Out of memory before merge: Not enough space


>
>
> sort is run with -T /scratch4 -k1,1, where scratch4 has a 10 GB
> quota.
> I checked the environment and it is using the system sort (/bin/sort).
> There were no syslog errors.
>
> All other smaller divisions seemed to work. I have a scheduled reboot
> where I am going to set the maximum resident set size to 1 GB (it is
> currently 1/2 GB). However, is there a more clever way of doing this?
> (I.e., if I did this on my workstation I would be limited to 1/8 GB
> or swap like crazy.)
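
On the "more clever way" question: the error says sort ran out of
memory before its merge phase. The general technique for sorting more
data than fits in memory is an external merge sort: sort fixed-size
chunks, spill each to a scratch directory, then merge the sorted
chunks. The following is only a minimal sketch of that idea in Python.
It is not what /bin/sort or dbiflat do internally, and the names
external_sort, first_field and max_lines_in_memory are made up for
illustration. The key only approximates "-k1,1" by taking the first
whitespace-separated field.

```python
import heapq
import itertools
import os
import tempfile


def first_field(line):
    """Approximate `sort -k1,1`: key on the first whitespace-separated field."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def external_sort(lines, tmp_dir, max_lines_in_memory=100000):
    """Yield lines sorted by first field, holding one chunk in memory at a time.

    Hypothetical helper: `tmp_dir` plays the role of the `-T` scratch
    directory, `max_lines_in_memory` is the chunk size.
    """
    chunk_paths = []
    iterator = iter(lines)
    try:
        # Phase 1: sort bounded chunks and spill each one to a scratch file.
        while True:
            chunk = list(itertools.islice(iterator, max_lines_in_memory))
            if not chunk:
                break
            # Make sure every line is newline-terminated before writing.
            chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
            chunk.sort(key=first_field)
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
            with os.fdopen(fd, "w") as handle:
                handle.writelines(chunk)
            chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files, read lazily.
        handles = [open(p) for p in chunk_paths]
        try:
            for line in heapq.merge(*handles, key=first_field):
                yield line
        finally:
            for handle in handles:
                handle.close()
    finally:
        # Remove scratch files whether or not the merge completed.
        for path in chunk_paths:
            os.remove(path)
```

Memory use is then bounded by the chunk size rather than by the size of
the database, at the cost of scratch space and one open file per chunk.
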
>
> Details:
>
> I configure using:
> 	./configure --prefix=/tools/emboss1.9.1  --disable-shared --with-x
> --with-pngdriver
>
> 	on an O2K running Irix 6.5.10 and the MipsPro 7.3.1.2 Compilers
>
> Any ideas??
>
>
>
> As a side note: when I tried indexing all of genbank I got almost 60000
> sequences generating the following warning notice:
>
>
>
>    This is a warning: Duplicate ID skipped: 'XXXXXXXX'
>
> Is this an indication that the initial data needs to be cleaned up
> first, or a non-issue?
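
One way to see which entries are involved before deciding whether any
cleanup is needed is to count repeated entry names in the flat files.
This is a rough sketch in Python, on the assumption that the ID in the
warning is the entry name on the LOCUS line (the second field); I have
not checked that against the dbiflat source. The function name
duplicate_locus_names is made up for illustration.

```python
from collections import Counter


def duplicate_locus_names(lines):
    """Return {name: count} for LOCUS entry names seen more than once.

    Assumption: the ID dbiflat complains about is the second
    whitespace-separated field of each line starting with "LOCUS".
    """
    counts = Counter()
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                counts[fields[1]] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

Feeding it the lines of each *.seq file in turn (one shared list or
generator) would show whether the duplicates sit within one division
or across several. Note that it keeps every name in memory, which may
matter for all of genbank.
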
>
> Thanks.
>
>
>
>

---------------------------------------------------------------------
*  Dr. David Martin                  Biotechnology Centre of Oslo   *
*  Node Manager                      Gaustadalleen 21               *
*  The Norwegian EMBNet Node         P.O. box 1125 Blindern         *
*  tel +47 22 84 05 35               N-0317 Oslo                    *
*  fax +47 22 84 05 01               Norway                         *
---------------------------------------------------------------------
I will be leaving the Norwegian EMBnet node on 23rd February.
All work related mail should be addressed to admin at embnet.uio.no where
my successor, Rune Groven will deal with it.
All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence
it will be automatically forwarded to me.
Spam should continue to be sent to /dev/null







