[EMBOSS] formatting databases

Peter Rice pmr at ebi.ac.uk
Tue Mar 31 11:44:10 UTC 2009


Brian Moldover wrote:

> Just finished setting up the EMBOSS software. Now I'm trying to install
> databases locally and finding it to be a bit more work. I downloaded part of
> EMBL (rel_con_*) and trying to index with dbiflat. I get an error message
> "rel_con_mam_01_r99 too large for DBI indexing". 

Sadly this does happen for occasional EMBL or GenBank releases.
The files are supposed to stay within a 2Gb size limit.

There is a scripts/emblsplit.pl file included in the EMBOSS distribution
that will split a file at 1,900,000,000 bytes. You need to clean up the
original file after running it.
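To illustrate the idea, here is a minimal Python sketch of splitting an EMBL flat file into parts that each stay below 1,900,000,000 bytes. This is not the emblsplit.pl script itself. It assumes a split should only happen right after a `//` end-of-entry line so that no entry is cut in half. The function name `split_embl` and the `.partNN` output names are hypothetical.

```python
def split_embl(path, limit=1_900_000_000):
    """Hypothetical sketch: write path.part01, path.part02, ... each at most
    `limit` bytes (unless one entry alone exceeds it), splitting only after
    '//' end-of-entry lines. Returns the list of output file names."""
    outputs = []
    out = None
    written = 0
    entry = []       # lines of the entry currently being read
    entry_size = 0

    def flush():
        nonlocal out, written, entry, entry_size
        if not entry:
            return
        # Start a new part if this entry would push the current part past the limit
        if out is None or (written > 0 and written + entry_size > limit):
            if out is not None:
                out.close()
            name = "{}.part{:02d}".format(path, len(outputs) + 1)
            out = open(name, "wb")
            outputs.append(name)
            written = 0
        out.writelines(entry)
        written += entry_size
        entry = []
        entry_size = 0

    with open(path, "rb") as src:
        for line in src:
            entry.append(line)
            entry_size += len(line)
            if line.startswith(b"//"):  # end of an EMBL entry
                flush()
    flush()  # any trailing text without a final '//' line
    if out is not None:
        out.close()
    return outputs
```

The sketch leaves the original file in place; remove it once the parts have been checked, so that only the parts are indexed.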

The 2Gb limit could be increased to 4Gb at the risk of breaking some
third party applications (Staden, Sanger efetch).

For the next release we could change dbiflat to issue only a warning when
a file is between 2Gb and 4Gb, as the index file has enough space to store
4Gb file pointers. Are other users interested in allowing this?
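A minimal Python sketch of the proposed size rule, assuming the 2Gb and 4Gb limits come from storing byte offsets as signed and unsigned 32-bit integers respectively. The names `fits_offset` and `dbi_size_check` are hypothetical; this is not dbiflat's actual code.

```python
import struct

def fits_offset(offset, fmt):
    """True if `offset` can be packed with struct format `fmt`
    ('<i' = signed 32-bit, '<I' = unsigned 32-bit)."""
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        return False

def dbi_size_check(nbytes):
    """Hypothetical policy: 'ok' below 2Gb, 'warn' between 2Gb and 4Gb,
    'too large' above 4Gb, based on the largest byte offset in the file."""
    last = max(nbytes - 1, 0)
    if fits_offset(last, "<i"):
        return "ok"
    if fits_offset(last, "<I"):
        return "warn"
    return "too large"
```

For example, a 1,900,000,000-byte file is "ok", while a 3Gb file would only produce a warning under this rule.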

> Still not entirely sure what I'm doing regarding getting databases up and
> running, which is why I started with a subset of EMBL. Suggestions?

If you do not need to process the whole database you can also use remote
access to fetch single entries on demand. This saves you doing any
database indexing.

You can also use the dbx indexing programs which support larger files.

regards,

Peter Rice
