[EMBOSS] Database too large for dbifasta

George Magklaras georgios at biotek.uio.no
Wed Jul 18 07:23:28 UTC 2007


This error comes from the file embdbi.c in the EMBOSS nucleus library:

    if (ajFileLength(name) > (ajlong) INT_MAX)
        ajDie("File '%S' too large for DBI indexing", name);


INT_MAX is normally defined in your /usr/include/limits.h file as something
like:

#  define INT_MAX       2147483647

that is, 2^31 - 1, so the file size limit is just under 2 GB.
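If you want to check a file against the same limit before running dbifasta, the comparison can be reproduced outside EMBOSS. This is a minimal sketch, assuming a POSIX system with stat(); exceeds_dbi_limit() and file_exceeds_dbi_limit() are hypothetical names, not EMBOSS functions. On a 32-bit system it should be compiled with -D_FILE_OFFSET_BITS=64, otherwise stat() itself fails on files over 2 GB.

```c
#include <limits.h>
#include <sys/stat.h>

/* Hypothetical helper, not EMBOSS code: mirrors the embdbi.c test, so a
 * file is rejected only if it is strictly larger than INT_MAX bytes. */
int exceeds_dbi_limit(long long file_bytes)
{
    return file_bytes > (long long) INT_MAX;
}

/* Returns 1 if the file at `path` is too large for DBI indexing, 0 if it
 * fits, or -1 if stat() fails (missing file, or a >2 GB file on a 32-bit
 * build without large-file support). */
int file_exceeds_dbi_limit(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return exceeds_dbi_limit((long long) st.st_size);
}
```

For example, a file of exactly 2147483647 bytes still passes, while one of 2147483648 bytes is rejected.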

In plain English, this means that you will have to split your flat input
data files so that each one is smaller than 2 GB. For raw EMBL database
files, the EMBOSS distribution now includes the script 'emblsplit.pl' in
the scripts/ subdirectory. This script splits an EMBL file (.dat) that you
pass to it into chunks smaller than 2 GB. It works only for EMBL format,
not for FASTA files, so you will have to split your 18 GB FASTA input by
other means.
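One way to split a large FASTA file is to cut it only at record headers, so that no sequence is broken in two. The sketch below assumes plain FASTA text in which every record starts with a '>' line; split_fasta() and the output naming scheme <prefix>.<n>.fa are hypothetical, not part of EMBOSS. Because records are never cut, a chunk can overshoot soft_limit by up to the size of its last record, so soft_limit plus the largest record must stay below INT_MAX; 1000000000 bytes should be safe as long as no single record exceeds about 1.1 GB.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of EMBOSS.
 * Copies FASTA text from `in` into files "<prefix>.<n>.fa" (n = 1, 2, ...).
 * A new file is started only at a header line ('>' at the start of a line)
 * and only once the current file holds at least soft_limit bytes, so each
 * file is smaller than soft_limit plus the size of its last record.
 * Returns the number of files written, or -1 on an I/O error. */
int split_fasta(FILE *in, const char *prefix, long long soft_limit)
{
    char buf[65536];
    char name[4096];
    FILE *out = NULL;
    long long written = 0;
    int nfiles = 0;
    int at_line_start = 1;  /* does buf begin a new line? */

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        int is_header = at_line_start && buf[0] == '>';

        if (out == NULL || (is_header && written >= soft_limit)) {
            if (out != NULL && fclose(out) != 0)
                return -1;
            nfiles++;
            snprintf(name, sizeof name, "%s.%d.fa", prefix, nfiles);
            out = fopen(name, "w");
            if (out == NULL)
                return -1;
            written = 0;
        }
        if (fwrite(buf, 1, len, out) != len) {
            fclose(out);
            return -1;
        }
        written += (long long) len;
        /* Lines longer than buf arrive in pieces; only a piece ending in
           '\n' completes the line, so the next read starts a new line. */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    if (out != NULL && fclose(out) != 0)
        return -1;
    return ferror(in) ? -1 : nfiles;
}
```

For instance, calling split_fasta(stdin, "mydb", 1000000000LL) from a small main() would write mydb.1.fa, mydb.2.fa, and so on, each of which should then pass the INT_MAX check.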

In previous versions of EMBOSS (4.0.0), DBI formatting appeared to work,
but the resulting EMBOSS index was not correct if your flat input file
exceeded 2 GB. At least, I had problems indexing EMBL format files larger
than 2 GB with the dbiflat program.

Question to the developers:

Why INT_MAX (signed)? Why not the unsigned UINT_MAX (which would double the
limit to about 4 GB), or another, higher limit? Having to go through the
file-splitting stage adds overhead.
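A quick calculation shows how much an unsigned limit would help. This sketch takes 18 x 10^9 bytes as a stand-in for the 18+ GB file in the question (the exact size is not given) and computes the minimum number of pieces under each limit; min_chunks() is a hypothetical helper, and because it ignores record boundaries a real FASTA split may need more pieces.

```c
#include <limits.h>

/* Hypothetical helper: minimum number of pieces so that no piece exceeds
 * `limit` bytes (ceiling division). `limit` must be positive; record
 * boundaries are ignored. */
long long min_chunks(long long file_bytes, long long limit)
{
    if (file_bytes <= 0)
        return 0;
    return (file_bytes + limit - 1) / limit;
}
```

min_chunks(18000000000LL, INT_MAX) gives 9 and min_chunks(18000000000LL, UINT_MAX) gives 5. So UINT_MAX would roughly halve the number of pieces, but avoiding the split for a file of this size would need a limit wider than 32 bits.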

Best Regards,
GM

Ravi Vijaya Satya wrote:
> Hello,
> 
> I am trying to index a large fasta file using dbifasta. The file is 18+GB in
> size. Indexing with dbifasta was working fine with EMBOSS-4.0.0. However,
> with 4.1.0 or 5.0.0, it complains that the file is 'too large for DBI
> indexing'. Any suggestions other than switching back to 4.0.0?
> 
> Thanks,
> Ravi
> 
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
> 

-- 
George Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://www.biotek.uio.no/

EMBnet Norway:	http://www.no.embnet.org/


