[EMBOSS] Database too large for dbifasta
George Magklaras
georgios at biotek.uio.no
Wed Jul 18 07:23:28 UTC 2007
This comes from the embdbi.c file in the nucleus library:

    if(ajFileLength(name) > (ajlong) INT_MAX)
        ajDie("File '%S' too large for DBI indexing", name);

INT_MAX is normally defined as something like

    #define INT_MAX 2147483647

in your /usr/include/limits.h file.
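
To make the effect of that check concrete, here is a minimal sketch of the
same comparison in plain C. The function name too_large_for_dbi is
hypothetical and not part of EMBOSS; it only mirrors the condition quoted
above, with the file size passed in as a byte count.

```c
#include <limits.h>

/* Hypothetical stand-in for the EMBOSS check, not the library code.
 * Returns 1 if a file of file_size bytes would trip the
 * "too large for DBI indexing" condition, 0 otherwise. */
int too_large_for_dbi(long long file_size)
{
    return file_size > (long long) INT_MAX;
}
```

On a platform where INT_MAX is 2147483647, any file of 2 GiB or more is
rejected, so an 18 GB input fails this test by a wide margin.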
In plain English, this means that you will have to split your flat input
data files so that each one is smaller than 2 GB. For raw EMBL database
files, the EMBOSS distribution now includes the 'emblsplit.pl' script
under the scripts/ subdirectory. This script splits an EMBL file (.dat)
that you pass it into chunks smaller than 2 GB. It works only for EMBL
format, not for FASTA files, so you will have to split your 18 GB FASTA
input by other means.
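
One possible "other means" is a small splitter that cuts the FASTA file
only at record headers. The sketch below is an illustration, not an EMBOSS
tool: the function name split_fasta, the output naming scheme
<prefix>.<n>.fasta and the soft_limit parameter are all assumptions. A
chunk is closed at the first '>' line seen after it has reached soft_limit
bytes, so a chunk can overshoot soft_limit by up to one record. Pick a
value comfortably below 2 GB.

```c
#define _FILE_OFFSET_BITS 64  /* glibc: allow opening files > 2 GB on 32-bit systems */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of EMBOSS.
 * Copies in_path into files named <out_prefix>.<n>.fasta (n = 1, 2, ...).
 * A new chunk is started at the first '>' header line seen once the
 * current chunk holds at least soft_limit bytes, so no record is cut
 * in half. Returns the number of chunks written, or -1 on error. */
int split_fasta(const char *in_path, const char *out_prefix, long long soft_limit)
{
    char line[8192];
    char out_path[1024];
    FILE *in = fopen(in_path, "r");
    FILE *out = NULL;
    long long written = 0;   /* bytes in the current chunk */
    int chunks = 0;
    int line_start = 1;      /* does the next fgets() piece begin a line? */
    int ok = 1;

    if (in == NULL)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        int is_header = line_start && line[0] == '>';

        /* Open the first chunk, or roll over at a record boundary. */
        if (out == NULL || (is_header && written >= soft_limit)) {
            if (out != NULL && fclose(out) != 0) { out = NULL; ok = 0; break; }
            snprintf(out_path, sizeof out_path, "%s.%d.fasta", out_prefix, ++chunks);
            out = fopen(out_path, "w");
            written = 0;
            if (out == NULL) { ok = 0; break; }
        }

        if (fwrite(line, 1, len, out) != len) { ok = 0; break; }
        written += (long long) len;
        /* A piece without a trailing newline means the line continues. */
        line_start = (len > 0 && line[len - 1] == '\n');
    }

    if (ferror(in)) ok = 0;
    if (out != NULL && fclose(out) != 0) ok = 0;
    fclose(in);
    return ok ? chunks : -1;
}
```

A call such as split_fasta("big.fasta", "big", 1900000000LL) would write
big.1.fasta, big.2.fasta and so on (file names here are only examples),
each of which can then be indexed separately.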
In previous versions of EMBOSS (4.0.0), the DBI formatting appeared to
work, but the EMBOSS index was not correct if the flat input file
exceeded 2 GB. At least, I had problems indexing EMBL format files
larger than 2 GB with the dbiflat program.
Question to the developers:
Why INT_MAX (signed)? Why not UINT_MAX (unsigned), to raise it a bit, or
another, higher limit? Having to go through the file-splitting stage is
a bit of an overhead.
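
To put rough numbers on the question, the sketch below computes the
smallest number of pieces a file must be cut into for a given size limit.
The function name min_chunks is hypothetical, the 18 GB input is taken as
18,000,000,000 bytes for the sake of the arithmetic, and the limits assume
a platform where int is 32 bits wide.

```c
/* Hypothetical helper for the arithmetic only, not EMBOSS code.
 * Smallest number of pieces of at most `limit` bytes needed to hold
 * file_size bytes (ceiling division). Splitting at record boundaries
 * may need more pieces than this lower bound. */
long long min_chunks(long long file_size, long long limit)
{
    return (file_size + limit - 1) / limit;
}
```

With a limit of 2147483647 (INT_MAX) this gives 9 pieces for the assumed
18,000,000,000 bytes; with 4294967295 (UINT_MAX) it gives 5. So UINT_MAX
would halve the splitting work but not remove it for a file of this size.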
Best Regards,
GM
Ravi Vijaya Satya wrote:
> Hello,
>
> I am trying to index a large fasta file using dbifasta. The file is 18+GB in
> size. Indexing with dbifasta was working fine with EMBOSS-4.0.0. However,
> with 4.1.0 or 5.0.0, it complains that the file is 'too large for DBI
> indexing'. Any suggestions other than switching back to 4.0.0?
>
> Thanks,
> Ravi
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>
--
George Magklaras
Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://www.biotek.uio.no/
EMBnet Norway: http://www.no.embnet.org/