Emboss and EMBL size problem

simon andrews (BI) simon.andrews at bbsrc.ac.uk
Fri Jul 5 13:51:57 UTC 2002


> -----Original Message-----
> From: Tony Pemberton [mailto:pemberaj at pugh.bip.bham.ac.uk]
> Subject: Emboss and EMBL in GCG format 
> 
> Has anybody come across this problem:
> 
> I am trying to index the latest release of EMBL 71.0 
> In both version 2.2.0 and 2.4.1, I get the following
> 
> Error accessing temp file.
> 
> EMBOSS An error in dbigcg.c at line 565:
> Failed to open embl.acsrt2 for reading
> (2.2.0)
>
> [I] therefore suspect it is a problem related to the ever
> increasing size of EMBL. Can anyone
> confirm?

I don't have a solution to this, but it has bitten us this week, so I thought I'd follow up with some observations from our group.

We had problems trying to index the hum01.dat file from the latest EMBL release using dbiflat. We got a similar error to the one above, with the program reporting that it was unable to open the file. The file in this case is about 2.4 GB.

[As a side note, I thought that EMBL were supposed to ensure that none of the files in their releases exceeded 2 GB, so how did this one get through?]

In our case we wrote a small Perl script to split the offending file into two smaller files, and the processing proceeded OK.

What we haven't managed to establish is how to get EMBOSS to cope with large files. We're running Linux with glibc 2.2.4-24, which supports large files (we can write short C programs that open and read files over 2 GB), so why doesn't EMBOSS work with them?

There are some notes on the web from an EMBOSS project meeting suggesting that EMBOSS should contain the appropriate code to work with files over 2 GB, but they only discuss 64-bit operating systems, not 32-bit systems with patched glibcs.

	http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Meetings/2000-08-02.html

I also note that the configure script for EMBOSS lists the command-line options:

	--enable-large
	--enable-64

...as well as many other text fragments that look as if they might relate to this topic, but I haven't been able to find any documentation for them, nor can I see anything in our configure log showing the system being checked for large file support.

Since modern 32-bit operating systems have glibc libraries that can cope with large files, it would be extremely useful for EMBOSS to be able to use this functionality.

Can anyone shed any light on the changes that would need to be made, either to our configuration or to the EMBOSS source code, to allow large files to be accessed on systems such as ours?

Many thanks

Simon.

--
Simon Andrews PhD
Bioinformatics Dept
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0)1223 496463 



More information about the EMBOSS mailing list