[EMBOSS] Memory problem with extractseq

Peter Rice pmr at ebi.ac.uk
Thu Mar 18 12:39:28 UTC 2010


On 18/03/10 09:11, michael watson (IAH-C) wrote:
> Hi
> 
> I'm using EMBOSS 6.1.0 on a fairly small Linux VM which has about 3 GB of RAM.
> 
> I find it strange that extractseq reports a memory problem:
> 
> -bash-3.00# /usr/local/EMBOSS-6.1.0/bin/extractseq  -sequence chr1.fasta -outseq chr1_.1.fasta -regions '34415690-34415711'
> Extract regions from a sequence
> Uncaught exception:  Allocation failed, insufficient memory available, raised at ajstr.c:2406
> 
> Whereas if I write a Bioperl script using SeqIO and the trunc() function, it works perfectly.
> 
> I'd have thought EMBOSS would be more streamlined and memory efficient than Bioperl?

The problem appears to be in the buffering of input that is done to detect the format.

While we work on improving the performance, you can simply specify the input format:

-sformat fasta

to turn off the file input buffering.
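
For example, the command from the original report with the format added would look roughly like the sketch below. The file names and region are the ones quoted above. The check around the call is only an assumption on my part, so the snippet does nothing harmful on a machine where extractseq or chr1.fasta is missing.

```shell
# Input, output and region are taken from the original report.
seq=chr1.fasta
out=chr1_.1.fasta
region='34415690-34415711'

# Same arguments as before, plus -sformat fasta to name the input format
# explicitly so that the format does not have to be detected.
set -- -sequence "$seq" -sformat fasta -outseq "$out" -regions "$region"

# Only run if extractseq is on the PATH and the input file is readable.
if command -v extractseq >/dev/null 2>&1 && [ -r "$seq" ]; then
    extractseq "$@"
else
    echo "extractseq or $seq not available; command not run" >&2
fi
```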

Reading an unknown format requires a lot of input to be buffered, in
case a GCG ".." checksum line appears.

Hope that helps

Peter



