[EMBOSS] Memory problem with extractseq

Peter Rice pmr at ebi.ac.uk
Thu Mar 18 13:30:12 UTC 2010


On 18/03/10 09:11, michael watson (IAH-C) wrote:
> Hi
> 
> I'm using EMBOSS 6.1.0 on a fairly small Linux VM which has about 3Gb of RAM.
> 
> I find it strange that extractseq reports a memory problem:

Some further investigation suggests several improvements for the next
release:

The input was being buffered with a copy of the entire input buffer
(2000 bytes) saved for every line read, however short the line. That is
why it used so much memory. The saved size can be reduced to a more
reasonable figure (and we can save space in some other string copies).
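
To give a rough feel for the scale, here is a back-of-envelope sketch in
Python (not EMBOSS code). The 2000-byte figure is the one quoted above;
the line count and the 61-byte line length are made-up values for
illustration only.

```python
# Rough comparison of memory held by the input buffer under two policies.
# BUFFER_SIZE comes from the message above; everything else is hypothetical.
BUFFER_SIZE = 2000  # bytes saved per buffered line


def fixed_cost(num_lines, buffer_size=BUFFER_SIZE):
    """Bytes held when every saved line occupies a full-size buffer."""
    return num_lines * buffer_size


def trimmed_cost(line_lengths):
    """Bytes held when every saved line occupies only its own length."""
    return sum(line_lengths)


# Hypothetical input: one million lines of 60 residues plus a newline.
num_lines = 1_000_000
line_len = 61

fixed = fixed_cost(num_lines)
trimmed = trimmed_cost(line_len for _ in range(num_lines))

print(f"fixed-size copies: {fixed / 1e6:.0f} MB")
print(f"trimmed copies:    {trimmed / 1e6:.0f} MB")
```

With these assumed numbers the fixed-size copies come to about 2000 MB
against 61 MB for trimmed copies, which is enough to cause trouble on a
small VM.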

When processing FASTA format (and various others), once the '>' line has
been found the format test cannot fail. The reader will take everything
up to the next '>', or continue to the end of the file. This means we can
turn off buffering of FASTA input (and of other formats) once no format
tests remain that can fail.
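
The idea can be sketched in Python as follows. This is only an
illustration of the principle, not the EMBOSS reader: the function name
read_fasta, the rewind list and the skipping of leading blank lines are
all assumptions made for the example.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a line-oriented text handle.

    Lines are kept in a rewind list only while the format is undecided.
    Once a '>' line is seen, no later test can fail, so the list is
    dropped and nothing more is saved for rewinding.
    """
    rewind = []          # lines held in case another format must be tried
    committed = False    # becomes True once the '>' line has been found
    header, chunks = None, []

    for raw in handle:
        line = raw.rstrip("\n")
        if not committed:
            rewind.append(line)
            if not line.strip():
                continue  # leading blank line: still undecided
            if not line.startswith(">"):
                # A real reader would rewind here and try the next format.
                raise ValueError("input does not look like FASTA")
            committed = True
            rewind.clear()  # format is certain: stop buffering
            header = line[1:]
            continue
        if line.startswith(">"):
            # Next record starts: emit the one just finished.
            yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.strip())

    if committed:
        yield header, "".join(chunks)


data = io.StringIO(">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(data))
print(records)
```

In this sketch the only lines ever held for a possible rewind are those
before the first '>' line, however large the file is.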

For large input files, both changes will have an effect similar to
specifying the format on the command line. Specifying the format should
already work in any release.
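
As a sketch of that workaround, the Python helper below builds an
extractseq command line with the input format given explicitly through
the -sformat qualifier. The helper name, the file names and the region
are hypothetical, and the qualifier spelling should be checked against
"extractseq -help -verbose" for the installed release.

```python
import shutil
import subprocess


def extractseq_command(infile, regions, outfile, fmt="fasta"):
    """Build an extractseq argument list that names the input format.

    Naming the format means the reader does not have to test the input
    against other formats first.
    """
    return [
        "extractseq",
        "-sequence", infile,
        "-sformat", fmt,       # input sequence format, stated explicitly
        "-regions", regions,
        "-outseq", outfile,
    ]


# Hypothetical file names and region, for illustration only.
cmd = extractseq_command("big_input.fasta", "1-1000", "subset.fasta")
print(" ".join(cmd))

# Run only when explicitly asked to, and only if EMBOSS is on the PATH.
RUN_IT = False
if RUN_IT and shutil.which("extractseq"):
    subprocess.run(cmd, check=True)
```

The same effect can usually be had by putting the format in front of the
file name, as in fasta::big_input.fasta.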

Hope that helps,

Peter


