[EMBOSS] diffseq memory problem?
Peter Rice
pmr at ebi.ac.uk
Tue Feb 8 10:46:32 UTC 2011
Dear Caroline,
On 08/02/2011 10:01, Barretto, Caroline, LAUSANNE, BioInformatics wrote:
> Dear EMBOSS developers,
>
> I have been using diffseq to compare two strains of the same bacterial
> species using "10" as wordsize without any problem.
>
> However, when I try to reduce this number to "4", after several hours of
> calculation the server collapses, all RAM and SWAP are used.
>
> Is there any option to avoid that, or do you know if someone is working
> on that problem?
Depending on the input size, and the number of simple repeats, a low
word size could easily generate too many matches for large sequence lengths.
We would recommend reducing the word size more slowly (maybe 10, 8, 6).
As a guideline, finding more matches than there are non-overlapping
words in the sequence is unlikely to be useful and is a reasonable point
to stop reducing the word size.
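To make the guideline concrete, here is a rough back-of-the-envelope sketch in Python. It is not how diffseq works internally: it assumes two unrelated DNA sequences with uniform base composition and simply counts the word hits expected by chance, so treat the numbers as an order-of-magnitude indication only. The function names and the 2,000,000 bp sequence lengths are made up for illustration.

```python
def non_overlapping_words(seq_len: int, word_size: int) -> int:
    """Number of non-overlapping words of the given size that fit in a sequence."""
    return seq_len // word_size


def expected_random_word_hits(len_a: int, len_b: int, word_size: int,
                              alphabet_size: int = 4) -> float:
    """Chance word hits between two sequences.

    Assumption: independent sequences, uniform base frequencies. Real
    genomes with simple repeats will usually produce more hits than this.
    """
    positions_a = max(len_a - word_size + 1, 0)  # word start positions in A
    positions_b = max(len_b - word_size + 1, 0)  # word start positions in B
    return positions_a * positions_b / alphabet_size ** word_size


# Hypothetical sequence lengths, chosen only for illustration.
len_a = len_b = 2_000_000

for word_size in (10, 8, 6, 4):
    hits = expected_random_word_hits(len_a, len_b, word_size)
    limit = non_overlapping_words(min(len_a, len_b), word_size)
    print(f"word size {word_size:2d}: ~{hits:.3g} chance hits, "
          f"{limit} non-overlapping words")
```

Under these assumptions each step down of two in word size multiplies the chance hits by about 16, so going from 10 straight to 4 multiplies them by roughly 4096, which is why stepping down gradually is the safer approach.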
Meanwhile, we will take a look at diffseq in case there is some way to
improve its performance or to warn at an early stage if the word size
appears small for the input sequence lengths and may generate too many
matches.
Hope this helps
Peter Rice
EMBOSS Team