<div dir="ltr"><div dir="ltr">Greetings EMBOSS users!<div><br></div><div><ul><li>I am using shuffleseq on entire genomic DNA multi-FASTA input files (EMBOSS version 6.6.0).<br></li><li>For one genome that is relatively large (~2 GB), with several pseudomolecules in the 150-250 Mb size range, I split the file into individual sequences and run them as an array job.<br></li><li>Everything runs on a UNIX-based compute cluster using the SLURM job scheduler.<br></li><li>My command is simply: srun shuffleseq -sformat pearson $IN $OUT<br></li><li>For the most part, all is well.<br></li><li>With that as context, I have a few questions about the use of shuffleseq:<br></li></ul></div><div><br></div><div><b><font color="#0000ff">Q1.</font></b> How can the RAM required be estimated from the input file size? Is there an approximate formula, or have users worked it out empirically?</div><div><br></div><div><b><font color="#0000ff">Q2.</font></b> When I ran downstream analyses on shuffled genomes from 5 independent runs of shuffleseq, 4 of the 5 gave no DNA sequence matches, suggesting the shuffling worked well, but the fifth run clearly did produce matches. So I wonder whether the randomization step in shuffling is quirky in some way.</div><div>I came across this <a href="http://eyegene.ophthy.med.umich.edu/shuffle/">link</a> describing possible problems with a lack of true randomization in an old EMBOSS release. It makes me wonder whether similar issues still play a role in version 6.6.0.</div><div>Or could there be other explanations for why 4 runs produce good shuffles but 1 does not? The scripts for the repetitions are copies of one another with small modifications; even so, I have checked and re-checked the syntax and found no errors.<br></div><div><br></div><div>Thanks in advance for advice and pointers from forum members. </div><div>And, in advance, best wishes for a happy and productive 2019.</div><div><br></div><div>Cheers!</div><div>Anand</div></div></div>
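<div>One way to narrow down Q2 is to inspect the shuffled output directly, before any downstream analysis. The Python sketch below is a hypothetical spot check, not part of EMBOSS. It assumes plain FASTA/Pearson files and pairs records by their order in the two files. For each sequence it reports whether base composition was preserved, which is what shuffleseq is meant to do, and what fraction of positions still carry the original base. For a true random permutation, that fraction should be close to the sum of the squared base frequencies (0.25 when all four bases are equally common). A value near 1.0 would point to an output that was not shuffled, or was barely shuffled.</div>

```python
import sys
from collections import Counter


def read_fasta(path):
    """Parse a FASTA/Pearson file into {sequence id: uppercase sequence}."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif line:
                # Uppercase so soft-masked (lowercase) bases compare equal.
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def expected_identity(sequence):
    """Identity expected after a random permutation: sum of squared base frequencies."""
    total = len(sequence)
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in Counter(sequence).values())


def shuffle_report(original, shuffled):
    """Compare an original sequence with its shuffled counterpart."""
    matches = sum(a == b for a, b in zip(original, shuffled))
    return {
        # A composition-preserving shuffle must keep every base count unchanged.
        "same_composition": Counter(original) == Counter(shuffled),
        "identical": original == shuffled,
        # Fraction of positions carrying the same base as the original.
        "positional_identity": matches / len(original) if original else 0.0,
    }


def main(original_path, shuffled_path):
    originals = read_fasta(original_path)
    shuffled = read_fasta(shuffled_path)
    # Assumption: records are paired by their order in the two files.
    for (rid, seq), (_, shuf) in zip(originals.items(), shuffled.items()):
        report = shuffle_report(seq, shuf)
        print(f"{rid}\tcomposition_kept={report['same_composition']}\t"
              f"identity={report['positional_identity']:.3f}\t"
              f"expected~{expected_identity(seq):.3f}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

<div>Usage would be something like <code>python check_shuffle.py original.fa shuffled.fa</code> (the script name is hypothetical). It loads whole sequences into memory and runs in pure Python, so on 150-250 Mb pseudomolecules it is slow and best treated as a one-off spot check of the suspicious run against one of the good ones.</div>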