<div dir="ltr">Thanks for all the help, much appreciated!<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 9 September 2014 15:04, Peter Cock <span dir="ltr">&lt;<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Sep 9, 2014 at 1:55 PM, Jurgens de Bruin &lt;<a href="mailto:debruinjj@gmail.com">debruinjj@gmail.com</a>&gt; wrote:<br>

&gt; Hi,<br>

&gt;<br>

</span><span class="">&gt; So the IDs I am matching against are in a set.<br>

<br>

</span>Good :)<br>

<span class=""><br>

&gt; if <a href="http://seq.id" target="_blank">seq.id</a> in lset_id:<br>

&gt;    list_seq.append(seq)<br>

<br>

</span>This looks like you are building a list of SeqRecord objects in memory.<br>

If you are looking for a large number of entries in the FASTA file, this<br>

will consume a lot of RAM (and if you run out of RAM, things will suddenly<br>

slow down as swap space is used instead).<br>

<br>

I would use a generator approach to write out the records you want<br>

immediately; see the &quot;Filtering a sequence file&quot; example in the<br>

Cookbook chapter of the Biopython Tutorial:<br>

<br>

<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html" target="_blank">http://biopython.org/DIST/docs/tutorial/Tutorial.html</a><br>

<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.pdf" target="_blank">http://biopython.org/DIST/docs/tutorial/Tutorial.pdf</a><br>

<br>

In your case, replace &quot;sff&quot; with &quot;fasta&quot; and adjust how the set of<br>

wanted identifiers is loaded.<br>
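<br>

A minimal sketch of that approach, not the Tutorial's exact code. The helper<br>
names (load_wanted_ids, filter_records, write_wanted) and the<br>
one-identifier-per-line ID file are assumptions made for illustration; only<br>
SeqIO.parse and SeqIO.write come from Biopython.<br>

<pre>
```python
def load_wanted_ids(path):
    """Read one identifier per line into a set (assumed file layout)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_records(records, wanted_ids):
    """Lazily yield only the records whose .id is in wanted_ids."""
    for record in records:
        if record.id in wanted_ids:
            yield record


def write_wanted(fasta_in, ids_file, fasta_out):
    """Stream matching records from fasta_in to fasta_out; return the count."""
    from Bio import SeqIO  # requires Biopython to be installed

    wanted = load_wanted_ids(ids_file)
    # SeqIO.parse returns an iterator, so records are read one at a time.
    records = SeqIO.parse(fasta_in, "fasta")
    # SeqIO.write consumes the generator and returns how many records it wrote.
    return SeqIO.write(filter_records(records, wanted), fasta_out, "fasta")
```
</pre>

Because filter_records is a generator, SeqIO.write pulls one record at a<br>
time, so no list of SeqRecord objects is ever held in memory.<br>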

<span class="HOEnZb"><font color="#888888"><br>

Peter<br>

</font></span></blockquote></div><br><br clear="all"><br>-- <br>Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/<br>distinti saluti/siong/duì yú/привет<br><br>Jurgens de Bruin

</div>