<div dir="ltr">Thanks for all the help, much appreciated!<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 9 September 2014 15:04, Peter Cock <span dir="ltr"><<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Sep 9, 2014 at 1:55 PM, Jurgens de Bruin <<a href="mailto:debruinjj@gmail.com">debruinjj@gmail.com</a>> wrote:<br>
> Hi,<br>
><br>
</span><span class="">> So the IDs I am matching against are in a set.<br>
<br>
</span>Good :)<br>
<span class=""><br>
> if seq.id in lset_id:<br>
> list_seq.append(seq)<br>
<br>
</span>This looks like you are building a list of SeqRecord objects in memory.<br>
If you are looking for a large number of entries in the FASTA file, this<br>
will consume a lot of RAM (and if you run out of RAM, things will suddenly<br>
slow down as swap space is used instead).<br>
<br>
I would use a generator approach to write out the records you want<br>
immediately, see the "Filtering a sequence file" example in the<br>
Cookbook chapter of the Biopython Tutorial:<br>
<br>
<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html" target="_blank">http://biopython.org/DIST/docs/tutorial/Tutorial.html</a><br>
<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.pdf" target="_blank">http://biopython.org/DIST/docs/tutorial/Tutorial.pdf</a><br>
<br>
In your case, replace "sff" with "fasta" and adjust how the set of<br>
wanted identifiers is loaded.<br>
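<br>
To make the memory point concrete, here is a rough sketch of the same<br>
generator idea using only the Python standard library. It is not the<br>
Tutorial's code: the names read_fasta, record_id and filter_fasta are made<br>
up for illustration, and the ID is assumed to be the first word of the<br>
header line. Each record is written as soon as it is read, so nothing<br>
accumulates in a list. The Biopython form is noted in the closing comment.<br>
<pre>
```python
def read_fasta(handle):
    """Yield (header_line, sequence_lines) for one record at a time."""
    header, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            header, lines = line, []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield header, lines


def record_id(header):
    """Return the first word after '>' (assumed to be the record ID)."""
    parts = header[1:].split(None, 1)
    return parts[0] if parts else ""


def filter_fasta(in_handle, out_handle, wanted):
    """Write wanted records as they are read; return how many were written."""
    count = 0
    for header, lines in read_fasta(in_handle):
        if record_id(header) in wanted:
            out_handle.write(header)
            out_handle.writelines(lines)
            count += 1
    return count


# With Biopython installed, the same pattern is (file names hypothetical,
# not run here):
#   from Bio import SeqIO
#   records = (r for r in SeqIO.parse("in.fasta", "fasta") if r.id in wanted)
#   SeqIO.write(records, "out.fasta", "fasta")
```
</pre>
Called with two open file handles and the set of wanted IDs, filter_fasta<br>
returns the number of records written, and only one record is held in<br>
memory at a time regardless of the size of the input file.<br>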
<span class="HOEnZb"><font color="#888888"><br>
Peter<br>
</font></span></blockquote></div><br><br clear="all"><br>-- <br>Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/<br>distinti saluti/siong/duì yú/привет<br><br>Jurgens de Bruin
</div>