<div dir="ltr"><div>Hi,<br><br></div>Thanks for the reply. I am trying out Bio.SeqIO.FastaIO.SimpleFastaParser. What I want to achieve is to iterate over the FASTA file, pull out the sequences whose IDs are in a predefined list, and write them to a new FASTA file.<br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 9 September 2014 11:38, Peter Cock <span dir="ltr">&lt;<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Sep 9, 2014 at 8:54 AM, Jurgens de Bruin &lt;<a href="mailto:debruinjj@gmail.com">debruinjj@gmail.com</a>&gt; wrote:<br>

&gt; Hi All,<br>

&gt;<br>

&gt; I would like some advice on iterating over large FASTA files: 208 MB, a total of<br>

&gt; 1813132 sequences. I am currently using SeqIO.parse, but it seems very slow. I<br>

&gt; would appreciate any help on this matter.<br>

<br>

</span>Do you need to look at each record one-by-one? If so, iterating over<br>

the file in one pass is best, and if Bio.SeqIO.parse(..., &quot;fasta&quot;) is too<br>

slow then I suggest using Bio.SeqIO.FastaIO.SimpleFastaParser(...)<br>

which just returns tuples of strings (avoiding the memory and speed<br>

overhead of creating SeqRecord objects).<br>
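<br>
A minimal sketch of that one-pass approach, applied to the goal of keeping only records whose IDs are in a predefined list. The function name filter_fasta, its parameters, and the file names are hypothetical, and the record ID is assumed to be the first whitespace-delimited word of the title line:<br>

```python
from Bio.SeqIO.FastaIO import SimpleFastaParser


def filter_fasta(in_path, out_path, wanted_ids):
    """Copy records whose ID is in wanted_ids from in_path to out_path.

    Hypothetical helper for illustration. Returns the number of records written.
    """
    wanted = set(wanted_ids)  # set membership tests are O(1) on average
    written = 0
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        # SimpleFastaParser yields (title, sequence) tuples of plain strings;
        # the title has no leading ">" and the sequence has no line breaks.
        for title, seq in SimpleFastaParser(in_handle):
            parts = title.split(None, 1)
            seq_id = parts[0] if parts else ""  # assumption: ID is the first word
            if seq_id in wanted:
                # Each sequence is written on a single line (no re-wrapping).
                out_handle.write(">%s\n%s\n" % (title, seq))
                written += 1
    return written
```

For example, filter_fasta("all.fasta", "subset.fasta", ["id1", "id2"]) would write the matching records to subset.fasta. Converting the ID list to a set matters when the list is long, because the membership test runs once per record.<br>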

<br>

Alternatively, it might be more efficient to jump to specific records<br>

of interest using Bio.SeqIO.index(...) or Bio.SeqIO.index_db(...).<br>
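<br>
A rough sketch of the index-based alternative, assuming the wanted IDs match the record IDs exactly. The function name extract_by_index and its parameters are hypothetical. It copies the raw text of each record, so the original line wrapping is preserved:<br>

```python
from Bio import SeqIO


def extract_by_index(in_path, out_path, wanted_ids):
    """Write the raw text of each wanted record to out_path.

    Hypothetical helper for illustration. Returns the number of records found.
    """
    # SeqIO.index scans the file once and stores file offsets, not the records.
    index = SeqIO.index(in_path, "fasta")
    found = 0
    try:
        # get_raw returns bytes, so the output file is opened in binary mode.
        with open(out_path, "wb") as out_handle:
            for seq_id in wanted_ids:
                if seq_id in index:
                    out_handle.write(index.get_raw(seq_id))
                    found += 1
    finally:
        index.close()
    return found
```

Records are written in the order of the ID list, not in file order. Building the index still requires one scan of the file, so this is most likely to pay off when the same file is queried repeatedly (Bio.SeqIO.index_db keeps the index on disk for reuse).<br>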

<span class="HOEnZb"><font color="#888888"><br>

Peter<br>

</font></span></blockquote></div><br><br clear="all"><br>-- <br>Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/<br>distinti saluti/siong/duì yú/привет<br><br>Jurgens de Bruin

</div>