<div dir="ltr"><div>Hi,<br><br></div>Thanks for the reply. I am trying out Bio.SeqIO.FastaIO.SimpleFastaParser. What I want to achieve is to iterate over the FASTA file, pull out the sequences whose IDs are in a predefined list, and then write these to a new FASTA file.<br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 9 September 2014 11:38, Peter Cock <span dir="ltr"><<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Sep 9, 2014 at 8:54 AM, Jurgens de Bruin <<a href="mailto:debruinjj@gmail.com">debruinjj@gmail.com</a>> wrote:<br>
> Hi All,<br>
><br>
> I would like some advice on iterating over large FASTA files (208 MB in<br>
> total, 1,813,132 sequences). I am currently using SeqIO.parse, but it<br>
> seems very slow. I would appreciate any help on this matter.<br>
<br>
</span>Do you need to look at each record one-by-one? If so, iterating over<br>
the file in one pass is best, and if Bio.SeqIO.parse(..., "fasta") is too<br>
slow, then I suggest using Bio.SeqIO.FastaIO.SimpleFastaParser(...),<br>
which just returns (title, sequence) tuples of strings (avoiding the memory and speed<br>
overhead of creating SeqRecord objects).<br>
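<br>
A minimal sketch of this one-pass approach, assuming Biopython is installed. The helper name fasta_stats is hypothetical (it is not part of Biopython); it only illustrates that SimpleFastaParser yields plain (title, sequence) string tuples that can be consumed directly, here to count records and residues without building any SeqRecord objects:<br>

```python
from io import StringIO

from Bio.SeqIO.FastaIO import SimpleFastaParser


def fasta_stats(handle):
    """Count records and total residues in one pass over a FASTA text handle.

    SimpleFastaParser yields (title, sequence) string tuples, with the
    sequence lines of each record already joined together.
    """
    count = 0
    total_length = 0
    for _title, seq in SimpleFastaParser(handle):
        count += 1
        total_length += len(seq)
    return count, total_length


if __name__ == "__main__":
    # Small in-memory example; seq1 spans two lines (ACGT + AC).
    demo = StringIO(">seq1 first\nACGT\nAC\n>seq2\nGGG\n")
    print(fasta_stats(demo))  # (2, 9)
```

For a real file the same function would be given an open text-mode handle, e.g. from open("big.fasta"), where big.fasta is only a placeholder name.<br>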
<br>
Alternatively, it might be more efficient to jump to specific records<br>
of interest using Bio.SeqIO.index(...) or Bio.SeqIO.index_db(...).<br>
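<br>
A minimal sketch of the index-based route, assuming Biopython is installed. Bio.SeqIO.index scans the file once to note where each record starts (keyed by record ID, i.e. the first word of the title line) and then parses only the records that are looked up. The function name extract_with_index and its arguments are hypothetical:<br>

```python
from Bio import SeqIO


def extract_with_index(fasta_path, wanted_ids, out_path):
    """Write the records listed in wanted_ids from fasta_path to out_path.

    SeqIO.index keeps the file offset of every entry in memory and parses
    a record only when it is accessed. IDs missing from the file are
    skipped. Returns the number of records written.
    """
    index = SeqIO.index(fasta_path, "fasta")
    try:
        # Records are looked up (and written) in the order of wanted_ids.
        records = (index[rec_id] for rec_id in wanted_ids if rec_id in index)
        with open(out_path, "w") as out_handle:
            # SeqIO.write consumes the generator and returns the record count.
            return SeqIO.write(records, out_handle, "fasta")
    finally:
        index.close()  # release the file handle held by the index
```

If holding all offsets in memory is a concern, Bio.SeqIO.index_db stores them in an SQLite database file instead.<br>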
<span class="HOEnZb"><font color="#888888"><br>
Peter<br>
</font></span></blockquote></div><br><br clear="all"><br>-- <br>Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/<br>distinti saluti/siong/duì yú/привет<br><br>Jurgens de Bruin
</div>
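<div>The filtering task described at the top of this message (keep only the records whose ID is in a predefined list, then write them to a new FASTA file) could be sketched with SimpleFastaParser as below. This assumes Biopython is installed and treats the ID as the first whitespace-separated word of the title line, which is how Bio.SeqIO assigns record.id for FASTA. The function name filter_fasta_by_id is hypothetical.</div>

```python
from Bio.SeqIO.FastaIO import SimpleFastaParser


def filter_fasta_by_id(in_handle, out_handle, wanted_ids):
    """Copy records whose ID is in wanted_ids from in_handle to out_handle.

    The ID is the first whitespace-separated word of the title line.
    Records are written in input-file order. Returns the number written.
    """
    wanted = set(wanted_ids)  # set membership tests stay fast for long ID lists
    written = 0
    for title, seq in SimpleFastaParser(in_handle):
        words = title.split(None, 1)
        if words and words[0] in wanted:
            # Keep the original title; write the whole sequence on one line.
            out_handle.write(">" + title + "\n" + seq + "\n")
            written += 1
    return written
```

<div>Typical use would open the input file in text mode and the output file with mode "w", then pass both handles and the ID list. Writing each sequence on a single line is still valid FASTA; the set lookup matters when checking around 1.8 million titles.</div>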