<html><head></head><body><div class="ydp57a6f215yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:10px;"><div></div>
<div dir="ltr" data-setdir="false">Dear Terry,</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Thank you for contributing this code. A faster FASTA/FASTQ parser would be very welcome.</div><div dir="ltr" data-setdir="false">Have you done any timings to find out why the current FASTA parser is slower?</div><div dir="ltr" data-setdir="false">My guess is that this is because of the SeqRecord constructor, which can create complex SeqRecord objects with annotations, letter_annotations and whatnot, while FASTA only needs the id attribute.</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Thanks,</div><div dir="ltr" data-setdir="false">-Michiel</div>
</div><div id="yahoo_quoted_4006301043" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Wednesday, November 12, 2025 at 11:49:37 AM GMT+9, Jones Kelly, Terence Carleton &lt;terence.jones@charite.de&gt; wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv0139152588">
<div>
<div style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
Hi all</div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
<br>
</div>
<div><span style="font-size:16px;">I regularly process reasonably large FASTQ files (hundreds of billions of sequencing reads) and FASTA files using Biopython. For some years I've been meaning to implement a FASTQ/FASTA reader in a compiled language and add Python
bindings to improve the speed. I could've done this in C, but I spent some decades writing C and wanted to learn something new, so I considered a few languages. Because Rust makes it very easy to create Python bindings, I decided to give it a try. I thought
I'd get going by asking the Claude CLI to write me some Rust. That turned out to be a much, much better experience than I had anticipated. With Claude I tried several implementations, keeping track of the timings. Claude
also wrote some tests. To have something to compare against, I had Claude write a pure Python version, a pure C version, and Python bindings to the C, and create a benchmark suite. From what I can tell, FASTA reading with Rust/Python (and with
C/Python) is twice as fast as with Biopython, and FASTQ reading is four times as fast.
</span><span style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:16px;color:rgb(0, 0, 0);background-color:rgb(255, 255, 255);">I didn't write a single line of code. I just did some minimal cleaning up when things were already far along.</span><span style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);"> I've
been using the code for the last month or two with no problems.</span></div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
<br>
</div>
<div style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
The repo is at <a rel="nofollow noopener noreferrer" target="_blank" href="https://github.com/VirologyCharite/prseq">
https://github.com/VirologyCharite/prseq</a> (prseq = Python/Rust for sequences). You'll find the benchmark results on that page. There are still some small things I would adjust in the API. BTW, Claude also wrote the README (which should definitely be improved).</div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:16px;color:rgb(0, 0, 0);">
<span style="background-color:rgb(255, 255, 255);">I am wondering if there might be interest in incorporating this into Biopython. I don't know if there are any Rust dependencies in Biopython, but I know that there are some C extensions. We could use either the Rust or the C implementation,
as their speeds are comparable. </span><span style="font-size:12pt;">If there's interest, I'd be happy to help (or to do it all, after some discussion and maybe with some guidance).</span></div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
<br>
</div>
<div style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
Thanks very much for all the work on Biopython. It's really been a pleasure to use the code over the last dozen years or so.</div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
<br>
</div>
<div style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
Terry Jones</div>
<div dir="ltr" style="font-family:Aptos, Arial, Helvetica, sans-serif;font-size:12pt;color:rgb(0, 0, 0);">
<br>
</div>
</div>
</div>_______________________________________________<br>Biopython mailing list - <a ymailto="mailto:Biopython@biopython.org" href="mailto:Biopython@biopython.org">Biopython@biopython.org</a><br><a href="https://mailman.open-bio.org/mailman/listinfo/biopython" target="_blank">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br></div>
</div>
</div></body></html>