[Biopython] A possibility for speeding up FASTA/FASTQ reading in BioPython

Jones Kelly, Terence Carleton terence.jones at charite.de
Tue Nov 11 16:51:45 EST 2025


Hi all

I regularly process reasonably large FASTQ (hundreds of billions of sequencing reads) and FASTA files using BioPython. For some years I've been meaning to implement a FASTQ/FASTA reader in a compiled language and add Python bindings to improve the speed. I could've done this in C but I spent some decades writing C and I wanted to learn something new, so I considered a few languages. Because Rust makes it very easy to create Python bindings, I decided to give it a try.

I thought I'd get going by asking the Claude CLI to write me some Rust. That turned out to be a much, much better experience than I had anticipated. With Claude I played with several implementations, keeping track of timing. Claude also wrote some tests. To compare what I was seeing I got Claude to write a pure Python version, a pure C version, Python bindings to the C, and to create a benchmark suite.

From what I can tell, the Rust/Python (and the C/Python) FASTA reading is twice as fast as BioPython and FASTQ reading is four times as fast. I didn't write a single line of code. I just did some minimal cleaning up when things were already far along. I've been using the code for the last month or two with no problems.
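
To give a feel for what a pure Python baseline in a comparison like this might look like, here is a minimal sketch. It is not the code from the prseq repo or its benchmark suite: the names read_fastq and time_reader are made up for illustration, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines, no blank lines).

```python
import io
import time


def read_fastq(handle):
    """Yield (id, sequence, quality) tuples from a text handle.

    Hypothetical helper: assumes each record is exactly four lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of input
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed or truncated FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\n"), sequence, quality


def time_reader(reader, text, repeats=3):
    """Return (record_count, best_seconds) over several in-memory runs."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = sum(1 for _ in reader(io.StringIO(text)))
        best = min(best, time.perf_counter() - start)
    return count, best


# Synthetic input, so the timing says nothing about real sequencing data.
sample = "".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(1000))
count, seconds = time_reader(read_fastq, sample)
print(f"{count} records, best of 3 runs: {seconds:.6f} s")
```

Any other reader that takes a text handle and yields records could be passed to time_reader in the same way to get a like-for-like timing.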

The repo is at https://github.com/VirologyCharite/prseq (prseq = Python/Rust for sequences). You'll find the benchmark results on that page. There are still some small things I would adjust in the API. BTW, Claude also wrote the README (which should definitely be improved).

I am wondering if there might be interest in incorporating this into BioPython. I don't know if there are any Rust dependencies in BioPython but I know that there are some C extensions. We could use either, as their speeds are comparable. If there's interest, I'd be happy to help (or to do it all, after some discussion and maybe with some guidance).

Thanks very much for all the work on BioPython. It's really been a pleasure to use the code over the last dozen years or so.

Terry Jones

