<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body>
<div style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi all</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div><span style="font-size: 16px;">I regularly process reasonably large FASTQ (hundreds of billions of sequencing reads) and FASTA files using BioPython. For some years I've been meaning to implement a FASTQ/FASTA reader in a compiled language and add Python
bindings to improve the speed. I could've done this in C but I spent some decades writing C and I wanted to learn something new, so I considered a few languages. Because Rust makes it very easy to create Python bindings, I decided to give it a try. I thought
I'd get going by asking the Claude CLI to write me some Rust. That turned out to be a much, much better experience than I had anticipated. With Claude I played with several implementations, keeping track of timing. Claude
</span>also<span style="font-size: 16px;"> wrote some tests. To compare what I was seeing I got Claude to write a pure Python version, a pure C version, Python bindings to the C, and to create a benchmark suite. From what I can tell, the Rust/Python (and the
C/Python) FASTA reading is twice as fast as BioPython and FASTQ reading is four times as fast.
</span><span style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">I didn't write a single line of code. I just did some minimal cleaning up when things were already far along.</span><span style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"> I've
been using the code for the last month or two with no problems.</span></div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The repo is at <a href="https://github.com/VirologyCharite/prseq" data-outlook-id="b95749ed-030e-4407-8496-334c7f335e75">
https://github.com/VirologyCharite/prseq</a> (prseq = Python/Rust for sequences). You'll find the benchmark results on that page. There are still some small things I would adjust in the API. BTW, Claude also wrote the README (which should definitely be improved).</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);">I am wondering if there might be interest in incorporating this into BioPython. I don't know if there are any Rust dependencies in BioPython but I know that there are some C extensions. We could use either,
as their speeds are comparable. </span><span style="font-size: 12pt;">If there's interest, I'd be happy to help (or to do it all, after some discussion and maybe with some guidance).</span></div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks very much for all the work on BioPython. It's really been a pleasure to use the code over the last dozen years or so.</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Terry Jones</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
</body>
</html>