<div dir="auto">On this topic, using an index or an alternative file format would be my first thought for speed. Any decent benchmarks for different access patterns / file / index formats out there?<div dir="auto"><br></div><div dir="auto"><br></div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Mon, Nov 24, 2025, 9:41 AM Peter Cock <<a href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello Terry,<br>
<br>
I just posted a blog about my thoughts on receiving generative AI<br>
contributions as an Open Source project maintainer:<br>
<br>
<a href="https://blastedbio.blogspot.com/2025/11/thoughts-on-generative-ai-contributions.html" rel="noreferrer noreferrer" target="_blank">https://blastedbio.blogspot.com/2025/11/thoughts-on-generative-ai-contributions.html</a><br>
<br>
I am sceptical, and in this case adding a Rust dependency to Biopython<br>
seems too much to ask. I think you could get similar performance gains<br>
with C (which we do use) where at least the maintainers have some<br>
experience. However, even there, gains may not make the additional<br>
complexity and maintenance burden worthwhile.<br>
<br>
Thank you for writing and asking, rather than surprising everyone with<br>
a large pull request.<br>
<br>
Peter<br>
<br>
P.S. Cross reference <a href="https://github.com/biopython/biopython/pull/5085" rel="noreferrer noreferrer" target="_blank">https://github.com/biopython/biopython/pull/5085</a><br>
<br>
On Tue, Nov 11, 2025 at 10:00 PM Jones Kelly, Terence Carleton<br>
<<a href="mailto:terence.jones@charite.de" target="_blank" rel="noreferrer">terence.jones@charite.de</a>> wrote:<br>
><br>
> Hi all<br>
><br>
> I regularly process reasonably large FASTQ (hundreds of billions of sequencing reads) and FASTA files using BioPython. For some years I've been meaning to implement a FASTQ/FASTA reader in a compiled language and add Python bindings to improve the speed. I could've done this in C but I spent some decades writing C and I wanted to learn something new, so I considered a few languages. Because Rust makes it very easy to create Python bindings, I decided to give it a try. I thought I'd get going by asking the Claude CLI to write me some Rust. That turned out to be a much, much better experience than I had anticipated. With Claude I played with several implementations, keeping track of timing. Claude also wrote some tests. To compare what I was seeing I got Claude to write a pure Python version, a pure C version, Python bindings to the C, and to create a benchmark suite. From what I can tell, the Rust/Python (and the C/Python) FASTA reading is twice as fast as BioPython and FASTQ reading is four times as fast. I didn't write a single line of code. I just did some minimal cleaning up when things were already far along. I've been using the code for the last month or two with no problems.<br>
><br>
> The repo is at <a href="https://github.com/VirologyCharite/prseq" rel="noreferrer noreferrer" target="_blank">https://github.com/VirologyCharite/prseq</a> (prseq = Python/Rust for sequences). You'll find the benchmark results on that page. There are still some small things I would adjust in the API. BTW, Claude also wrote the README (which should definitely be improved).<br>
><br>
> I am wondering if there might be interest in incorporating this into BioPython. I don't know if there are any Rust dependencies in BioPython but I know that there are some C extensions. We could use either, as their speeds are comparable. If there's interest, I'd be happy to help (or to do it all, after some discussion and maybe with some guidance).<br>
><br>
> Thanks very much for all the work on BioPython. It's really been a pleasure to use the code over the last dozen years or so.<br>
><br>
> Terry Jones<br>
><br>
><br>
> _______________________________________________<br>
> Biopython mailing list - <a href="mailto:Biopython@biopython.org" target="_blank" rel="noreferrer">Biopython@biopython.org</a><br>
> <a href="https://mailman.open-bio.org/mailman/listinfo/biopython" rel="noreferrer noreferrer" target="_blank">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br>
</blockquote></div>