[Biopython] Generative AI policy for contributions to Biopython
Peter Cock
p.j.a.cock at googlemail.com
Fri Apr 24 06:16:09 EDT 2026
Dear Biopythoneers,
We need to set out a generative AI policy for contributions to Biopython.
There are now multiple recent PRs submitted by new contributors which
are openly using AI tools, more that I suspect are, and now even AI-assisted
PRs from past contributors (where CV padding or other external metrics
are unlikely to be driving this). These are generally more work to review
than human-written PRs, and that is a growing issue.
I blogged about my views late last year - ending in the line "Right now, I
still lean very much to saying no any PR using generative AI".
https://blastedbio.blogspot.com/2025/11/thoughts-on-generative-ai-contributions.html
Things will change (both tool capabilities, but also the social and legal
interpretations) but that post still describes my views today - note I did
not touch on the topic of communications there (see below).
Recently Linux adopted what has been described as a balanced stance,
treating AI as a tool with very clear expectations that usage MUST be declared
and that the human submitter is responsible for (quoting these four points):
* Reviewing all AI-generated code
* Ensuring compliance with licensing requirements
* Adding their own Signed-off-by tag to certify the DCO
* Taking full responsibility for the contribution
https://docs.kernel.org/process/coding-assistants.html
That is pragmatic but ignores the legal and ethical minefield. We don't
have a Developer Certificate of Origin (DCO), but I think the other
points are a bare minimum for any Biopython policy.
Most of my personal open source projects have only had a very small
number of contributors, and I am comfortable with outright rejecting
generative AI. I know some of the past/current Biopython contributors
are more willing to embrace this technology though - so I doubt support
for a simple ban would be unanimous.
Speaking for a moment as the current Open Bioinformatics Foundation
president, the board has discussed this and agreed not to try to
micromanage the member projects. For reference, BioPerl have started
https://github.com/bioperl/bioperl-live/issues/407 which has some
excellent points and examples to consider.
In particular, this is not just a code or documentation changes issue - but
also about the communication around any proposed change: the nature
of the commit messages, pull request description, and discussion. This
ties into the maintainers' burden - many of our recent AI-generated PRs
have fairly short code changes but the verbose text is exhausting to read
and unhelpful. It has sometimes felt like I have been talking to an AI agent
rather than a human - I actually liked the feeling of mentoring a new
contributor and guiding them through minor hurdles to getting their
change accepted, but you lose that with an AI agent in between you.
I therefore very much like this line from the current Codeberg policy:
> All communication, that includes: commit messages, pull request
> messages, documentation, code comments and issues (and
> comments on issues/pull requests), that is intended to be read
> by people to understand your thoughts and work must not have
> been generated with AI. We exclude machine translation and
> tooling that helps with grammar and spelling check.
https://codeberg.org/comaps/Governance/src/branch/main/AI_USAGE.md
Would anyone like to speak in defence of accepting AI (assisted) PRs,
and suggest an existing policy you would be happy we adopt or base
ours on?
Or should I start drafting a more draconian but likely much shorter one -
a few lines like this in the CONTRIBUTING file and/or PR template: No
generative AI to be used in any Biopython contributions, with the exception
of machine translation to/from English (where you might consider including
your original language text as well).
Thank you,
Peter