[Biopython] Biopython Enhancement Proposal (BEP): Alphabets

T.A. Wemyss taw50 at cam.ac.uk
Tue Oct 16 09:02:00 UTC 2018


Dear all,

Apologies for the second email, but Michiel and I felt it would be 
useful to explain my reasoning behind leaving the BEP process. I started 
off in favour of alphabets, but have since come to support their 
removal.

Here is some background on my motivation for this:
- Alphabets have been in Biopython at least since version 1.00a3 
(September 3, 2001)
- Implementation is inconsistent (see MutableSeq, 
https://github.com/biopython/biopython/issues/1681 )
- Their purpose is badly defined and their current implementation does 
not clarify this. Therefore any new implementation is likely to cause 
breaking changes for the few people who actually use them.
- On the entire mailing list, only one person replied to say they used 
alphabets - it's clearly not a widely used feature, and risks just being 
an additional source of confusion.

Michiel has suggested that we proceed directly to removing Alphabets if 
nobody else wants to take over the BEP.

All the best,
Thomas

On 2018-08-04 03:04, Michiel de Hoon wrote:
> Dear all,
> 
> While sequence objects in Biopython have an associated alphabet, the
> purpose of alphabets in Biopython is currently not well-defined.
> I can imagine these three interpretations of their purpose:
> 
> 	* To define how the sequence data is stored internally in a Seq
> 	  object (i.e. what kind of objects are in seq.data);
> 	* To define conceptually what the Seq object contains (e.g. this is a
> 	  protein, or this is DNA, or this is DNA with or without methylation);
> 	* To define how a Seq object should be presented to the user (e.g. as
> 	  a single-letter string, a three-letter string, or something else).
> 
> (and there may be others that I have overlooked).
> 
> To justify having alphabets as a part of Biopython, their purpose
> should be clearly defined.
> 
> Because of the complexity of alphabets and their use in Biopython, we
> felt that it may be a good idea to have a PEP (Python Enhancement
> Proposal)-like discussion to define the purpose of alphabets and their
> technical implementation in Biopython. This would mean that somebody
> who is in favor of having alphabets in Biopython would work out a
> proposal with all the details to allow developers and users to think
> through the implications.
> 
> Here you can find a description of PEPs and what should go in them:
> https://www.python.org/dev/peps/pep-0001/ [1]
> 
> Not all of it is applicable to Biopython, but it may serve as a
> general guideline.
> 
> The Alphabet BEP (Biopython Enhancement Proposal) could be hosted on
> the Biopython website so that everybody can follow the discussion.
> 
> Since alphabets have been under discussion for more than 10 years, we
> are thinking of putting a time limit on the proposal (e.g., until
> January 1st, 2020), meaning that if no agreement on the proposal is
> reached by then, alphabets would be removed from Biopython. This would
> give people who are in favor of alphabets the opportunity to make
> their case, while guaranteeing that a conclusion will be reached
> (either a well-defined and usable alphabet, or no alphabet) within the
> next ~1.5 years.
> 
> Any volunteers? Seq objects and therefore their alphabets are a key
> feature of Biopython, and working through a BEP can give you the
> opportunity to help design a major part of Biopython.
> 
> Best,
> -Michiel
> 
> 
> 
> Links:
> ------
> [1] https://www.python.org/dev/peps/pep-0001/
> 
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython

