[Biopython] Fw: Hiding alphabets

Peter Cock p.j.a.cock at googlemail.com
Tue Jul 3 13:39:59 UTC 2018


I confused myself, the default alphabet hiding *was*
merged and included in Biopython 1.72:

https://github.com/biopython/biopython/pull/1687

The link I gave was to a more invasive pull request which always
hides the alphabet - which is sensible if we intend to remove
the alphabets or completely replace them with something different:

https://github.com/biopython/biopython/pull/1676
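
As a rough sketch of the difference between the two behaviours (a toy
class with made-up names such as ToySeq and AlwaysHiddenSeq, not the
actual code in either pull request, and ignoring details like the
truncation of long sequences):

```python
class Alphabet:
    """Toy stand-in for a generic alphabet (hypothetical, not Bio.Alphabet)."""

    def __repr__(self):
        return f"{self.__class__.__name__}()"


class DNAAlphabet(Alphabet):
    """Toy stand-in for a more specific, non-default alphabet."""


class ToySeq:
    """Toy sequence whose repr hides only the generic default alphabet."""

    def __init__(self, data, alphabet=None):
        self.data = data
        # Fall back to the generic alphabet when none is given.
        self.alphabet = alphabet if alphabet is not None else Alphabet()

    def __repr__(self):
        name = self.__class__.__name__
        if type(self.alphabet) is Alphabet:
            # Default alphabet: uninformative, so leave it out.
            return f"{name}({self.data!r})"
        # Any other alphabet is still shown.
        return f"{name}({self.data!r}, {self.alphabet!r})"


class AlwaysHiddenSeq(ToySeq):
    """Toy variant of the more invasive approach: never show the alphabet."""

    def __repr__(self):
        return f"{self.__class__.__name__}({self.data!r})"


print(repr(ToySeq("AGTACACTGGT")))
print(repr(ToySeq("AGTACACTGGT", DNAAlphabet())))
print(repr(AlwaysHiddenSeq("AGTACACTGGT", DNAAlphabet())))
```

In both variants the alphabet is still there as the .alphabet attribute;
only what the repr displays changes.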

Sorry for the confusion,

Peter

On Tue, Jul 3, 2018 at 2:31 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Thank you to everyone who has commented so far on the issue:
>
> https://github.com/biopython/biopython/issues/1674
>
> We do not have consensus on what to do with the alphabets
> as yet, and therefore on whether we should hide them or not.
>
> However, I am proposing to hide the default alphabet from the
> Seq objects' __repr__ as implemented here:
>
> https://github.com/biopython/biopython/pull/1676
>
> I originally suggested this for Biopython 1.72 (which is now out).
>
> Peter
>
> On Tue, Jun 5, 2018 at 12:19 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Dear Biopythoneers,
>>
>> I know that Michiel has long expressed a wish to get rid of
>> the current alphabet system in Biopython, and I agree that
>> the historic design is overly complicated and often gets in
>> the way - but we don't have any concrete proposals to replace
>> it. Part of the problem here is coming up with a replacement
>> with the least painful transition - and, being practical, the fewer
>> people use the alphabets, the less trouble any changes would
>> cause.
>>
>> The proposal here would de-emphasise the use of alphabets,
>> reflecting the fact that for the vast majority of scripts and
>> code you can just ignore them.
>>
>> There are still corner cases - for example, for some of the
>> SeqIO output filetypes we currently need to use the Seq's
>> alphabet to label the sequence type (RNA, DNA, protein).
>>
>> Still, overall I can see it being quite practical to downplay
>> the alphabet objects in our user facing documentation,
>> and hiding them in the Seq objects' __repr__ helps there.
>>
>> Is this a case in the Zen of Python where practicality
>> wins out over being explicit about what a sequence object
>> contains?
>>
>> "Explicit is better than implicit.
>> ...
>> Although practicality beats purity."
>>
>> https://www.python.org/dev/peps/pep-0020/
>>
>> Thoughts and comments welcome here or on the issue,
>> https://github.com/biopython/biopython/issues/1674
>>
>> Peter
>>
>> On Sun, Jun 3, 2018 at 1:02 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>> Dear all,
>>>
>>> I have opened an issue here:
>>> https://github.com/biopython/biopython/issues/1674
>>> in case anybody has any comments or suggestions.
>>>
>>> Best,
>>> -Michiel
>>>
>>>
>>>
>>> On Saturday, May 26, 2018 11:50 PM, Michiel de Hoon <mjldehoon at yahoo.com>
>>> wrote:
>>>
>>>
>>> Dear all,
>>>
>>>
>>>
>>> In Biopython, Seq objects show both their sequence content and the alphabet
>>> associated with them.
>>> For example, the first example in our Biopython Tutorial & Cookbook starts
>>> as follows:
>>>
>>> >>> from Bio.Seq import Seq
>>> >>> my_seq = Seq("AGTACACTGGT")
>>> >>> my_seq
>>> Seq('AGTACACTGGT', Alphabet())
>>>
>>> I don't think we need to show the alphabet here. It takes up screen space,
>>> and oftentimes it's uninformative (as in the example above); the other
>>> examples in the same section of the tutorial show SingleLetterAlphabet and
>>> IUPACAmbiguousDNA. Even in the latter case, I don't think users need to be
>>> reminded every time that they are dealing with DNA.
>>>
>>> Perhaps more importantly, this is very confusing for new users. I would say
>>> that alphabets are of minor importance in Biopython overall. Some might say
>>> that they should be abolished altogether. But if we start off our tutorial
>>> by showing Alphabet, IUPAC.unambiguous_dna, SingleLetterAlphabet etc., then
>>> a reasonable question from students would be what they are and why we use
>>> them. I don't have a good answer to that question.
>>> In addition, the design of the Alphabet class is problematic.
>>>
>>> Shall we change the __repr__ function of Seq objects to show the sequence
>>> only? I.e. the example above would show
>>>
>>> >>> from Bio.Seq import Seq
>>> >>> my_seq = Seq("AGTACACTGGT")
>>> >>> my_seq
>>> Seq('AGTACACTGGT')
>>>
>>> Then the section on alphabets in the Tutorial can move to the end of the
>>> chapter, for people who actually want to use Alphabets.
>>>
>>> For each sequence object, the alphabet would still be accessible as an
>>> attribute of the Seq object:
>>>
>>> >>> my_seq.alphabet
>>> Alphabet()
>>>
>>>
>>> Best,
>>> -Michiel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython

