[Biopython-dev] Sequence object allows non-alphabet characters

Peter Cock p.j.a.cock at googlemail.com
Mon Dec 19 14:02:25 UTC 2011


On Mon, Dec 19, 2011 at 12:49 PM, Markus
<Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
>
> What about an additional optional argument to the sequence object,
> like "validate=True/False" with False as the default? This would not
> break existing code and would not affect speed (if validate=False),
> but would make it possible to have the sequence validated against
> the selected alphabet.

That could work. There would still be a trivial speed impact
(the extra if statement), but it shouldn't really hurt. However, most
Seq objects are created not directly by the user, but via SeqIO.
I suppose SeqIO could get another argument for Seq construction...
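
As a rough sketch of the idea only: the class below is a hypothetical
stand-in (SimpleSeq, DNA_LETTERS and the error message are made up for
illustration) and is not Biopython's Seq. It assumes the alphabet is
just a string of allowed letters, and shows that with validate=False
the only extra cost is the one if statement.

```python
class SimpleSeq:
    """Hypothetical stand-in for a sequence class (not Biopython's Seq)."""

    def __init__(self, data, alphabet=None, validate=False):
        # With validate=False (the default) this "if" is the only overhead.
        if validate and alphabet is not None:
            bad = sorted(set(data) - set(alphabet))
            if bad:
                raise ValueError("letters not in alphabet: " + "".join(bad))
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data


DNA_LETTERS = "ACGT"  # assumed alphabet, for illustration only

# Default behaviour is unchanged: non-alphabet letters are accepted.
unchecked = SimpleSeq("ACGTXN", DNA_LETTERS)

# Opting in to validation rejects the same data.
try:
    SimpleSeq("ACGTXN", DNA_LETTERS, validate=True)
except ValueError as err:
    message = str(err)  # "letters not in alphabet: NX"
```

A parser-level switch would presumably just forward such a flag to the
constructor, but that is speculation about a design, not existing API.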

> In addition, validate=True without a selected alphabet would allow
> for basic sequence polishing, like converting to uppercase and removing
> whitespace and digits (any non-alphabetic characters?).

Things like mixed case are actually useful, and so too are extra
symbols.
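
For example, lower case is commonly used for soft masking, and symbols
such as "-" (gap) or "*" (stop) carry meaning. A minimal sketch of a
polishing step that respects this, assuming a hypothetical helper named
polish (not part of Biopython) where each clean-up is opt-in and case
and symbols are left alone by default:

```python
import string


def polish(text, strip_whitespace=True, strip_digits=True, upper=False):
    """Hypothetical helper: drop whitespace/digits, keep case and symbols."""
    drop = ""
    if strip_whitespace:
        drop += string.whitespace
    if strip_digits:
        drop += string.digits
    # str.maketrans with a third argument maps those characters to None.
    cleaned = text.translate(str.maketrans("", "", drop))
    return cleaned.upper() if upper else cleaned


raw = "1 acgtACGT-N*\n61 ggcc"
kept_case = polish(raw)              # "acgtACGT-N*ggcc"
forced_upper = polish(raw, upper=True)  # "ACGTACGT-N*GGCC"
```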

Peter


