[Biopython-dev] [Biopython - Bug #2351] Make Seq more like a string, even subclass string?

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Sat Jun 30 06:34:05 UTC 2012


Issue #2351 has been updated by Michiel de Hoon.


>Does anything break if we do make Seq subclass string?
Obviously we'll have to make sure it passes the unit tests. But to catch the more insidious bugs, I'd suggest modifying the Biopython source code as soon as possible so that people can try it. Since we just released 1.60, we now have the maximum amount of time until the next release, and we should use it to see whether anything does break.
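
As an illustration of the kind of subtle breakage worth looking for, here is a minimal sketch (not Biopython code; the class name SeqSketch and its alphabet argument are made up for this example). It relies only on standard Python behaviour: methods inherited from str return plain str objects, so any extra attribute is lost unless each method is overridden.

```python
class SeqSketch(str):
    """Hypothetical str subclass carrying an extra attribute (assumption, not Biopython's Seq)."""

    def __new__(cls, data, alphabet=None):
        # str is immutable, so the value must be set in __new__.
        obj = super(SeqSketch, cls).__new__(cls, data)
        obj.alphabet = alphabet
        return obj


seq = SeqSketch("acgt", alphabet="DNA")
upper = seq.upper()   # inherited method: result is a plain str
sliced = seq[1:3]     # inherited slicing: result is a plain str

print(type(seq).__name__, type(upper).__name__, type(sliced).__name__)
print(hasattr(upper, "alphabet"))
```

A subclass would therefore need to wrap every inherited method whose result should remain a sequence object.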

>Is this possible for MutableSeq?
Yes; we can use either a MutableString or, for Python version 2.6 and up, a bytearray.
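
A rough sketch of the bytearray option, assuming ASCII-only sequence letters. The class name MutableSeqSketch and its methods are hypothetical and do not reflect the existing MutableSeq implementation.

```python
class MutableSeqSketch(object):
    """Hypothetical mutable sequence backed by a bytearray (illustration only)."""

    def __init__(self, data):
        # Store the ASCII letters as bytes so they can be changed in place.
        self._data = bytearray(data.encode("ascii"))

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return MutableSeqSketch(self._data[index].decode("ascii"))
        return chr(self._data[index])

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._data[index] = value.encode("ascii")
        else:
            self._data[index] = ord(value)

    def reverse(self):
        # In-place reversal, as provided by bytearray itself.
        self._data.reverse()

    def __str__(self):
        return self._data.decode("ascii")


seq = MutableSeqSketch("ACGT")
seq[0] = "T"      # in-place change of one letter
seq.reverse()     # in-place reversal
print(str(seq))
```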

>What about DBSeq (the lazy loading sequence object in BioSQL)?
DBSeq inherits from Seq, but it doesn't actually use this inheritance. I don't think it could even make use of the inheritance, since DBSeq objects don't have a self.data member. I think that this inheritance should be removed. Note also that MutableSeq does not inherit from Seq.

>Or UnknownSeq (which tries to avoid creating large repetitive strings in memory)?
Same here.

> What worries me however is a possible dichotomy between Seq-type objects
> which do and don't subclass strings.
We have to make sure that all Seq-like methods are equally applicable to all Seq-like objects. For DBSeq and UnknownSeq, subclassing does not make sense because of the potential performance penalty. However, this does not mean that we should not subclass for Seq either; we should simply make sure that DBSeq and UnknownSeq provide the functionality of strings in some other way (i.e., without subclassing). This can be done transparently for the user.
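
To make "in some other way" concrete, here is a sketch of a Seq-like object that answers string-style queries without subclassing str and without building the full string. The class UnknownSeqSketch is hypothetical (it is not the real UnknownSeq), it assumes a single repeated character, and it is written for Python 3, where len(range(...)) does not allocate a list.

```python
class UnknownSeqSketch(object):
    """Hypothetical stand-in for a repetitive sequence; stores only length and letter."""

    def __init__(self, length, character="N"):
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Compute the slice length without creating any string.
            new_length = len(range(*index.indices(self._length)))
            return UnknownSeqSketch(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character

    def count(self, sub):
        # Same result as str.count (non-overlapping), computed arithmetically.
        if sub == "":
            return self._length + 1
        if set(sub) == {self._character}:
            return self._length // len(sub)
        return 0

    def __str__(self):
        # Only here is the full string actually built.
        return self._character * self._length


big = UnknownSeqSketch(10 ** 9)
print(len(big), big.count("NN"), str(big[:5]))
```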

> Another potential example is memory efficient bit-encoded nucleotide sequences (BioJava has
> this). i.e. There are lots of Seq like objects where we do NOT want to have a big string
> buffer allocated in memory, and would that be required if we subclass string?
Here, we have the same issue as for DBSeq and UnknownSeq: since such memory-efficient Seq-like classes have to reimplement all Seq-like methods anyway, they should not inherit from the Seq class. 
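
For illustration only, a bit-encoded class along these lines might pack four nucleotides per byte. The name TwoBitSeqSketch and its layout are assumptions made for this sketch, not an existing Biopython or BioJava API, and only the letters A, C, G and T are supported.

```python
class TwoBitSeqSketch(object):
    """Hypothetical 2-bit packed nucleotide sequence (four letters per byte)."""

    _LETTERS = "ACGT"

    def __init__(self, letters):
        self._length = len(letters)
        self._packed = bytearray((self._length + 3) // 4)
        for i, letter in enumerate(letters):
            # str.index raises ValueError for letters outside ACGT.
            code = self._LETTERS.index(letter)
            self._packed[i // 4] |= code << (2 * (i % 4))

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("sequence index out of range")
        code = (self._packed[index // 4] >> (2 * (index % 4))) & 3
        return self._LETTERS[code]

    def __str__(self):
        # Decode on demand; no string is kept in memory otherwise.
        return "".join(self[i] for i in range(self._length))


packed = TwoBitSeqSketch("GATTACA")
print(str(packed), len(packed))
```

Every string-like method has to be written against the packed representation, which is why inheriting from Seq would gain nothing here.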

> Also, for Python 3, we may want to consider sub classing byte string rather than the
> (unicode) string. However, with Python 3.3 the memory bloat problem of using Unicode even for
> simple ASCII strings does go away.
I don't have a strong preference here.
----------------------------------------
Bug #2351: Make Seq more like a string, even subclass string?
https://redmine.open-bio.org/issues/2351

Author: Peter Cock
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: Not Applicable
URL: 


We've started talking on the mailing list about making the SeqRecord class a subclass of the Seq object, and making that a subclass of the Python string.

This bug is for holding patches - I suspect a lot of the discussion will continue on the mailing lists rather than here.

I have explicitly left the "assign to" field pointing at the dev mailing list.

