[Biopython-dev] Fwd: Re: sequence class proposal
biopython at maubp.freeserve.co.uk
Thu Jun 5 09:17:00 UTC 2008
This is in reply to Jose's comment 3 on bug 2507, which was quite broad.
> I have coded a sequence class that fulfils the requirements that I
> would like to see. It's very similar to SeqRecord, but it is not compatible
> with it. It has no seq property, although that can be solved. The problem
> with SeqRecord is that it is not possible to create a class with an __init__
> compatible with Seq and SeqRecord at the same time.
Even if one day the SeqRecord is a subclass of the Seq object, there
is no requirement that it have the same __init__ arguments. In fact,
they have to be different, because for a SeqRecord you should also supply an
identifier (and potentially a name, description and other annotation).
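To illustrate, here is a rough sketch of a subclass whose __init__
takes extra arguments. The names SketchSeq and SketchSeqRecord, and the
default values, are made up for this example; they are not the real
Bio.Seq.Seq or Bio.SeqRecord.SeqRecord classes.

```python
class SketchSeq(object):
    """Hypothetical stand-in for a Seq-like class (not Bio.Seq.Seq)."""

    def __init__(self, data, alphabet=None):
        self._data = data
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)


class SketchSeqRecord(SketchSeq):
    """Hypothetical record subclass with its own __init__ signature."""

    def __init__(self, data, id, name="<unknown name>",
                 description="<unknown description>", alphabet=None):
        # Pass the sequence arguments on to the parent class ...
        SketchSeq.__init__(self, data, alphabet)
        # ... and keep the record-level annotation here.
        self.id = id
        self.name = name
        self.description = description


record = SketchSeqRecord("ACGT", id="example1")
```

The subclass still behaves like its parent (str(record) and
len(record) work), even though the two constructors take different
arguments.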
> This proposed class is just a draft, it needs more work but I would like to
> receive comments about it. It inherits from MutableSeq so it should be
> named MutableRichSeq, but it seems that I'm too lazy to type such a long name,
> I promise to change the name in a later version and to create a RichSeq
> with Seq as parent.
I agree with you here that when getting a single letter (amino acid or
nucleotide) from a sequence with per-letter-annotation, e.g.
my_sequence[i], it would be very nice to have the
per-letter-annotation like the quality included. This does mean the
object returned can't just be a single one character string. However,
because the current Seq and MutableSeq classes return a simple string,
unless we return a subclass of a string, this risks breaking other
people's code. So, I would conclude that Seq needs to subclass a
string BEFORE we start including support for per-letter-annotation.
Ideally we would have alphabet aware versions of all the string
functions before we made this change (see Bug 2351).
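As a minimal sketch of the "subclass of a string" idea, assuming
made-up names (AnnotatedLetter and AnnotatedSeq are hypothetical, not
Biopython classes) and a single "quality" annotation per letter:

```python
class AnnotatedLetter(str):
    """Hypothetical one-letter string carrying per-letter annotation."""

    def __new__(cls, letter, **annotation):
        if len(letter) != 1:
            raise ValueError("expected a single character")
        # str is immutable, so the value must be set in __new__.
        obj = str.__new__(cls, letter)
        obj.annotation = dict(annotation)
        return obj


class AnnotatedSeq(object):
    """Hypothetical sequence with one quality score per letter."""

    def __init__(self, data, qualities):
        if len(data) != len(qualities):
            raise ValueError("need exactly one quality per letter")
        self._data = data
        self._qualities = list(qualities)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is left out of this sketch")
        return AnnotatedLetter(self._data[index],
                               quality=self._qualities[index])


letter = AnnotatedSeq("ACGT", [30, 20, 10, 40])[1]
```

Because the returned object is still a str, existing code that
compares or concatenates single letters keeps working, while new code
can look at letter.annotation. Note that ordinary string methods
return plain str objects, so the annotation is lost as soon as the
letter is transformed.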
> Besides RichSeq there are two other classes in the attachment, RichFeature
> and BioRange, but I will comment on those in another post.
Your BioRange and BioFeature classes seem somewhat similar to the
current SeqFeature class with its locations (and sub features).
> I think that it is quite important to convert Seq and MutableSeq to new-style classes,
> what do you think about that? With the new classes we can use properties.
I have been thinking about deprecating the Seq.data property (and also
MutableSeq.data). The data string (or array) should really be a
private implementation detail, perhaps Seq._data, following the
leading underscore convention for private attributes. We can then add
property methods to keep Seq.data available (perhaps with a
deprecation warning).
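Something along these lines might work. This is only a sketch: the
class name PrivateDataSeq and the warning text are invented for the
example, and it relies on a new-style class so that properties work.

```python
import warnings


class PrivateDataSeq(object):
    """Hypothetical Seq-like class keeping its string in _data."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    @property
    def data(self):
        # Old code reading .data still works, but gets a warning.
        warnings.warn("The data property is deprecated; use str(seq)",
                      DeprecationWarning, stacklevel=2)
        return self._data

    @data.setter
    def data(self, value):
        # Old code assigning to .data still works, but gets a warning.
        warnings.warn("Setting the data property is deprecated",
                      DeprecationWarning, stacklevel=2)
        self._data = value
```

Existing scripts that use my_seq.data would carry on running, and the
warning gives people time to switch before the property is removed.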