[BioPython] Making the Seq object act more like a string

Peter biopython at maubp.freeserve.co.uk
Mon Sep 10 08:27:18 UTC 2007


We seem to be talking at cross purposes.

Michiel de Hoon wrote:
> Peter wrote:
>> I would like to make the following "small" change now, ready for
>> the next release of Biopython:
>> 
>> (1) Make __str__ give the full sequence as a string for Seq and 
>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>> to give a truncated representation including the alphabet.
> 
> Note that the __str__ is used to create the output of "print myseq",
>  where myseq is a Seq object. So if __str__ returns the full sequence
>  string, then "print myseq" will print the full sequence. This is not
>  necessarily what you want.

Getting the full string from both "print my_seq" and str(my_seq) is what
I would expect from a Seq object that acted like a string.

> In essence, the str() function and the .tostring() method have
> different functions. So I think we should not drop .tostring() in
> favor of str().

At the moment str() and .tostring() do serve different purposes.  Currently with a 
Seq object called my_seq:
* full sequence as string - my_seq.tostring()
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - str(my_seq)

What I would like:
* full sequence as string - str(my_seq) and retain my_seq.tostring() for 
backwards compatibility.
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - consider adding 
a new method e.g. my_seq.short()
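
To make the wish list above concrete, here is a rough sketch using a toy
class.  This is not the real Bio.Seq code: the class name SketchSeq, the
placeholder alphabet string, and the max_len default are all made up for
illustration, and short() is only the hypothetical method name suggested
above.

```python
class SketchSeq(object):
    """Toy stand-in for Seq (hypothetical, not Biopython's implementation)."""

    def __init__(self, data, alphabet="Alphabet()"):
        self.data = data          # the sequence itself, held as a plain string
        self.alphabet = alphabet  # placeholder text standing in for an alphabet

    def __len__(self):
        return len(self.data)

    def __str__(self):
        # Proposed behaviour: the full sequence as a plain string.
        return self.data

    def __repr__(self):
        # Full sequence together with the alphabet.
        return "SketchSeq(%r, %s)" % (self.data, self.alphabet)

    def tostring(self):
        # Kept only for backwards compatibility; same result as str().
        return self.data

    def short(self, max_len=60):
        # Hypothetical method: truncated sequence together with the alphabet.
        if len(self.data) > max_len:
            shown = self.data[:max_len - 6] + "..." + self.data[-3:]
        else:
            shown = self.data
        return "SketchSeq(%r, %s)" % (shown, self.alphabet)


my_seq = SketchSeq("ACGT" * 25)
message = "My sequence is %s, length %i" % (my_seq, len(my_seq))
```

With this arrangement str(my_seq), "%s" formatting and my_seq.tostring()
all give the same full string, while my_seq.short() takes over the job
that str() does today.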

> Moreover, this problem will go away if and when a Seq object
> subclasses from a string object. Then, we won't need a Seq-to-string
> function at all.

What do you mean by the "problem will go away"?  This would be much
easier to discuss in person :(

If/when we make Seq a subclass of string, there would still be __str__
and __repr__ methods, and I would expect str(my_seq) and also "print
my_seq" to give the full sequence.  For backwards compatibility I would
keep the existing .tostring() method as well.
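
A minimal sketch of the subclassing idea, assuming we simply derive from
the built-in str type.  The class name StrSeq and the way the alphabet is
stored are my own placeholders, not an agreed design:

```python
class StrSeq(str):
    """Hypothetical Seq-like class that subclasses the built-in str."""

    def __new__(cls, data, alphabet="generic"):
        # str is immutable, so the value has to be set in __new__.
        obj = str.__new__(cls, data)
        obj.alphabet = alphabet  # placeholder for a real alphabet object
        return obj

    def __repr__(self):
        # A __repr__ is still needed to show the alphabet alongside the data.
        return "StrSeq(%s, %r)" % (str.__repr__(self), self.alphabet)

    def tostring(self):
        # Retained for backwards compatibility only.
        return str(self)


my_seq = StrSeq("ACGT", "dna")
message = "My sequence is %s, length %i" % (my_seq, len(my_seq))
```

Here str(my_seq) and "%s" formatting give the full sequence through the
inherited behaviour, which is the result I would expect in either design.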

I would find it very strange to have the Seq object subclass string, but 
for str(my_seq) not to give me the full sequence.  Isn't making 
str(my_seq) return the full sequence as a string essential for things 
like this?

print my_seq
print "My sequence is %s, length %i" % (my_seq, len(my_seq))

Rather than as currently required:

print my_seq.tostring()
print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))


Peter



