[Biopython-dev] Tweaking the SeqRecord class

Thu Aug 17 08:28:19 UTC 2006

Michiel de Hoon wrote:
> Peter wrote:
> 
>>First of all, is there any comment on my suggestion to add __str__ and
>>__repr__ methods to the SeqRecord object, bug 2057:
>>
>>http://bugzilla.open-bio.org/show_bug.cgi?id=2057
> 
> Here's a thought:
> What if Seq were to inherit from str, and SeqRecord from Seq?
> Then, you get these for free.

This wouldn't automatically show any of the id/name/description/annotation
information in the __str__ and __repr__ output, so I would want to
override these methods anyway.
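
A minimal sketch of the point above, written for current Python. The
classes are hypothetical stand-ins rather than the real Bio.Seq and
Bio.SeqRecord code, and the output layout in VerboseSeqRecord is only
an assumption about what a useful summary might look like:

```python
class Seq(str):
    """Hypothetical stand-in: a sequence that inherits from str."""


class SeqRecord(Seq):
    """Hypothetical stand-in: a record that inherits from Seq."""

    def __new__(cls, data, id="<unknown id>"):
        # str is immutable, so the sequence data is set in __new__.
        obj = super().__new__(cls, data)
        obj.id = id
        return obj


class VerboseSeqRecord(SeqRecord):
    """Same record, with __str__ and __repr__ overridden to show the id."""

    def __str__(self):
        return "ID: %s\nSeq: %s" % (self.id, str.__str__(self))

    def __repr__(self):
        return "%s(%r, id=%r)" % (
            self.__class__.__name__, str.__str__(self), self.id)


plain = SeqRecord("ACGT", id="demo1")
verbose = VerboseSeqRecord("ACGT", id="demo1")

print(str(plain))     # inherited from str: the sequence only, no id
print(str(verbose))   # overridden: the id is included
print(repr(verbose))
```

The inherited methods come "for free", but they only know about the
string data, which is why the overrides are still needed.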

We would still need to create and provide a Seq object on request as the
record.seq attribute/property (for backwards compatibility).

I also think we should change the Seq object's __str__ and __repr__
functionality (while preserving the .tostring() method for some
backwards compatibility).  It might have been Marc who raised this
point - shouldn't __str__ turn the data into a string, and __repr__
return a string that you could type into Python to recreate the object?
This would mean we would have to stop truncating the sequence data at
60 characters.
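
As a rough illustration of that convention (a sketch only: this Seq is
a hypothetical stand-in, and using None for the alphabet is an
assumption, since real alphabet objects would need an eval-friendly
__repr__ of their own):

```python
class Seq(object):
    """Hypothetical stand-in for a Seq following the str/repr convention."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        # The full sequence as a plain string, with no truncation.
        return self.data

    def __repr__(self):
        # A string that can be typed back into Python to rebuild the object.
        return "Seq(%r, %r)" % (self.data, self.alphabet)

    def tostring(self):
        # Kept for backwards compatibility; same result as str(seq).
        return self.data


long_seq = Seq("ACGT" * 25)        # 100 letters, longer than 60
rebuilt = eval(repr(long_seq))     # round trip through the repr string

print(len(str(long_seq)))
print(rebuilt.data == long_seq.data)
```

The round trip only works because the repr contains every letter,
which is the reason the 60 character truncation would have to go.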

>>Next, I'd like to check in some basic __doc__ strings for the
>>SeqRecord class, e.g. something like this:
> 
> Sounds good to me. Pretty amazing, actually, that SeqRecord doesn't have 
> documentation.

OK, basic __doc__ strings checked in as Bio/SeqRecord.py revision 1.9.

The Seq object also needs some love and attention in this area.

>>If you recall, for the fastest parsers turning the data into SeqRecord
>>and Seq objects imposed a fairly large overhead (compared to just
>>using strings):
>>
>>http://lists.open-bio.org/pipermail/biopython-dev/2006-July/002407.html
> 
> I wonder if this is still true if a Seq object and a SeqRecord object 
> inherit from string. From the code, I don't see where the overhead comes 
> from.

I was wondering what the overhead was too.

It could just be creating objects (Seq and SeqRecord) plus their
associated strings/list/dictionary (compared with just two strings, the
fasta title string and the sequence).
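
One way to test that guess would be a micro-benchmark along these
lines. This is a sketch under stated assumptions: the classes are
simplified hypothetical stand-ins for Seq and SeqRecord, the helper
function names are made up for the example, and the timings will vary
from machine to machine, so no numbers are quoted here:

```python
import timeit


class Seq(object):
    """Hypothetical stand-in holding only the sequence string."""

    def __init__(self, data):
        self.data = data


class SeqRecord(object):
    """Hypothetical stand-in that eagerly creates a list and a dict."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id
        self.dbxrefs = []
        self.annotations = {}


def as_strings(title, sequence):
    # What a bare-bones parser returns: just the two strings.
    return (title, sequence)


def as_objects(title, sequence):
    # The same data wrapped in a Seq and a SeqRecord.
    return SeqRecord(Seq(sequence), id=title)


title, sequence = "demo1 hypothetical title", "ACGT" * 100

t_strings = timeit.timeit(lambda: as_strings(title, sequence), number=100000)
t_objects = timeit.timeit(lambda: as_objects(title, sequence), number=100000)

print("strings only: %.3f s" % t_strings)
print("Seq + SeqRecord: %.3f s" % t_objects)
```

This isolates the cost of object creation from the cost of reading
and splitting the file, which the parser comparison could not do.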

My property change should reduce this a little bit, as for Fasta files
there is no need to create the dbxrefs list or the annotations
dictionary (unless or until the user records some information here after
creating the SeqRecord object).
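
The idea can be sketched as follows. This is not the attached
SeqRecord.py; the private attribute names (_annotations, _dbxrefs) and
the helper method names are assumptions made for the example. It uses
the property() built-in rather than decorator syntax, since decorators
are not available in Python 2.3:

```python
class SeqRecord(object):
    """Hypothetical sketch: annotations and dbxrefs are created on demand."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id
        # Nothing is allocated until the user asks for it.
        self._annotations = None
        self._dbxrefs = None

    def _get_annotations(self):
        if self._annotations is None:
            self._annotations = {}
        return self._annotations

    def _set_annotations(self, value):
        self._annotations = value

    def _get_dbxrefs(self):
        if self._dbxrefs is None:
            self._dbxrefs = []
        return self._dbxrefs

    def _set_dbxrefs(self, value):
        self._dbxrefs = value

    annotations = property(_get_annotations, _set_annotations)
    dbxrefs = property(_get_dbxrefs, _set_dbxrefs)


record = SeqRecord("ACGT", id="demo1")
print(record._annotations is None)   # no dictionary created yet

record.annotations["source"] = "hypothetical"
print(record.annotations)            # created on first access
print(record._dbxrefs is None)       # the list is still not created
```

From the user's side record.annotations and record.dbxrefs behave like
ordinary attributes, so existing code should not need to change.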

Making SeqRecord subclass Seq might help here if only one object needs
to be created.

>>The backwards compatibility if statement is a bit
>>ugly - can we just assume Python 2.2 or later?
> 
> Biopython currently requires Python 2.3 or later.

Great - I'll ditch that nasty big if and just re-write the class to use
properties.

Revised version attached - should be functionally identical.

Peter

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: SeqRecord.py
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20060817/79dd5fca/attachment.ksh>