[Biopython-dev] sequence class proposal

Jose Blanca jblanca at btc.upv.es
Thu May 22 07:30:52 UTC 2008


Dear Biopython developers,
I've been using Python and Biopython for some time now and I would like to 
talk with you about the sequence classes in Biopython. I have had some issues 
using the SeqRecord and Alignment classes, and I have been discussing and 
implementing a proposal for a new sequence class with two students (Victor 
Sanchez and Pablo Martinez). We would like to present this implementation as 
a contribution to the discussion about the design of the sequence classes in 
Biopython, and we're eager to receive your comments.

The first problem that I found with SeqRecord is the lack of support for 
qualities. It is also difficult to add quality support in a class derived 
from SeqRecord, because of the way the current SeqRecord API is designed. 
Let me explain.
Currently SeqRecord has a seq property, and if you want a slice, or need to 
reverse or complement the sequence, you would do something like:
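from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
my_seq = SeqRecord(Seq('ACTG'))
my_seq.seq[0:2]
# the reversal is done on the Seq; the SeqRecord takes no part in it
my_seq.seq = my_seq.seq[::-1]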
If I derive a class from SeqRecord with a qual property, I don't know how to 
reverse both the sequence and the quality at the same time, because the Seq 
methods are called directly and the SeqRecord is not aware of the call. To 
support this we have discussed a new class with a slightly different API and 
we have written a preliminary implementation. We have named this new class 
RichSeq, and we think that it could solve the quality problem. With this new 
class it would work like this:
myseq = RichSeq(seq='ACTG', qual=[50,50,50,50])
subseq = myseq[0:2]
myseq.reverse()
myseq.complement()
RichSeq is equivalent to SeqRecord and it has the same properties as 
SeqRecord, but it adds the methods __getitem__, reverse, complement and 
reverse_complement.
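
To make the intent concrete, here is a minimal, self-contained sketch of the 
idea. It is not the attached implementation: the class name QualSeq, the use 
of a plain str for the sequence and the in-place behaviour are assumptions 
made only for illustration.

```python
# Illustrative sketch only; QualSeq is a hypothetical stand-in for RichSeq.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class QualSeq:
    """Keeps a sequence and its per-base qualities in step."""

    def __init__(self, seq, qual=None):
        if qual is not None and len(qual) != len(seq):
            raise ValueError("seq and qual must have the same length")
        self.seq = seq
        self.qual = list(qual) if qual is not None else None

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Only slices are handled in this sketch; applying the same slice
        # to both members keeps them aligned.
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slices")
        qual = self.qual[index] if self.qual is not None else None
        return QualSeq(self.seq[index], qual)

    def reverse(self):
        # In place: the qualities travel with their bases.
        self.seq = self.seq[::-1]
        if self.qual is not None:
            self.qual.reverse()

    def complement(self):
        # In place: complementing changes the bases, not their qualities.
        self.seq = self.seq.translate(_COMPLEMENT)

    def reverse_complement(self):
        self.reverse()
        self.complement()


myseq = QualSeq(seq="ACTG", qual=[10, 20, 30, 40])
subseq = myseq[0:2]          # seq "AC", qual [10, 20]
myseq.reverse_complement()   # seq "CAGT", qual [40, 30, 20, 10]
```

The point of the sketch is only that every operation goes through the outer 
object, so the sequence and the qualities cannot get out of step.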

We have also implemented a new type of feature, which we have called 
RichFeature. It is similar to SeqFeature. The main difference is that, 
instead of a location and a location operator, it has a BioRange (another 
new class). BioRange is modelled on the range class of the Bioperl library. 
The BioRange is optional, so some uses of RichFeature would be:
RichFeature(id='a_feature', type='annotation', feature='this is an annotation')
RichFeature(id='a_feature', type='subsequence', feature=Seq('ACTG'))
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range,
                   feature='some_annotation, e.g. an exon')
seq = RichSeq(seq='ACTGACTG', features=[feat])

With this implementation you can easily define a sequence with seq, qual and 
annotations associated with a range, and afterwards reverse or complement 
all of them with a single call:
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range, feature='some_annotation')
seq = RichSeq(seq='ACTGACTG', qual=[60,60,60,60,60,60,60,60], features=[feat])
seq.reverse()
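
What reverse() has to do with a ranged feature can be sketched as follows. 
This is a guess at the bookkeeping, not the attached code: Span is a 
hypothetical stand-in for BioRange, and the sketch assumes 0-based, 
end-exclusive coordinates like Python slices, which may not be how the 
attached BioRange counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical range: 0-based start, end-exclusive, like a slice."""
    start: int
    end: int


def reverse_span(span, seq_length):
    """Return where span lands after the parent sequence is reversed."""
    if not 0 <= span.start <= span.end <= seq_length:
        raise ValueError("span does not fit in the sequence")
    return Span(seq_length - span.end, seq_length - span.start)


seq = "ACTGACTG"
span = Span(3, 6)                         # covers "GAC"
rev_seq = seq[::-1]                       # "GTCAGTCA"
rev_span = reverse_span(span, len(seq))   # Span(start=2, end=5)

# The feature still covers the same bases, now read backwards.
assert rev_seq[rev_span.start:rev_span.end] == seq[span.start:span.end][::-1]
```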

By the way, this is a mutable class, although that could be easily changed.

You can even use Seq and RichSeq objects as subsequence features and still 
ask for slices or complements.
range = BioRange(start=1,end=2)
feat = RichFeature(type='subsequence', feature=RichSeq(seq='CT'), range=range)
seq = RichSeq(seq='ACTG', features=[feat])
seq2 = seq[1:2]
seq.reverse()
This capability makes this RichSeq an excellent candidate for a base class for 
an Alignment implementation, but we have not implemented this yet.
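
Slicing a sequence that carries ranged features, as in seq[1:2] above, raises 
its own bookkeeping question: which features survive in the subsequence, and 
with what coordinates? One possible rule is to keep only the overlapping part 
of each range and shift it to the new origin. The sketch below illustrates 
that rule only; it is an assumption, the attached code may behave 
differently, and the helper clip_span with its (start, end) tuples (0-based, 
end-exclusive) is hypothetical.

```python
def clip_span(span, slice_start, slice_stop):
    """Map a (start, end) range onto the slice [slice_start:slice_stop].

    Returns the range in the coordinates of the slice, or None when the
    range lies completely outside the slice.
    """
    start, end = span
    new_start = max(start, slice_start)
    new_end = min(end, slice_stop)
    if new_start >= new_end:
        return None
    return (new_start - slice_start, new_end - slice_start)


seq = "ACTGACTG"
feature = (3, 6)                     # covers "GAC"

inside = clip_span(feature, 2, 7)    # (1, 4): whole feature kept, shifted
partial = clip_span(feature, 4, 8)   # (0, 2): only "AC" is left
outside = clip_span(feature, 0, 3)   # None: no overlap
```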

Attached to this mail you will find the implementation of these new classes. 
They come with some tests that give an idea of their intended use. We would 
like to hear your opinions and suggestions. Do you think that this kind of 
functionality is desirable? Please let us know about any flaw, especially in 
the API. I think that my work would be easier using a sequence class similar 
to RichSeq, but maybe there's an easier way.
Do you think that it is a good idea to attach these classes to Bugzilla? 
Should we open a new bug, or is there one already open for this sequence 
class debate?
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: richseq.0.0.1.tar.gz
Type: application/x-tgz
Size: 7075 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20080522/aba24889/attachment-0002.bin>

