[Biopython-dev] SeqFeature start/end and making positions act like ints

Peter Cock p.j.a.cock at googlemail.com
Fri Sep 16 23:01:18 UTC 2011


On Fri, Sep 16, 2011 at 9:33 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Fri, Sep 16, 2011 at 12:31 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Hi all,
>>
>> We've previously discussed adding start/end properties
>> to the SeqFeature returning integers - which would be
>> useful but inconsistent with the FeatureLocation which
>> returns Position objects:
>>
>> https://redmine.open-bio.org/issues/2818
>>
>> After an interesting discussion with Leighton, I spent
>> the afternoon making (most of the) Position objects
>> subclass int - so that they can be used like integers
>> (with the fuzzy information retained but generally
>> ignored except for writing the features out again).
>>
>> This means we can have SeqFeature start/end
>> properties which like those of the FeatureLocation
>> return position objects - and they are actually easy
>> to use (except for some very extreme cases).
>> e.g. You can use them to slice a sequence.
>>
>> The code is on a branch here:
>> https://github.com/peterjc/biopython/tree/int_pos
>>
>> It is almost 100% backwards compatible. Some
>> of the arguments for creating a fuzzy position
>> (and their __repr__) have changed, and some
>> of their attributes, but we feel this is unlikely to
>> actually affect anyone. We rather suspect only
>> the SeqIO parsers actually create or use the
>> fuzzy objects in the first place!
>>
>> In terms of usability I think this is a worthwhile
>> improvement. The new class hierarchy is a bit
>> more complex though - and I have not looked
>> at the performance implications at all.
>>
>> Would anyone like to review this please?
>>
>
> Here's another way to do it, maybe -- modify Seq.Seq.__getitem__ to also
> check if it's been given a SeqFeature, and if so, handle the joins there.
> The handling of fuzziness could happen in here or use the new .start and
> .end properties.
>
> Outline:
>
>     def __getitem__(self, index):
>         """Returns a subsequence or single letter, use my_seq[index]."""
>         if isinstance(index, int):
>             #Return a single letter as a string
>             return self._data[index]
>         elif isinstance(index, SeqFeature):
>             # NEW -- handle start/end/join voodoo safely
>             # if there's a join, extract the subsequences and then
>             # concatenate them
>             return the_result
>         else:
>             #Return the (sub)sequence as another Seq object
>             return Seq(self._data[index], self.alphabet)
>
>
> Think that would work?

Yes - in fact I've done that on another branch, but to avoid
circular imports I used hasattr(index, "extract") instead. It solves
a different problem to making start/end easier to use.
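
Roughly, the hasattr check works like the sketch below. This is a simplified
illustration and not the code on the branch: MiniSeq and MiniFeature are
made-up stand-ins for Seq and SeqFeature, and the only thing assumed about a
feature is that it has an extract(parent) method, as SeqFeature does.

```python
class MiniSeq:
    """Toy stand-in for Bio.Seq.Seq (hypothetical, for illustration only)."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def __getitem__(self, index):
        if isinstance(index, int):
            # Return a single letter as a string
            return self._data[index]
        elif hasattr(index, "extract"):
            # Duck typing: any object with an extract() method does its own
            # slicing and joining, so this module never imports SeqFeature
            return index.extract(self)
        else:
            # Return the (sub)sequence as another MiniSeq object
            return MiniSeq(self._data[index])


class MiniFeature:
    """Toy feature made of (start, end) parts joined in order (hypothetical)."""

    def __init__(self, parts):
        self.parts = parts

    def extract(self, parent):
        # Slice each part out of the parent, then concatenate the pieces
        pieces = [str(parent[start:end]) for start, end in self.parts]
        return MiniSeq("".join(pieces))


seq = MiniSeq("ACGTACGTAC")
join = MiniFeature([(0, 3), (6, 9)])
print(seq[join])  # same result as join.extract(seq)
```

Since the check is on the method name and not the class, the Seq module does
not need to know about SeqFeature at all, which is what avoids the circular
import.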

Peter
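
P.S. For anyone who has not looked at the int_pos branch, the general idea of
a position that subclasses int is sketched below. This is a toy example and
not the branch code: BeforePos is a made-up name, and it only assumes that
the fuzzy "<" marker should show up when the position is written out as a
string.

```python
class BeforePos(int):
    """Toy fuzzy position meaning 'before N' (hypothetical class name)."""

    def __new__(cls, position):
        # int is immutable, so the value must be set in __new__, not __init__
        return int.__new__(cls, position)

    def __repr__(self):
        return "%s(%i)" % (self.__class__.__name__, int(self))

    def __str__(self):
        # The fuzzy marker only appears when the position is written out
        return "<%i" % int(self)


start = BeforePos(5)
end = 12
sequence = "ACGTACGTACGTACGT"

# The position behaves like a plain integer when slicing...
fragment = sequence[start:end]

# ...and arithmetic falls back to an ordinary int, dropping the fuzziness
shifted = start + 1
```

The slice treats start as 5, while str(start) still carries the "<" for
writing the feature out again.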



