[Biopython-dev] SeqFeature start/end and making positions act like ints

Peter Cock p.j.a.cock at googlemail.com
Mon Sep 19 09:03:59 UTC 2011


On Sat, Sep 17, 2011 at 8:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sat, Sep 17, 2011 at 2:44 PM, Eric Talevich wrote:
>> On Fri, Sep 16, 2011 at 7:01 PM, Peter Cock wrote:
>>>
>>> On Fri, Sep 16, 2011 at 9:33 PM, Eric Talevich wrote:
>>> >
>>> > Think that would work?
>>>
>>> Yes - in fact I've done that on another branch, but to avoid
>>> circular imports I used hasattr(index, "extract") instead. It
>>> solves a different problem to making start/end easier to use.
>>
>> OK, you're way ahead of me.

The actual commit wasn't that far ahead of you:
https://github.com/peterjc/biopython/commit/db4553c7e0bcb8a7eca137aeb24d713d9bf9dd93
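
For anyone following along, here is a minimal sketch of the
hasattr(index, "extract") trick discussed above. ToySeq and
ToyFeature are hypothetical stand-ins written for this example, not
the real Seq or SeqFeature classes, and the linked commit may well
differ in detail:

```python
class ToySeq(object):
    """Hypothetical stand-in for a sequence class (not Bio.Seq.Seq)."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def __getitem__(self, index):
        # Duck typing: anything with an extract method is treated as a
        # feature-like index, so no import of the feature class is
        # needed here (which is what avoids the circular import).
        if hasattr(index, "extract"):
            return index.extract(self)
        if isinstance(index, slice):
            return ToySeq(self._data[index])
        return self._data[index]


class ToyFeature(object):
    """Hypothetical feature covering the half-open range [start, end)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def extract(self, parent):
        # Delegate back to ordinary slicing of the parent sequence.
        return parent[self.start:self.end]


sub = ToySeq("ACGTACGT")[ToyFeature(2, 6)]
```

With this, slicing by a feature gives the same result as slicing
by its start and end.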

> Well, I've been thinking about this on and off for a while now.
> One issue with the __getitem__ trick is what would we do for
> the SeqRecord when sliced with a SeqFeature? Should it use
> the id and annotation from the SeqFeature or the SeqRecord?

This needs some thought.

>> The new start/end properties you implemented
>> look good to me, and I doubt there would be a serious hit
>> to performance -- plus, code that doesn't need these shortcuts
>> doesn't have to use them.
>
> Good. I've realised I need to double check the integer
> methods (equals, sorting, hashes etc), but they should
> be fine.

Thinking about this more, the current _shift method of
the position objects (used in SeqRecord slicing) would
make sense as the __add__ method, thus:

BeforePosition(5) + 10 --> BeforePosition(15)

rather than currently:

BeforePosition(5)._shift(10) --> BeforePosition(15)

However, perhaps that is just making work for ourselves, since
we'd have to implement code for all the mixed cases, e.g.

BeforePosition(5) + AfterPosition(10) --> UncertainPosition(15)
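
To make the idea concrete, here is a rough sketch of int-like
positions with __add__. The classes below are simplified stand-ins
for this example only, not the current Biopython code. Returning a
plain int when two fuzzy positions are added is my assumption to
sidestep the mixed cases, not a decision:

```python
class FuzzyPosition(int):
    """Hypothetical base class: a position that behaves like an int."""

    def __repr__(self):
        return "%s(%d)" % (type(self).__name__, int(self))

    def __add__(self, other):
        if isinstance(other, FuzzyPosition):
            # Assumption: mixing two fuzzy positions drops the
            # fuzziness rather than inventing a combined type.
            return int(self) + int(other)
        if isinstance(other, int):
            # A plain integer offset keeps the same kind of position.
            return type(self)(int(self) + other)
        return NotImplemented

    # Keep the old name working as an alias for the new operator.
    _shift = __add__


class BeforePosition(FuzzyPosition):
    """Somewhere before the given coordinate."""


class AfterPosition(FuzzyPosition):
    """Somewhere after the given coordinate."""


shifted = BeforePosition(5) + 10
mixed = BeforePosition(5) + AfterPosition(10)
```

Equality, sorting and hashing are inherited from int here, which
is the behaviour that needs double checking in the real classes.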

>> These will be handy for writing code that visualizes
>> SeqFeatures, too.
>
> Well, slightly easier - I have some more dramatic changes to
> the SeqFeature and FeatureLocation objects planned, but I'm
> still playing with this.

One of the key changes (which can be done without
really changing the API) is to move the database &
accession and the strand from the SeqFeature to the
FeatureLocation. These are intimately connected with
the location, as much as the start/end.
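
As a sketch of how this could be done without breaking existing
code, the feature can keep a strand attribute that simply delegates
to its location. The class and argument names below (ref, ref_db)
are assumptions for illustration and may not match what ends up on
the branch:

```python
class ToyLocation(object):
    """Hypothetical location owning strand and database reference."""

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref        # accession, for locations on another record
        self.ref_db = ref_db  # database that accession belongs to


class ToyFeature(object):
    """Hypothetical feature exposing the old attribute via a property."""

    def __init__(self, location, type=""):
        self.location = location
        self.type = type

    @property
    def strand(self):
        # Old-style access still works, but the value lives on the location.
        return self.location.strand

    @strand.setter
    def strand(self, value):
        self.location.strand = value


feature = ToyFeature(ToyLocation(5, 15, strand=-1), type="CDS")
old_style = feature.strand   # read through the property
feature.strand = +1          # write through the property
```

Existing scripts reading or setting feature.strand would carry on
working, while new code could use feature.location.strand directly.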

This is one of the things I've been working on here:
https://github.com/peterjc/biopython/commits/f_loc

The other key change on that experimental branch
is moving away from sub_features for join locations
(etc). Here I was trying a new CompoundLocation
object, but am still wondering if this should be done
in the SeqFeature or FeatureLocation object instead
(or if SeqFeature should subclass FeatureLocation).
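
For illustration only, this is roughly the shape I mean for a
compound location replacing sub_features. Everything below is a
toy written for this example (forward strand only, names invented
here), not the code on the f_loc branch:

```python
from collections import namedtuple

# Hypothetical simple part: half-open range [start, end), forward strand.
SimplePart = namedtuple("SimplePart", ["start", "end"])


class ToyCompoundLocation(object):
    """Hypothetical container for the parts of a join(...) location."""

    def __init__(self, parts, operator="join"):
        self.parts = list(parts)
        self.operator = operator

    @property
    def start(self):
        # Overall span starts at the left-most part.
        return min(part.start for part in self.parts)

    @property
    def end(self):
        # Overall span ends at the right-most part.
        return max(part.end for part in self.parts)

    def extract(self, sequence):
        # Concatenate the pieces in the order given (forward strand only).
        return "".join(sequence[p.start:p.end] for p in self.parts)


loc = ToyCompoundLocation([SimplePart(0, 3), SimplePart(6, 9)])
spliced = loc.extract("ATGCCCTAA")
```

The point is that the parts live in the location itself, so the
feature would no longer need child features just to describe a join.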

Peter


