[Biopython-dev] SeqFeature's FeatureLocation for GenBank

Peter biopython-dev at maubp.freeserve.co.uk
Thu Nov 3 05:38:38 EST 2005


Marc Colosimo wrote:
> I want to point out the very bizarre behavior of FeatureLocations when 
> using GenBank.FeatureParser (well to me anyways).

It's by design...

> When I was testing out some code, I noticed that the start positions 
> were 1 less than in the GenBank Record, but the end positions were 
> correct. My first thought was that this must be a bug and so went 
> looking for it. I soon gave up because I just don't have the time to 
> understand all the code that is involved (I was going to file a bug 
> report). So, I just added 1 to the start positions and went on to get 
> the features from the DNA. Suddenly I now understand why the positions 
> were like that: slicing!

Exactly, e.g. something like:

seq[feature.location.start.position:feature.location.end.position]

> Unless I missed something, I didn't see anything talking about this 
> behavior.

Python (like C) starts counting at zero, and this behaviour is 
deliberate: it makes the Biopython sequence objects as easy to handle 
as possible, because Biopython DNA/RNA/protein sequences are designed 
to behave as much like Python strings as possible.

For example, to extract letters 5 to 7 from "abcdefghijk" (using 
one-based counting, i.e. "efg") in Python you write "abcdefghijk"[4:7]
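
As a quick check of the string example above, here is a minimal sketch in plain Python (the variable names are mine, purely for illustration):

```python
# One-based, inclusive positions 5..7 of this string are "efg".
letters = "abcdefghijk"

# Python slices are zero-based and exclude the end index,
# so one-based 5..7 becomes [4:7].
piece = letters[4:7]
print(piece)       # efg

# The length of a slice is simply end - start.
print(len(piece))  # 3
```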

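Suppose your gene is bases 150..300 (using one-based counting as in a 
GenBank file).

To extract this from the full DNA sequence, you would use something 
like: fullsequence[149:300]

The start drops by one because counting begins at zero; the end stays 
the same because a Python slice excludes its end index.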
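
The conversion could be wrapped up roughly as below. This is only a sketch: genbank_to_slice is a hypothetical helper (not part of Biopython), and a plain Python string of made-up bases stands in for a real sequence object.

```python
def genbank_to_slice(start, end):
    """Convert a one-based, inclusive range (GenBank style, e.g. 150..300)
    into a zero-based, end-exclusive Python slice.

    Hypothetical helper for illustration; not a Biopython function.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the inclusive one-based end is numerically
    # equal to the exclusive zero-based end.
    return slice(start - 1, end)


# Made-up 400-base sequence standing in for the full DNA sequence.
fullsequence = "ACGT" * 100

gene = fullsequence[genbank_to_slice(150, 300)]

# Same result as writing the slice by hand.
print(gene == fullsequence[149:300])  # True
# Bases 150..300 inclusive is 151 bases.
print(len(gene))                      # 151
```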

I suppose the CookBook may have assumed people were familiar with Python 
strings already...

> Is this consistent with other parsers? If so, I would suggest
> that this is included in the Cookbook ...

It should be consistent with other parsers.  Would you be able to 
suggest some rewording of the CookBook to clarify this?

(I'm sure I have seen a similar question on the mailing list in the 
past, so something could be improved)

> ... and that the classes are modified so that when printed (__str__)
> reports 1 instead of 0 (basically +1).

That would be bad for people using the existing behaviour.

You'll get used to it (especially if you have to switch between 
zero-based and one-based languages).

Peter