[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?

Hongbo Zhu 朱宏博 macrozhu at gmail.com
Mon Dec 5 13:38:09 UTC 2011


> > But in CATH and SCOP, sequence segments composing domains
> > are given as start and end position. And the residue at the end
> > position is also included in the domain definition.
>
> OK. I'd have to double check what our parsers return (and if
> they convert the start/end into C/Python style).
>
> > e.g. if a domain
> > is defined to be from residue (' ', 1, ' ') to residue (' ', 40, ' '), a
> > slicing like this mychain[(' ', 2, ' '): (' ', 40, ' ')] or mychain[2:40]
> > would not include residue (' ', 40, ' ').
>
> Perhaps I misunderstood - I would not want to allow the syntax
> mychain[(' ', 2, ' '): (' ', 40, ' ')] which is unclear, rather only allow
> the user to use mychain[2:41] which requires Python counting.
>
>
But even in mychain[2:41], the 2 and 41 should be residue sequence numbers.
That is consistent with the currently accepted syntax mychain[2], where
2 also refers to a sequence number. At the moment, Biopython also
accepts mychain[(' ', 2, ' ')]. So I think mychain[(' ', 2, ' '): (' ', 40,
' ')] would be just a natural extension of mychain[(' ', 2, ' ')].

According to the source code, mychain[2] is treated as an abbreviation of
mychain[(' ', 2, ' ')]. Internally, mychain[2] is translated to
mychain[(' ', 2, ' ')] by the method Bio.PDB.Chain._translate_id(). So if
mychain[2:40] were allowed, internally it would also first be translated
to mychain[(' ', 2, ' '): (' ', 40, ' ')]. So from my point of view,
mychain[2:40] is just an abbreviation for
mychain[(' ', 2, ' '): (' ', 40, ' ')], just like mychain[2] is a short
version of mychain[(' ', 2, ' ')].
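
To make the idea concrete, here is a rough sketch using a toy class. It is
not the real Bio.PDB.Chain code: ToyChain, translate_id() and the choice of a
Python-style half-open end (as in mychain[2:41]) are all assumptions made
only for illustration. The point is that an integer and a full residue id
are translated to the same key before the lookup, for single items and for
slices alike.

```python
class ToyChain:
    """Toy stand-in for a chain; NOT the real Bio.PDB.Chain class."""

    def __init__(self, residue_ids):
        # Residue ids are (hetero flag, sequence number, insertion code)
        # tuples, kept in chain order.
        self.ids = list(residue_ids)

    @staticmethod
    def translate_id(key):
        # An int is treated as shorthand for (' ', key, ' ').
        if isinstance(key, int):
            return (' ', key, ' ')
        return key

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Both ends are residue ids (or sequence numbers), not list
            # positions. The end is exclusive here; the step is ignored.
            start = 0
            if key.start is not None:
                start = self.ids.index(self.translate_id(key.start))
            stop = len(self.ids)
            if key.stop is not None:
                stop = self.ids.index(self.translate_id(key.stop))
            return self.ids[start:stop]
        full_id = self.translate_id(key)
        if full_id not in self.ids:
            raise KeyError(key)
        return full_id


# Hypothetical chain whose numbering starts at 5, so sequence numbers and
# list positions differ.
mychain = ToyChain((' ', n, ' ') for n in range(5, 61))

short_form = mychain[7:41]
long_form = mychain[(' ', 7, ' '):(' ', 41, ' ')]
```

In this sketch short_form and long_form are the same list, running from
residue 7 up to and including residue 40. Whether the end residue should be
included, as in the CATH and SCOP definitions, is exactly the open question;
the sketch only shows that the int form and the tuple form can share one
translation step.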

hongbo



> Peter
>



-- 
Hongbo


