[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?

Hongbo Zhu 朱宏博 macrozhu at gmail.com
Mon Dec 5 16:22:42 UTC 2011


>
>
> > PDB entry 1h4w is a good example with icode and the sequence of chain A
> > starts with resnum 16.
>
> That shows the problem nicely,
>
> >>> from Bio import PDB
> >>> structure = PDB.PDBParser().get_structure("1h4w", "1h4w.pdb")
> >>> chain = structure[0]['A']
> >>> len(chain)
> 351
> >>> chain[0]
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "Bio/PDB/Chain.py", line 67, in __getitem__
>    return Entity.__getitem__(self, id)
>  File "Bio/PDB/Entity.py", line 38, in __getitem__
>    return self.child_dict[id]
> KeyError: (' ', 0, ' ')
>
> However, you can access the first residue like this:
>
> >>> chain[16]
> <Residue ILE het=  resseq=16 icode= >
>
> Likewise,
>
> >>> for index, residue in enumerate(chain):
> ...     print index, residue
> ...     assert chain[index] == residue
> ...
> 0 <Residue ILE het=  resseq=16 icode= >
> Traceback (most recent call last):
>  File "<stdin>", line 3, in <module>
>  File "Bio/PDB/Chain.py", line 67, in __getitem__
>    return Entity.__getitem__(self, id)
>  File "Bio/PDB/Entity.py", line 38, in __getitem__
>    return self.child_dict[id]
> KeyError: (' ', 0, ' ')
>
> So as you say, the current implementation does map
> an integer index to the middle field of the ID tuple,
> rather than the position in the list as I had assumed.
> Sadly this means it is incompatible with Pythonic
> slicing, so we can't extend __getitem__ to offer that.
>
Interesting! I was looking at the problem from a different angle: slicing
is just a natural extension of __getitem__, as with a Python list. And I
think the current implementation is a nice adaptation of list-style indexing
to the special case of a protein chain, where an integer index is taken as
the residue number.

But my proposal conflicts with Pythonic slicing in another way as well,
namely the ambiguity about the end position of a sequence segment. So I
would rather implement slicing as a separate function, get_slice(start, end),
if not in Bio.PDB.Chain, then in my own code.
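
A rough sketch of what such a function might look like is below. It is only
an illustration under some assumptions, not an existing Bio.PDB method: the
name get_slice, taking the chain as a first argument, and the choice to
include the end residue are all mine. It relies only on iterating over the
chain in order and on each residue having an .id tuple of
(hetflag, resseq, icode), so a small stand-in class with made-up residues is
used here instead of a parsed structure.

```python
from collections import namedtuple

# Stand-in for a Bio.PDB residue; only the .id tuple
# (hetflag, resseq, icode) is used. Hypothetical, for illustration only.
FakeResidue = namedtuple("FakeResidue", ["id", "resname"])


def get_slice(chain, start, end):
    """Return the residues from `start` to `end`, both inclusive.

    `chain` is any iterable of residues in chain order; `start` and `end`
    are full residue ids such as (' ', 16, ' '). Including the end residue
    is an assumption, and differs from Python's half-open slices.
    """
    selected = []
    inside = False
    for residue in chain:
        if residue.id == start:
            inside = True
        if inside:
            selected.append(residue)
            if residue.id == end:
                return selected
    # Either start was never seen, or end did not follow it.
    raise KeyError("ids not found in this order: %r, %r" % (start, end))


# Made-up chain: numbering starts at 16 and one residue has an icode.
demo_chain = [
    FakeResidue((" ", 16, " "), "ILE"),
    FakeResidue((" ", 17, " "), "VAL"),
    FakeResidue((" ", 184, " "), "GLY"),
    FakeResidue((" ", 184, "A"), "TYR"),
    FakeResidue((" ", 185, " "), "LYS"),
]

segment = get_slice(demo_chain, (" ", 17, " "), (" ", 184, "A"))
segment_ids = [residue.id for residue in segment]
```

Walking the chain in order, rather than counting residue numbers, is what
lets the segment cross insertion codes and gaps in the numbering. Since
iterating over a Chain yields its residues in order, the same function
should work on a real chain as well.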

Thanks a lot for the helpful discussion!


-- 
Hongbo


