[BioRuby] bio.pdb doubt

Alex Gutteridge alexg at ruggedtextile.com
Thu Feb 21 11:08:08 UTC 2008


On 21 Feb 2008, at 10:27, K. Shameer wrote:

> Alex,
>
>> I shouldn't have posted code without testing first!
>
> :)
>
>>
>> The problem is that the PDB parser reads the solvent (water) molecules
>> into a separate chain. So in this case we have the protein chain and
>> the water 'chain'. My naive multichain? method then reports you have
>> two chains.
>
> Is this something unusual? In a structural bioinformatics scenario
> solvent/water belongs to the HETATM definition. I am not able to
> understand the logic behind treating ATOM records as well as HETATM
> records as part of a chain.

It's a kludge, no doubt about that. There may well be a better
solution, but finding one is not trivial. A couple of problems come up
in practice:

1. How do you know what the solvent is? In 99% of cases it's HOH, but
not always. Sometimes you have all sorts of other weird molecules
floating around. If you siphon off all HOH molecules into a separate
'solvent' data structure you'll lose information for some structures.

2. The HETATM/ATOM distinction is tricky as well. Some HETATM records
(including the solvent in some PDB files) are given distinct chain IDs
and in some cases do represent linear, chain-like molecules. Bound DNA,
for instance: ATOM? HETATM? Chain? Not a chain? There is no consistent
representation of these things in (legacy) PDB files, so any choice you
make will be a compromise.
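
To make the two problems above concrete, here is a minimal sketch in
plain Ruby. It deliberately does not use the BioRuby parser: the helper
names (chain_record_counts, hetero_residue_names, polymer_chain_ids,
multichain?) and the sample records are made up for illustration. It
assumes legacy fixed-width PDB lines, with the record name in columns
1-6, the residue name in columns 18-20 and the chain ID in column 22,
and it counts only chains that contain at least one ATOM record.

```ruby
# Hypothetical helpers for illustration; none of these are BioRuby methods.

# Tally ATOM and HETATM records per chain ID, reading the fixed-width
# columns of the legacy PDB format (record name 1-6, chain ID 22).
def chain_record_counts(pdb_text)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  pdb_text.each_line do |line|
    record = line[0, 6].to_s.strip
    next unless %w[ATOM HETATM].include?(record)
    chain_id = line[21, 1].to_s # may be blank in some files
    counts[chain_id][record] += 1
  end
  counts
end

# Count HETATM records per residue name (columns 18-20), so you can see
# what is actually in the file instead of assuming everything is HOH.
def hetero_residue_names(pdb_text)
  names = Hash.new(0)
  pdb_text.each_line do |line|
    next unless line[0, 6].to_s.strip == "HETATM"
    names[line[17, 3].to_s.strip] += 1
  end
  names
end

# Chain IDs that have at least one ATOM record. HETATM-only 'chains'
# (such as a separate water chain) are left out.
def polymer_chain_ids(pdb_text)
  chain_record_counts(pdb_text).select { |_id, c| c["ATOM"] > 0 }.keys
end

def multichain?(pdb_text)
  polymer_chain_ids(pdb_text).size > 1
end

# Made-up, truncated records: one protein chain (A) with a bound zinc,
# plus waters that were given their own chain ID (W).
SAMPLE = <<~PDB
  ATOM      1  N   MET A   1
  ATOM      2  CA  MET A   1
  HETATM    3  O   HOH W 101
  HETATM    4  O   HOH W 102
  HETATM    5 ZN    ZN A 201
PDB

puts chain_record_counts(SAMPLE).size  # chain IDs seen, water included
puts polymer_chain_ids(SAMPLE).inspect # chain IDs with ATOM records
puts multichain?(SAMPLE)
puts hetero_residue_names(SAMPLE).inspect
```

This is only one possible compromise. Bound DNA or any other chain-like
molecule stored purely as HETATM records would be dropped by
polymer_chain_ids, which is exactly the ambiguity described in point 2.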

That said, if you want to have a poke through the PDB parser and make
some changes, then be my guest. It's been a while since I did any PDB
stuff (and, God willing, it will be a while until I do some more!), so
it's an area that could probably do with a fresh pair of eyes.


