[Bioperl-l] More on PDB and chains...

Thu Sep 14 17:31:09 UTC 2006

> Hi,
> 
> HETATMs are sometimes present in a chain, so we cannot just exclude all
> HETATMs from a chain. However, since a chain is terminated with TER, we
> could store all non-chain HETATMs in a separate object (e.g. via
> $struc->get_hetatm).

Sounds good to me.  You could have a new class that inherits from the chain
class or its interface (has the same methods) but acts as a container for all
the non-chain atoms; that way it is clearly distinguishable from a chain.  This
object could be retrieved via a get_nonchain() method instead of
get_chains().
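To make the TER-based rule concrete, here is a minimal sketch in plain Perl (no BioPerl involved). The function name `split_pdb_records` and the residue-hash layout are hypothetical, and it assumes standard fixed-column PDB records (resName in columns 18-20, chainID in column 22, resSeq in columns 23-26). A HETATM seen before its chain's TER stays in the chain with a `het` flag; one seen after the TER goes into a separate non-chain list. HETATMs whose chain is never closed by a TER are kept as chain residues, so real files would likely need more rules than this.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl). Splits PDB coordinate records
# into polymer chains plus a separate non-chain list, using TER as the
# chain terminator:
#   - ATOM records always belong to their chain;
#   - a HETATM seen before its chain's TER (e.g. a modified residue such
#     as CME) stays in the chain, flagged with het => 1;
#   - a HETATM seen after its chain's TER (ligand, ion, water) goes to
#     the non-chain list.
# Returns (\%chains, \@nonchain); each residue is a hash reference.
sub split_pdb_records {
    my @lines = @_;
    my (%chains, @nonchain, %terminated, %seen);
    my $last_chain;    # chain ID of the most recent ATOM/HETATM record

    for my $raw (@lines) {
        my $line   = sprintf '%-80s', $raw;    # pad so column substr() is safe
        my $record = substr $line, 0, 6;

        if ($record =~ /^TER/) {
            # TER closes the chain of the last coordinate record seen
            $terminated{$last_chain} = 1 if defined $last_chain;
            next;
        }
        next unless $record eq 'ATOM  ' || $record eq 'HETATM';

        # Fixed PDB columns: resName 18-20, chainID 22, resSeq 23-26, iCode 27
        my %res = (
            name  => substr($line, 17, 3),
            chain => substr($line, 21, 1),
            seq   => substr($line, 22, 4) + 0,
            icode => substr($line, 26, 1),
            het   => $record eq 'HETATM' ? 1 : 0,
        );
        $last_chain = $res{chain};

        my $outside = $res{het} && $terminated{ $res{chain} };

        # One entry per residue, not one per atom
        my $key = join '|', ($outside ? 'nonchain' : 'chain'),
            @res{qw(chain seq icode name)};
        next if $seen{$key}++;

        if ($outside) { push @nonchain, \%res }
        else          { push @{ $chains{ $res{chain} } }, \%res }
    }
    return (\%chains, \@nonchain);
}
```

Under these assumptions, a single-chain entry whose ligands and waters follow its TER would come back as one protein chain plus a non-chain list, rather than an extra "default" chain.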

The Bio::Structure implementation, judging by the docs, is pretty confusing
and, IMHO, needs some work.  For instance, I wouldn’t expect to get the
residues for each chain from the structure object but from the chain object,
somewhat like:

# Proposed chain-centric interface: next_chain(), next_nonchain() and
# next_residue() are suggested methods
while ( my $struc = $stream->next_structure() ) {
    while (my $chain = $struc->next_chain()) {
        while (my $res = $chain->next_residue()) {
            # do work here
        }
    }
    while (my $chain = $struc->next_nonchain()) { # or whatever
        while (my $res = $chain->next_residue()) {
            # do work here
        }
    }
}

Right now, you get the residues directly from the structure object, passing
the chain as an argument.  I don’t know the internals, but this makes me think
all the residue data is held in the structure object rather than in the chain
object.  A bit inconvenient.

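use Bio::Structure::IO;

# 'file.pdb' is a placeholder path
my $stream = Bio::Structure::IO->new(-file => 'file.pdb', -format => 'pdb');
while ( my $struc = $stream->next_structure() ) {
    for my $chain ($struc->get_chains) {
        my $chainid = $chain->id;
        my @res = $struc->get_residues($chain);
        # do work here
    }
}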

> What would be nice is to be able to see whether a "residue" IN a chain is a
> HETATM. (Sometimes) modified residues (e.g. CME) are also labelled
> HETATM. At least internally to Structure::pdb it is clear which records are
> HETATMs, since the PDB files are written (almost) correctly.

This sounds more like ‘get_hetatm()’, but should it be $chain->get_hetatm()
rather than $struc->get_hetatm($chain)?

Going this route, you could have ‘is_hetatm($resnumber)’ (boolean for a
residue position), ‘get_hetatms()’ (returns all HETATMs), ‘next_hetatm()’
(iterates through the HETATMs), etc.  These could sit alongside
‘next_residue()’, ‘get_residues()’, etc. for all residues, regardless of
their type.
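A minimal sketch of these chain-level methods in plain Perl. `My::ChainSketch` and `My::NonChainSketch` are hypothetical names, not BioPerl classes, and residues are assumed to be simple hashes with `name`, `seq` and `het` keys. The empty subclass shows how a non-chain container could share the chain interface while remaining a distinct type, which is the kind of object a get_nonchain() accessor might return.

```perl
use strict;
use warnings;

# Hypothetical chain-like class (NOT part of BioPerl) illustrating the
# proposed methods. Each residue is a hash: { name => 'CME', seq => 2, het => 1 }.
package My::ChainSketch;

sub new {
    my ($class, %args) = @_;
    return bless {
        id       => $args{-id},
        residues => [ @{ $args{-residues} || [] } ],
        res_pos  => 0,    # cursor for next_residue()
        het_pos  => 0,    # cursor for next_hetatm()
    }, $class;
}

sub id { return $_[0]{id} }

# All residues, regardless of type (count in scalar context)
sub get_residues { return @{ $_[0]{residues} } }

# Iterate over all residues; returns undef when exhausted
sub next_residue {
    my $self = shift;
    return if $self->{res_pos} >= @{ $self->{residues} };
    return $self->{residues}[ $self->{res_pos}++ ];
}

# Boolean: is the residue with this residue number a HETATM?
sub is_hetatm {
    my ($self, $resnumber) = @_;
    for my $res (@{ $self->{residues} }) {
        return $res->{het} ? 1 : 0 if $res->{seq} == $resnumber;
    }
    return 0;    # no residue with that number
}

# All HETATM residues in this chain (e.g. modified residues such as CME)
sub get_hetatms {
    my $self = shift;
    return grep { $_->{het} } @{ $self->{residues} };
}

# Iterate over HETATM residues only; returns undef when exhausted
sub next_hetatm {
    my $self = shift;
    my @het  = $self->get_hetatms;
    return if $self->{het_pos} >= @het;
    return $het[ $self->{het_pos}++ ];
}

# Hypothetical container for non-chain HETATMs: same interface, distinct type
package My::NonChainSketch;
use parent -norequire, 'My::ChainSketch';

package main;

my $chain = My::ChainSketch->new(
    -id       => 'A',
    -residues => [
        { name => 'MET', seq => 1, het => 0 },
        { name => 'CME', seq => 2, het => 1 },
        { name => 'GLY', seq => 3, het => 0 },
    ],
);

# Walk every residue and report whether it is a HETATM
while (my $res = $chain->next_residue) {
    printf "%s %d %s\n", $res->{name}, $res->{seq},
        $chain->is_hetatm($res->{seq}) ? 'HETATM' : 'ATOM';
}
```

Keeping separate cursors for next_residue() and next_hetatm() lets both iterators be used on the same chain without interfering with each other.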

Chris

> I used the script at
> http://lists.open-bio.org/pipermail/bioperl-guts-l/2005-November/020116.html
> to write a PDB file from 8HVP. In this case, at each "border" between
> ATOM and HETATM records within the chain, a TER is printed where the
> original record has an ATOM. Look for the ABA and LOV HETATMs in the chain.
> I agree that non-chain HETATMs should not be part of the default chain.
> So a PDB record (e.g. 102L) with only one chain should have the
> protein chain and a separate HETATM "chain".
> 
> Bernd
> 
> On 9/14/06, Brian Osborne <osborne1 at optonline.net> wrote:
> > Bernd,
> >
> > I'm taking this discussion back to bioperl-l. You've uncovered a slightly
> > different bug, then. Shouldn't the HETATMs always be in a separate "chain",
> > regardless of whether there are one or more polypeptide chains? So
> > that's one question.
> >
> > Related question: shouldn't the get_chains() method only return polypeptide
> > chains, just as they're described in the PDB file? I would think that you'd
> > retrieve the HETATMs using something like:
> >
> > my $hetatm = $struc->get_hetatm;
> >
> >
> > If the PDB file has, say, 3 chains, the get_chains() method returns
> > 4. One of these is the HETATM "chain", labelled with the id 'default'. I
> > don't think this is right since, first, the heteroatoms do not constitute
> > a "chain" and, second, the PDB file itself states that there are 3 chains.
> > Perhaps users of StructIO::pdb have other points of view?
> >
> > Brian O.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign