[Bioperl-l] RE: :structure

Jurgen Pletinckx jurgen.pletinckx at algonomics.com
Thu Dec 11 09:32:02 EST 2003


# Im using bioperl, to parse a pdb file,
#
# however I cant seem to get information on if an
# atom is a HETATM or ATOM
#
# Do you know if it is possibble to get this?
#

Dear Jonathan,

first off - I'm not Kris - he left the company a while
ago. I rather like his Bio::Structure modules, though,
and I can help out here.

As it happens, there is no built-in method to do this
as yet. I offer for your amusement the following sample
script, achieving what you seek. It is based on the same
technique used deep inside Bio::Structure::Io::pdb->write_structure.
The assumption is that all HETATM lines in the file have
been announced in the PDB header, in HET lines. This holds
for (all?) current official PDB files, but is bound to fail
for user-generated files.

This will work for atoms where you know the residue they
belong to (the usual situation). Let us know it this doesn't
work for you, so we can look further into it.


#!/usr/bin/perl -w

use strict;
use Bio::Structure::IO;

my $f = "/PDB/ab/pdb1abz.ent";

my $entry = Bio::Structure::IO->new(-file=>$f)->next_structure;

# obtain text of HET lines
(my $hetstring = ($entry->annotation->get_Annotations("het"))[0]->as_text) =~
s/^Value: //;

# create lookup hash of HET residues
my %het_res;
$het_res{'HOH'}++; # HOH is implicit hetero-atom

while (my $l = substr($hetstring,0,63,''))
{
  $l =~ s/^\s*(\S+)\s+.*$/$1/;
  $het_res{$l}++;
}

for my $chain ($entry->get_chains)
{
  my $chid = $chain->id;

  for my $residue ($entry->get_residues($chain))
  {
    my $resid = $residue->id;
    my ($label, $pos) = split /-/, $resid;
    if ($het_res{$label})
    {
      print "HETR ",$chid,$resid,"\n";
    }
    else
    {
      print "NORM ",$chid,$resid,"\n";
    }
  }
}



--
Jurgen Pletinckx
AlgoNomics NV



More information about the Bioperl-l mailing list