[Bioperl-l] problem parsing pdb

Mark A. Jensen maj at fortinbras.us
Fri Sep 18 15:55:47 UTC 2009


Hi Paola--
My researches reveal that this is a "standard kludge" in pdb format. A letter 
following a residue number is called an "insertion code" or "icode", and my 
understanding is that it does allow for the insertion of residues without 
upsetting the rest of the coordinates. (This is a feature, and not laziness, 
since people very quickly begin to refer to amino acid coordinates based on a 
reference sequence in an interesting region, and you can't easily say to the 
community, "hey, that's 22 now, not 20...")
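To make the icode concrete, here is a minimal sketch of where it sits in a raw ATOM record. It assumes the fixed-column PDB layout (residue name in columns 18-20, chain id in column 22, residue number in columns 23-26, insertion code in column 27). The helper name residue_fields is hypothetical, and the sample line is built by hand with placeholder coordinates, not copied from 1PXX.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): extract the residue fields
# from one fixed-column ATOM/HETATM line.
sub residue_fields {
    my ($line) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    my $resname = substr($line, 17, 3);    # columns 18-20
    my $chain   = substr($line, 21, 1);    # column  22
    my $resseq  = substr($line, 22, 4);    # columns 23-26
    my $icode   = substr($line, 26, 1);    # column  27, blank if absent
    s/^\s+|\s+$//g for $resname, $resseq, $icode;
    return ($resname, $chain, $resseq, $icode);
}

# Invented sample record: ILE 105A on chain A, coordinates are placeholders.
my $line = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f",
    'ATOM', 1, ' CA ', ' ', 'ILE', 'A', 105, 'A', 0, 0, 0;

my ($resname, $chain, $resseq, $icode) = residue_fields($line);
print "$resname $chain $resseq$icode\n";
```

A residue without an insertion code simply has a blank in column 27, so $icode comes back as an empty string.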

Since it's standard, you should expect it. Bio::Structure handles the icode by 
creating the residue id as follows:

   #my $res_name_num = $resname."-".$resseq;
   my $res_name_num = $resname."-".$resseq;
   $res_name_num .= '.'.$icode if $icode;

so you can get back the residue's 3-letter name, its numerical position, and 
its insertion code by doing

 my ($name, $number, $icode) = $res->id =~ /(.*?)-([0-9]+)\.?([A-Z]?)/;

In this case, if the icode is not present, then $icode eq '' (not undef).
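Building on that regex, a minimal sketch of turning an id like ILE-105.A into ILE-105A in memory, with no intermediate file. The helper names split_resid and compact_resid are hypothetical (they are not BioPerl methods), and the pattern is anchored and slightly widened to accept a negative residue number, which is an assumption on my part.

```perl
use strict;
use warnings;

# Hypothetical helper: split "NAME-NUMBER[.ICODE]" into its three parts.
# Returns an empty list if the id does not have that shape.
sub split_resid {
    my ($id) = @_;
    my ($name, $number, $icode) =
        $id =~ /^(.+?)-(-?[0-9]+)\.?([A-Za-z]?)$/
        or return;
    return ($name, $number, $icode);    # $icode is '' when absent
}

# Hypothetical helper: rebuild the id without the '.' before the icode.
sub compact_resid {
    my ($id) = @_;
    my ($name, $number, $icode) = split_resid($id)
        or return $id;                  # leave unrecognised ids untouched
    return "$name-$number$icode";
}

print compact_resid($_), "\n" for qw(ILE-105.A ILE-105 ILE-2105.C);
# prints ILE-105A, ILE-105 and ILE-2105C, one per line
```

Inside a loop over residues you would print compact_resid($res->id) instead of $res->id.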
Hope this helps-
Mark

----- Original Message ----- 
From: "Paola Bisignano" <paola_bisignano at yahoo.it>
To: <bioperl-l at bioperl.org>
Sent: Tuesday, September 08, 2009 4:55 AM
Subject: [Bioperl-l] problem parsing pdb


Hi,

I'm in a little trouble because I need to parse a pdb file exactly, to extract 
the chain id and res id, but I found that in some pdb files the residue number is 
followed by a letter, probably because the residue was added by crystallographers 
and they didn't want to change the residue numbering in the sequence. For example, 
I parsed the pdb 1PXX.pdb with my script below. I didn't find any useful 
suggestion about this in the bioperl tutorial or the bioperl documentation online.

#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::Structure::IO;
use LWP::Simple;

# Download the PDB entry and save it locally
my $urlpdb =
"http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1PXX";
my $content = get($urlpdb);
die "Could not download $urlpdb\n" unless defined $content;
my $pdb_file = qq{1pxx.pdb};
open my $f, '>', $pdb_file or die "Cannot write $pdb_file: $!";
binmode $f;
print $f $content;
close $f;
print qq{$pdb_file\n};

# Append one "chain<TAB>residue id" line per residue to the output file
my $structio = Bio::Structure::IO->new(-file => $pdb_file);
my $struc    = $structio->next_structure;
open my $out, '>>', '1pxx.parsed' or die "Cannot write 1pxx.parsed: $!";
for my $chain ($struc->get_chains) {
    my $chainid = $chain->id;
    for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        my @atoms = $struc->get_atoms($res);    # atoms of this residue (unused here)
        print $out "$chainid\t$resid\n";
    }
}
close $out;



but the output file is wrong for ILE 105A and ILE 2105C, because they have a 
letter following the residue number. Can I solve that problem without 
writing intermediate files?
I need the residue id as 105A, not 105.A,
so
A ILE-105A
without the point between the number and the letter.




Thank you all,

Paola




_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l