[Bioperl-l] problem parsing pdb
Paola Bisignano
paola_bisignano at yahoo.it
Tue Sep 8 08:55:21 UTC 2009
Hi,
I'm in a bit of trouble: I need to parse a PDB file exactly, to extract the chain id and residue id, but I found that in some PDB files the residue number is followed by a letter, probably because the residue was added by the crystallographers and they didn't want to change the residue numbering of the sequence. One example is 1PXX.pdb, which I parsed with the script below. I didn't find any useful suggestion about this in the BioPerl tutorial or in the online BioPerl documentation.
#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::Structure::IO;
use LWP::Simple;

# Download the PDB entry and save it locally
my $urlpdb = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1PXX";
my $content = get($urlpdb);
die "Could not download $urlpdb\n" unless defined $content;
my $pdb_file = qq{1pxx.pdb};
open my $f, '>', $pdb_file or die "Cannot write $pdb_file: $!";
binmode $f;
print $f $content;
print qq{$pdb_file\n};
close $f;

# Parse the structure and write one "chain<TAB>residue id" line per residue
my $structio = Bio::Structure::IO->new(-file => $pdb_file);
my $struc = $structio->next_structure;
open my $out, '>>', '1pxx.parsed' or die "Cannot write 1pxx.parsed: $!";
for my $chain ($struc->get_chains) {
    my $chainid = $chain->id;
    for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        print $out "$chainid\t$resid\n";
    }
}
close $out;
But the output file is wrong for residues such as ILE 105A and ILE 2105C, because their residue number is followed by a letter. Can I solve this problem without writing intermediate files?
I need the residue id as 105A, not 105.A, so for example:
A ILE-105A
with no dot between the number and the letter.
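One possible approach, shown only as a minimal sketch: assuming the id returned by $res->id looks like ILE-105.A for these residues (as in the output described above), the dot could be removed with a substitution before printing, so no intermediate file is needed. The helper name format_residue_id is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): builds one output line
# from a chain id and a residue id.
# Assumption: ids with a trailing letter look like "ILE-105.A".
sub format_residue_id {
    my ($chainid, $resid) = @_;
    # Drop the dot only when it sits between the residue number
    # and a single trailing letter; other ids are left unchanged.
    (my $clean = $resid) =~ s/(\d)\.([A-Za-z])$/$1$2/;
    return "$chainid $clean";
}

print format_residue_id('A', 'ILE-105.A'), "\n";   # prints "A ILE-105A"
print format_residue_id('A', 'ILE-105'), "\n";     # prints "A ILE-105"
```

In the script above, the helper would be called with $chain->id and $res->id just before the line is printed to the output file.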
Thank you all,
Paola