[Bioperl-l] problem parsing pdb

Paola Bisignano paola_bisignano at yahoo.it
Tue Sep 8 08:55:21 UTC 2009


Hi,

I'm in a bit of trouble: I need to parse PDB files exactly, extracting the chain id and residue id, but I found that in some PDB files the residue number is followed by a letter, probably because the crystallographers added a residue and didn't want to renumber the rest of the sequence. One example is 1PXX.pdb, which I parsed with the script below. I didn't find any useful suggestion about this in the BioPerl tutorial or in the online BioPerl documentation.

#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::Structure::IO;
use LWP::Simple;

# Download the PDB entry and save it locally.
my $urlpdb = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1PXX";
my $content = get($urlpdb);
die "Could not download $urlpdb\n" unless defined $content;
my $pdb_file = qq{1pxx.pdb};
open my $f, '>', $pdb_file or die "Cannot write $pdb_file: $!";
binmode $f;
print $f $content;
close $f;
print qq{$pdb_file\n};



# Parse the structure and append one "chain<TAB>residue id" line per residue.
my $structio = Bio::Structure::IO->new(-file => $pdb_file);
my $struc = $structio->next_structure;
# Open the output file once instead of reopening it for every residue.
open my $out, '>>', '1pxx.parsed' or die "Cannot write 1pxx.parsed: $!";
for my $chain ($struc->get_chains) {
    my $chainid = $chain->id;
    for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        print $out "$chainid\t$resid\n";
    }
}
close $out;



However, the output file is wrong for residues such as ILE 105A and ILE 2105C, where a letter follows the residue number. Can I solve this problem without writing intermediate files?
I need the residue id written as 105A, not 105.A, for example:
 A          ILE-105A
with no dot between the number and the letter.
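
One possible workaround, sketched here rather than taken from the BioPerl documentation: since the residue id comes back as a plain string, the dot could be removed with a substitution before printing, with no intermediate file. The sketch assumes the id has the form seen above (name, hyphen, number, then an optional dot and a single insertion letter); compact_resid is a hypothetical helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove the dot that separates
# the residue number from a trailing one-letter insertion code.
# Assumed input form: "ILE-105" or "ILE-105.A".
sub compact_resid {
    my ($resid) = @_;
    # Only drop a dot that is directly followed by one final letter.
    (my $compact = $resid) =~ s/\.(?=[A-Za-z]$)//;
    return $compact;
}

# Sample ids in the assumed form, printed in the same "chain<TAB>id" layout.
for my $id ('ILE-105', 'ILE-105.A', 'ILE-2105.C') {
    print "A\t", compact_resid($id), "\n";
}
```

In the script above, this would amount to calling compact_resid($res->id) inside the residue loop instead of using $res->id directly.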




Thank you all,

Paola



      


