[Bioperl-l] Bio::Structure bug fix for seqres method
Alex Gutteridge
alexg@ebi.ac.uk
Wed, 29 May 2002 16:42:30 +0100
Hi,
I've found a bug in the seqres method of Bio::Structure::Entry: it
fails for some PDB entries (1akm and 1dob, and probably others). The
subroutine is as follows in BioPerl 1.0:
sub seqres {
    my ($self, $chainid) = @_;
    my $s_u = "x4 A1 x7 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3";
    my $seq;
    if ( !defined $chainid) {
        my $m = ($self->get_models($self))[0];
        my $c = ($self->get_chains($m))[0];
        $chainid = $c->id;
    }
    my $seqres = ($self->annotation->get_Annotations("seqres"))[0];
    my $seqres_string = $seqres->as_text;
    $self->debug("seqres : $seqres_string\n");
    $seqres_string =~ s/^Value: //;
    $seqres_string =~ s/\d+//g; # no numbers needed
    $seqres_string =~ s/ \s //g; # single character is Chain identifier
    $seqres_string =~ s/(\w+)/\u\L$1/g; # ALA -> Ala (for SeqUtils)
    $seqres_string =~ s/\s//g; # strip all spaces
    $self->debug("seqres : $seqres_string\n");
    # this will break for non-protein structures (about 10% for now) XXX KB
    my $pseq = Bio::PrimarySeq->new(-alphabet => 'protein');
    $pseq = Bio::SeqUtils->seq3in($pseq,$seqres_string);
    my $id = $self->id . "_" . $chainid;
    $pseq->id($id);
    return $pseq;
}
The lines which need changing are in the series of substitutions done on
$seqres_string.
$seqres_string =~ s/\d+//g; # no numbers needed
$seqres_string =~ s/ \s //g; # single character is Chain identifier
should become
$seqres_string =~ s/\d+/ /g; # no numbers needed
$seqres_string =~ s/ \S //g; # single character is Chain identifier
This fixes the problem in 1akm and 1dob, but I've not tested it on any
others so far.
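
To illustrate the difference between the two sets of substitutions, here is a
minimal standalone sketch. The input string is a made-up SEQRES-like example
(an assumption; the real annotation text may be laid out differently), and
clean_old / clean_new are hypothetical helper names, not part of BioPerl. In
the old version, " \s " only matches three whitespace characters in a row, so
a one-letter chain identifier is never removed. In the new version, numbers
are replaced by a space so the identifier stays space-delimited, and " \S "
then removes it.

```perl
use strict;
use warnings;

# Hypothetical input: serial number, chain id, residue count, residues.
my $input = " 1 A 3 MET VAL LEU";

# Substitutions as they appear in BioPerl 1.0.
sub clean_old {
    my ($s) = @_;
    $s =~ s/\d+//g;             # delete numbers outright
    $s =~ s/ \s //g;            # matches whitespace only, never the chain id
    $s =~ s/(\w+)/\u\L$1/g;     # MET -> Met
    $s =~ s/\s//g;              # strip all whitespace
    return $s;
}

# Substitutions with the proposed fix.
sub clean_new {
    my ($s) = @_;
    $s =~ s/\d+/ /g;            # replace numbers with a space
    $s =~ s/ \S //g;            # remove a single non-space character (chain id)
    $s =~ s/(\w+)/\u\L$1/g;     # MET -> Met
    $s =~ s/\s//g;              # strip all whitespace
    return $s;
}

print "old: ", clean_old($input), "\n";   # old: AMetValLeu
print "new: ", clean_new($input), "\n";   # new: MetValLeu
```

With this input the old code leaves the chain identifier glued to the front of
the three-letter codes, so the string is no longer a clean run of residue
triplets, while the new code yields only the residues.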
Alex Gutteridge