[Bioperl-l] Distance between residues
Jurgen Pletinckx
jurgen.pletinckx at algonomics.com
Fri Apr 30 08:47:40 EDT 2004
Is that linear distance (along the sequence, and measured in
residues) or 3D distance (across a structure, in angstrom)?
In either case, do you just need the minimum distance, a list
of all the distances, or some other metric?
If all you need to know is whether the sequence matches 'X
less than n residues distant from Z', regular expressions
will be the quickest solution. If, on the other hand, you
need to know the actual distance, some more work will be
involved.
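For the yes/no case, a minimal sketch of the regular expression approach could look like the following. The residue sets [RPKT] and [HCDE] are borrowed from the example further down, and the sub name within_distance and its arguments are made up for illustration; "less than n residues distant" is read here as "at most n-2 residues in between", so n must be at least 2.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if some residue
# from set 1 ([RPKT]) lies fewer than $n positions away from some
# residue of set 2 ([HCDE]), in either order; returns 0 otherwise.
# Assumes $n >= 2 and a sequence string without newlines.
sub within_distance {
    my ($string, $n) = @_;
    my $gap = $n - 2;    # maximum number of residues allowed in between
    return $string =~ /[RPKT].{0,$gap}[HCDE]|[HCDE].{0,$gap}[RPKT]/ ? 1 : 0;
}
```

Calling within_distance($seq->seq, 4) would then report whether any pair from the two sets sits at most three positions apart.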
This is a working example for minimum linear distance:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $in = Bio::SeqIO->new('-file'   => "all_proteins.txt",
                         '-format' => 'fasta');

while (my $seq = $in->next_seq)
{
    my $string = $seq->seq;

    # positions of every residue in the first set
    my @pos1;
    push @pos1, pos($string) while $string =~ /[RPKT]/g;

    # positions of every residue in the second set
    my @pos2;
    push @pos2, pos($string) while $string =~ /[HCDE]/g;

    # you may want to do something specific when either
    # set is completely absent...
    next unless @pos1 and @pos2;

    # compare all pairs and keep the smallest distance
    my $minimum = abs($pos1[0] - $pos2[0]);
    for my $p1 (@pos1)
    {
        for my $p2 (@pos2)
        {
            my $d = abs($p1 - $p2);
            $minimum = $d if $d < $minimum;
        }
    }

    print $string, "\n";
    print $minimum, "\n";
}
Optimisation may be necessary - this takes 17 seconds (on my creaky
SGI machine) to process 800 sequences. Fortunately, there's an
obvious optimisation to be done: check first for the common cases
where residues from your sets occur next to each other, or with one
other residue in between.
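A rough sketch of that optimisation, assuming the same two residue sets as in the example: two cheap regular expression checks handle the distance 1 and distance 2 cases, and only the remaining sequences go through the all-pairs comparison. The sub name min_distance is made up for illustration, and I have not timed this variant.

```perl
use strict;
use warnings;

# Hypothetical helper: minimum linear distance between any residue of
# set 1 ([RPKT]) and any residue of set 2 ([HCDE]); returns undef when
# either set is absent from the sequence.
sub min_distance {
    my ($string) = @_;

    # Fast paths: adjacent residues, then exactly one residue in between.
    return 1 if $string =~ /[RPKT][HCDE]|[HCDE][RPKT]/;
    return 2 if $string =~ /[RPKT].[HCDE]|[HCDE].[RPKT]/;

    # General case: collect the positions of both sets.
    my (@pos1, @pos2);
    push @pos1, pos($string) while $string =~ /[RPKT]/g;
    push @pos2, pos($string) while $string =~ /[HCDE]/g;
    return unless @pos1 and @pos2;

    # Compare all pairs and keep the smallest distance.
    my $minimum = abs($pos1[0] - $pos2[0]);
    for my $p1 (@pos1) {
        for my $p2 (@pos2) {
            my $d = abs($p1 - $p2);
            $minimum = $d if $d < $minimum;
        }
    }
    return $minimum;
}
```

Inside the while loop, my $minimum = min_distance($seq->seq); would replace the position collection and the nested loops, with a next unless defined $minimum; to skip sequences where either set is missing.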
--
Jurgen Pletinckx
AlgoNomics NV