[Bioperl-l] Distance between residues

Fri Apr 30 08:47:40 EDT 2004

Is that linear distance (along the sequence, measured in
residues) or 3D distance (across a structure, in angstroms)?

In either case, do you just need the minimum distance, a list 
of all the distances, or some other metric?

If all you need to know is whether the sequence matches 'X
less than n residues distant from Z', regular expressions
will be the quickest solution. If, on the other hand, you 
need to know the actual distance, some more work will be 
involved.
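As a rough sketch of the regular expression approach: the helper below is
hypothetical (the name within_distance and its arguments are my own), and it
borrows the two residue sets from the script further down in this post. It
assumes 'less than n residues distant' means that the two positions differ by
less than n, i.e. at most n-2 other residues sit in between.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if some residue from set 1 (R, P, K, T)
# lies less than $n positions away from some residue from set 2
# (H, C, D, E), and 0 otherwise.
sub within_distance {
    my ($string, $n) = @_;

    # Two distinct positions are always at least 1 apart.
    return 0 if $n < 2;

    # Maximum number of residues allowed between the two matches.
    my $gap = $n - 2;

    # Try both orders: set 1 before set 2, and set 2 before set 1.
    return $string =~ /[RPKT].{0,$gap}[HCDE]|[HCDE].{0,$gap}[RPKT]/ ? 1 : 0;
}
```

For example, within_distance("RAAH", 4) is true because R and H are 3
positions apart, while within_distance("RAAH", 3) is false.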

This is a working example for minimum linear distance:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $in = Bio::SeqIO->new('-file' => "all_proteins.txt",
                         '-format' => 'fasta');

while (my $seq = $in->next_seq)
{
        my $string = $seq->seq;

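        # pos() gives the offset just past each match; both lists use
        # the same convention, so the differences are true distances.
        my @pos1;
        push @pos1, pos($string) while $string =~ /[RPKT]/g;

        my @pos2;
        push @pos2, pos($string) while $string =~ /[HCDE]/g;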

        # you may want to do something specific when either
        # set is completely absent...
        next unless @pos1 and @pos2;

        my $minimum = abs($pos1[0] - $pos2[0]);

        for my $p1 (@pos1)
        {
                for my $p2 (@pos2)
                {
                        my $d = abs($p1-$p2);
                        $minimum = $d if $d < $minimum;
                }
        }

        print $string,"\n";
        print $minimum, "\n";
} 

Optimisation may be necessary: this takes 17 seconds (on my creaky
SGI machine) to process 800 sequences, since the nested loops compare
every position in one set against every position in the other.
Fortunately, there's an obvious optimisation to be done: check first
for the common cases where residues from your sets occur next to each
other, or with one other residue in between.
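
A minimal sketch of that optimisation, under a few assumptions: the function
name min_distance is hypothetical, the residue sets are the ones from the
script above, and the fallback for the less common cases is a single
left-to-right scan that I have swapped in for the nested loops (it is not
part of the original script). I have not timed it.

```perl
use strict;
use warnings;

# Hypothetical helper: minimum linear distance between any residue in
# set 1 (R, P, K, T) and any residue in set 2 (H, C, D, E).
# Returns undef when either set is absent from the sequence.
sub min_distance {
    my ($string) = @_;

    # Fast paths for the common cases: adjacent residues first,
    # then exactly one other residue in between.
    return 1 if $string =~ /[RPKT][HCDE]|[HCDE][RPKT]/;
    return 2 if $string =~ /[RPKT].[HCDE]|[HCDE].[RPKT]/;

    # Fallback: one pass over the sequence, remembering where each
    # set was last seen. The closest pair is always a residue and the
    # most recent residue of the other set to its left.
    my ($last1, $last2, $minimum);
    while ($string =~ /([RPKT])|[HCDE]/g) {
        my $p = pos($string);
        my $other;
        if (defined $1) {       # matched a set 1 residue
            $last1 = $p;
            $other = $last2;
        }
        else {                  # matched a set 2 residue
            $last2 = $p;
            $other = $last1;
        }
        next unless defined $other;
        my $d = $p - $other;
        $minimum = $d if !defined $minimum or $d < $minimum;
    }
    return $minimum;
}
```

In the script above, this would replace the two pos() loops and the nested
for loops: call min_distance($string) and skip the sequence when the result
is undefined.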

-- 
Jurgen Pletinckx
AlgoNomics NV