[Bioperl-l] Translating codons
Karger, Amir
AKarger@CuraGen.com
Sun, 24 Jun 2001 17:43:12 -0400
Am I correct in thinking that the default PrimarySeqI::translate method is
pretty slow? It calls translate on each three-letter codon. Why not have
translate take any sequence with length 3n, returning a string of length n?
Just move the for loop inside the subroutine. It seems like it would still
work if you happen to put in a single codon, but this way would work faster
for sequences of, say, thousands of bases.
For example, here's benchmark code that translates a DNA sequence into protein.
---------------------------
use Benchmark;

my $seq = "actgactgactgactggtgcactacgacta" x 1000;
my $len = length($seq);
my %amino = &get_codons;    # codon => amino acid hash (get_codons not shown)

timethese(50, {
    "substr" => \&do_substr,
    "match"  => \&do_match,
    "pack"   => \&do_pack,
}, "dividing large string with subroutine" );

# Translate a single codon; called once per codon by the do_* subs
sub translate {
    my $in = shift;
    my $out = $amino{$in};
    return $out;
}

# Split with substr, one codon at a time
sub do_substr {
    my $protein = "";
    for (my $i = 0 ; $i < $len ; $i += 3) {
        my $codon = substr($seq, $i, 3);
        $protein .= &translate($codon);
    }
    return $protein;
}

[stuff that's the same as do_substr snipped]

# Split with a regex match
sub do_match {
    my @triplet = ($seq =~ /(...)/g);
}

# Split with unpack
sub do_pack {
    my @triplet = unpack("A3" x ($len/3), $seq);
}
------------------------
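For reference, the three ways of splitting the string can be checked against each other. This is only a minimal sketch: the sub names (split_substr, split_match, split_pack) are made up for illustration, and a short stand-in string is used instead of the full benchmark sequence.

```perl
use strict;
use warnings;

# Short stand-in for the benchmark string (length is a multiple of 3)
my $seq = "actgactgactgactggtgcactacgacta" x 3;
my $len = length($seq);

# Walk the string with substr, three bases at a time
sub split_substr {
    my @codons;
    for (my $i = 0; $i + 3 <= $len; $i += 3) {
        push @codons, substr($seq, $i, 3);
    }
    return @codons;
}

# Let a global regex match return every three-character group
sub split_match {
    return $seq =~ /(...)/g;
}

# Build an unpack template of "A3" repeated once per codon
sub split_pack {
    return unpack("A3" x int($len / 3), $seq);
}

my @s = split_substr();
my @m = split_match();
my @p = split_pack();
my $agree = ("@s" eq "@m" && "@m" eq "@p") ? "yes" : "no";
print scalar(@s), " codons; methods agree: $agree\n";
```

All three drop a trailing partial codon here, so they should give the same list for any input length.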
(Out of curiosity, I tried three methods of splitting the string.
Surprisingly, the difference between them seems to be only about 5%. But...)
As the timings below show, there's a 100% or so speedup (roughly half the
run time) when I change the code to do the $amino{$codon} lookup directly
inside the do_* subs rather than calling &translate.
Benchmark: timing 50 iterations of match, pack, substr without sub call.
          wall    usr   sys  cusr  csys  (seconds)
match:       8   8.04  0.03     0     0  dividing large string
pack:        7   7.27     0     0     0
substr:      9   8.22     0     0     0
Benchmark: timing 50 iterations of match, pack, substr with n sub calls
          wall    usr   sys  cusr  csys  (seconds)
match:      19  17.82  0.01     0     0  dividing large string with subroutine
pack:       19  16.96  0.01     0     0  dividing large string with subroutine
substr:     19  18.96     0     0     0  dividing large string with subroutine
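The comparison above can be reproduced with something like the following. It is a sketch under assumptions: the %amino table is built here from the standard genetic code because get_codons was not shown, the sub names with_sub_call and inline_lookup are made up, and only the substr way of splitting is timed. Timings will differ from the ones above depending on machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Assumed codon table: standard genetic code, bases ordered t, c, a, g
my @bases = qw(t c a g);
my @aas   = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %amino;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            $amino{"$x$y$z"} = shift @aas;
        }
    }
}

my $seq = "actgactgactgactggtgcactacgacta" x 1000;
my $len = length($seq);

# One sub call per codon, as in the original benchmark
sub translate { return $amino{ $_[0] } }

sub with_sub_call {
    my $protein = "";
    for (my $i = 0; $i < $len; $i += 3) {
        $protein .= translate(substr($seq, $i, 3));
    }
    return $protein;
}

# Same loop, but the hash lookup is done directly in the loop body
sub inline_lookup {
    my $protein = "";
    for (my $i = 0; $i < $len; $i += 3) {
        $protein .= $amino{ substr($seq, $i, 3) };
    }
    return $protein;
}

timethese(100, {
    "with sub call" => \&with_sub_call,
    "inline lookup" => \&inline_lookup,
});
```

Both subs build the same protein string, so any difference in the timings comes from the per-codon sub call.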
As far as I can tell, this would be a pretty easy change and wouldn't break
anything. (Famous last words.)
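A minimal sketch of what a whole-sequence translate could look like follows. This is not the actual Bioperl code: translate_seq is a made-up name, the %amino table is built from the standard genetic code as an assumption, and the choices to ignore a trailing partial codon and to return 'X' for unknown codons are illustrative only.

```perl
use strict;
use warnings;

# Assumed codon table: standard genetic code, bases ordered t, c, a, g
my @bases = qw(t c a g);
my @aas   = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %amino;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            $amino{"$x$y$z"} = shift @aas;
        }
    }
}

# Hypothetical whole-sequence translate: the loop lives inside the sub,
# so there is one call per sequence instead of one call per codon.
sub translate_seq {
    my ($dna) = @_;
    $dna = lc $dna;
    my $protein = "";
    # Step through complete codons; a trailing partial codon is ignored
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        my $aa = $amino{ substr($dna, $pos, 3) };
        $protein .= defined $aa ? $aa : 'X';   # 'X' for codons not in the table
    }
    return $protein;
}

print translate_seq("atg"), "\n";      # a single codon still works
print translate_seq("atggcc"), "\n";   # and so does a longer sequence
```

Because a single codon is just the n = 1 case, existing callers that pass one codon at a time would get the same one-letter result back.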
Amir Karger
Curagen Corporation