[Bioperl-l] Translating codons
Karger, Amir
AKarger@CuraGen.com
Mon, 25 Jun 2001 11:47:07 -0400
> Thanks for input. I'll see if we can improve the translate using your
> suggestions. I have my reservations though. Translations are quite
> complex beasts.
Thanks for taking my input seriously. I'm very new to bioinformatics, so I'm
sure some of my thoughts have been thought of and discarded a long time ago.
[snip difficulties in CodonTable->translate]
>
> It would help a lot if you could have a look at
> Bio::Tools::CodonTable::translate and see if all that could be put
> into your %amino hash. POD docs and comments explain what is required.
> There are quite extensive tests on required functionality in
> t/CodonTable.t and t/Seq.t.
>
> If I am missing the point here, please tell me.
Well, I definitely didn't look closely enough at CodonTable->translate to
realize how complicated things are. However, my main point doesn't actually
depend on how complicated this stuff is. I think.
All I was trying to say is that you get overhead just from calling a
subroutine. So why not move the for() loop from PrimarySeqI->translate into
CodonTable->translate? How about this:
sub translate_long {
    my ($self, $seq) = @_;
    my $id = $self->id;
    my $l = length $seq;
    $self->throw("Need a sequence of length 3n!") if $l % 3;
    $seq = lc $seq;
    $seq =~ tr/u/t/;
    my $protein = "";
    if ($seq !~ /[^acgt]/) {
        # No ambiguous codons!
        for (my $i = 0; $i < length($seq); $i += 3) {
            my $triplet = substr($seq, $i, 3);
            if (exists $codons->{$triplet}) {
                $protein .= substr($tables[$id-1], $codons->{$triplet}, 1);
            } else {
                $protein .= 'X';
            }
        }
    } else {
        # At least one ambiguous base somewhere in the sequence
        for (my $i = 0; $i < length($seq); $i += 3) {
            my $triplet = substr($seq, $i, 3);
            my $aa;
            my @codons = _unambiquos_codons($triplet);
            # ...
            # More code from CodonTable->translate, only set $aa instead of
            # returning things, and then
            $protein .= $aa;
        }
    }
    return $protein;
}
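To make the strict branch concrete, here is a minimal standalone sketch of
the lookup it relies on. I am assuming, as the substr() call suggests, that
each table is a 64-character string of amino acids and that $codons maps a
triplet to an offset into that string, with bases ordered t, c, a, g. The
names %index, $standard and lookup() are made up for this example and are
not the module's own.

```perl
use strict;
use warnings;

# Build triplet -> offset (0..63), assuming bases are ordered t, c, a, g.
my @nucs = qw(t c a g);
my %index;
my $n = 0;
for my $x (@nucs) {
    for my $y (@nucs) {
        for my $z (@nucs) {
            $index{"$x$y$z"} = $n++;
        }
    }
}

# Standard genetic code written out as one 64-character string in that order.
my $standard =
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Hypothetical helper: one hash lookup plus one substr per codon.
sub lookup {
    my ($triplet) = @_;
    return exists $index{$triplet}
        ? substr($standard, $index{$triplet}, 1)
        : 'X';
}

print join('', map { lookup($_) } qw(atg aaa taa nnn)), "\n";   # MK*X
```

Anything that is not a plain a/c/g/t triplet falls through to 'X' here,
which is why the ambiguous case needs its own branch.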
The calling subroutine could worry about what to do if the sequence isn't of
length 3n, etc. It seems to me like this could be faster than calling
translate_strict/translate once per codon. Of course, maybe I should shut up
until I get some more bioinformatics experience, but isn't it true that you
very often want to translate relatively long sequences?
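A rough way to check the call-overhead claim is to time both shapes with
the core Benchmark module. This is only a sketch: %amino below is a tiny
toy table rather than the real codon tables, and translate_per_call and
translate_inline are hypothetical stand-ins for the two approaches, so the
numbers say nothing about the ambiguity handling.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Toy codon table for illustration only (not the BioPerl tables).
my %amino = (atg => 'M', aaa => 'K', ggg => 'G', ttt => 'F', taa => '*');

# One subroutine call per codon, like the per-codon loop.
sub translate_codon {
    my ($codon) = @_;
    return exists $amino{$codon} ? $amino{$codon} : 'X';
}

sub translate_per_call {
    my ($seq) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        $protein .= translate_codon(substr($seq, $i, 3));
    }
    return $protein;
}

# One call for the whole sequence, with the loop inside.
sub translate_inline {
    my ($seq) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        $protein .= exists $amino{$codon} ? $amino{$codon} : 'X';
    }
    return $protein;
}

my $seq = 'atgaaagggttttaa' x 200;    # 3000 bases

# Both versions must agree before the timing means anything.
die "mismatch" unless translate_per_call($seq) eq translate_inline($seq);

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    per_call => sub { translate_per_call($seq) },
    inline   => sub { translate_inline($seq) },
});
```

How big the difference is will depend on the Perl version and on how much
work each codon really needs, so it is worth measuring against the real
module before changing anything.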
If you implemented the above, you would have to change PrimarySeq->translate
a bit more (and I figured, why not do each substitution with a single s///g
over the whole output?):
552,561c552,564
< # Get a sequence of length 3n
< my $l = length $seq;
< my $m = $l - ($l % 3);
< my $subseq = substr($seq, 0, $m);
< # Translate it
< my $output = $codonTable->translate($subseq);
< # Use user-input stop/unknown
< $output =~ s/\*/$stop/g;
< $output =~ s/X/$unknown/g;
<
---
> for ($i = 0 ; $i < $length ; $i += 3) {
>     my $codon = substr($seq, $i, 3);
>     my $aa = $codonTable->translate($codon);
>     if ($aa eq '*') {
>         $output .= $stop;
>     }
>     elsif ($aa eq 'X') {
>         $output .= $unknown;
>     }
>     else {
>         $output .= $aa ;
>     }
> }
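One thing to watch with two separate s///g passes: if the user picks 'X' as
the stop symbol, the second pass turns those stops into the unknown symbol
as well. Here is a sketch of a single-pass version, together with the
trimming step. trim_to_codons and apply_symbols are hypothetical helper
names for this example, not existing methods.

```perl
use strict;
use warnings;

# Hypothetical helper: drop trailing bases that don't form a full codon.
sub trim_to_codons {
    my ($seq) = @_;
    my $len = length $seq;
    return substr($seq, 0, $len - ($len % 3));
}

# Hypothetical helper: swap in the user's stop and unknown symbols in one
# pass, so a replacement character is never substituted a second time.
sub apply_symbols {
    my ($protein, $stop, $unknown) = @_;
    $protein =~ s/([*X])/$1 eq '*' ? $stop : $unknown/ge;
    return $protein;
}

print trim_to_codons('atgaaat'), "\n";          # atgaaa
print apply_symbols('MK*X', 'X', '?'), "\n";    # MKX?
```

With two sequential substitutions the second example would come out as
'MK??' instead. The /e version does a little more work per match, so it may
only be worth it if such symbol choices are allowed.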
I haven't written explicitly what to do if the sequence isn't length 3n,
since I don't know what the right molecular bio thing to do is, but I assume
there's something.
Anyway, would this be at all useful?
Amir Karger
CuraGen Corporation