[Bioperl-l] TGA as U in selenocystine fullCDS

Ewan Birney birney at ebi.ac.uk
Sun Feb 20 05:19:16 EST 2005



> 
> I've seen an email from Ewan in 2004-July bioperl-ml that they solved
> that problem in ensembl, but I haven't found how they did it in their
> code:
> 
> http://portal.open-bio.org/pipermail/bioperl-l/2004-July/016363.html
> 

There is nothing that clever in Ensembl. Remeber that although Selenocystines
always comes from TGA, of course most TGAs code for STOP. The switch to
Selenocystines is due to the RNA element downstream, and in whole genome
scans, this is not quite enough information to really see the the presence
of the U. In fact, a far better prediction method is to look for Cystine <-> TGA
alignments, which are quite common.


The best papers on this are from Roderic guigo's group- check out:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-newId+(([medline-alltext:12775843*])|([medline-id:12775843*]))+-vn+3+-page+qResult+-w

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-newId+(([medline-alltext:14710190*])|([medline-id:14710190*]))+-vn+3+-page+qResult+-w


The way Ensembl handles this is that there is an object called a Sequence Edit,
and that these are applied (if present) at both the Transcript level (allowing
for RNA editing) and at the Translation (allowing for selenocystines). This can
also take in things like programmed frameshifting.


The relevant code is in

Bio::EnsEMBL::Transcript::translate

which calls

Bio::EnsEMBL::Translation::modify_translation

which looks like:

sub modify_translation {
   my ($self, $seq) = @_;

   my @seqeds = @{$self->get_all_SeqEdits()};

   # sort in reverse order to avoid complication of adjusting downstream edits
   @seqeds = sort {$b <=> $a} @seqeds;

   # apply all edits
   my $peptide = $seq->seq();
   foreach my $se (@seqeds) {
     $se->apply_edit(\$peptide);
   }
   $seq->seq($peptide);

   return $seq;
}



The SeqEdit object just explicitly stores the edit:


sub apply_edit {
   my $self   = shift;
   my $seqref = shift;

   if(ref($seqref) ne 'SCALAR') {
     throw("Reference to scalar argument expected");
   }

   if(!defined($self->{'start'}) || !defined($self->{'end'})) {
     return $seqref;
   }

   my $len = $self->{'end'} - $self->{'start'} + 1;
   substr($$seqref, $self->{'start'} - 1, $len) = $self->{'alt_seq'};

   return $seqref;
}


This idea of SeqEdits/alternative translations is very similar
to EMBL/Genbank feature table way to handling this.



Putting this into Bioperl terms, the translate() function is
on Bio::PrimarySeq; is not the place to do this - one needs other
sequence features to determine the edits. One thing to think
about would be a magic function


    $seqfeature->translate()

which would root around in the attribute hash of the seqfeature
to find any edits and apply it. This would be the right place
to put this.







More information about the Bioperl-l mailing list