[Bioperl-l] TGA as U in selenocystine fullCDS
Albert Vilella
avilella at ub.edu
Thu Feb 17 08:55:07 EST 2005
Hi,
I'm dealing with some CDSs that contain selenocysteines ("U"), which I
translate with the translate method:
while ($seq = $input->next_seq()) {
    $pseq = $seq->translate(undef, undef, undef, undef, $tableid);
    $aa_input->write_seq($pseq);
    push(@seq_array, $pseq);
}
Since these are CDSs (fullCDS in translate() notation), they shouldn't
have stop codons in the middle of the sequence. After checking in
GenBank, I found that the internal TGAs actually code for selenocysteines.
Right now, using BioPerl's translate with fullCDS, a stop found in the
middle of the sequence triggers a warning or, if the throw flag is set,
an exception.
Maybe we could add another option to deal with selenocysteines.
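In the meantime, a rough workaround might look like the sketch below. It
is only an illustration: recode_internal_tga is a hypothetical helper,
not part of BioPerl, and it assumes a frame-0 translation in which '*'
marks stops. It rewrites every internal '*' that comes from a TGA (or
UGA) codon as 'U' and leaves the last residue alone:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
# $dna     : the CDS nucleotide string, in frame 0
# $protein : its translation, with '*' marking stop codons
# Returns the protein with internal TGA/UGA stops rewritten as 'U'.
sub recode_internal_tga {
    my ($dna, $protein) = @_;
    my $last = length($protein) - 1;

    # Skip the final residue so a genuine terminator is preserved.
    for my $i (0 .. $last - 1) {
        next unless substr($protein, $i, 1) eq '*';
        my $codon = uc substr($dna, 3 * $i, 3);
        if ($codon eq 'TGA' || $codon eq 'UGA') {
            substr($protein, $i, 1) = 'U';
        }
    }
    return $protein;
}

# ATG TGA GCT TGA translates to "M*A*" with the standard table.
print recode_internal_tga('ATGTGAGCTTGA', 'M*A*'), "\n";   # prints MUA*
```

The idea would be to call translate without fullCDS, pass $seq->seq and
$pseq->seq to the helper, and store the result back with $pseq->seq().
Note that this treats every internal TGA as a selenocysteine, so it is
only safe for sequences whose annotation has been checked.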
Comments?
Albert.
Bio/PrimarySeqI.pm
-------------------
# only if we are expecting to translate a complete coding region
if ($fullCDS) {
    my $id = $self->display_id;
    # remove the stop character
    if ( substr($output,-1,1) eq $stop ) {
        chop $output;
    } else {
        $throw && $self->throw("Seq [$id]: Not using a valid terminator codon!");
        $self->warn("Seq [$id]: Not using a valid terminator codon!");
    }
    # test if there are terminator characters inside the protein sequence!
    if ($output =~ /\*/) {
        $throw && $self->throw("Seq [$id]: Terminator codon inside CDS!");
        $self->warn("Seq [$id]: Terminator codon inside CDS!");
    }
    # if the initiator codon is not ATG, the amino acid needs to be changed into M
    if ( substr($output,0,1) ne 'M' ) {
        if ($codonTable->is_start_codon(substr($seq, 0, 3)) ) {
            $output = 'M'. substr($output,1);
        }
        elsif ($throw) {
            $self->throw("Seq [$id]: Not using a valid initiator codon!");
        } else {
            $self->warn("Seq [$id]: Not using a valid initiator codon!");
        }
    }
}