[Bioperl-l] TGA as U in selenocystine fullCDS

Fri Feb 18 10:21:13 EST 2005

Albert,

I refreshed my memory (with help from Tamara Kulikova @ EBI) on how 
selenocysteine and other exceptions are handled in EMBL/GenBank:

I am afraid it is a mess, partly because awareness of these cases is quite 
recent and partly because the biology itself is messy.

You really need to extract the whole CDS feature from the feature table and 
look for the following three qualifiers: 

1. transl_except 
http://www.ebi.ac.uk/embl/WebFeat/qualifiers/transl_except.html

   which tells you, in entry coordinates, where the exception is. If the amino 
acid is not one of the known ones with an abbreviation, it is named "OTHER", 
and there is a note qualifier with the correct name.

2. codon
http://www.ebi.ac.uk/embl/WebFeat/qualifiers/codon.html

    All occurrences of the given codon in this CDS are translated to the 
stated amino acid.

3. exception 
http://www.ebi.ac.uk/embl/WebFeat/qualifiers/exception.html

    If RNA editing messes up translation so badly that the previous qualifiers 
are not enough, you can state that a given range is to be replaced with the 
given amino acids.

(The one-letter codes used in the translation are here:
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.5.3)
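
To make the first qualifier concrete, here is a minimal sketch of pulling the 
position and amino acid out of a transl_except value such as 
"(pos:213..215,aa:Sec)". It is plain Python rather than Bioperl so that it 
stands alone; the names parse_transl_except and codon_index, the small 
three-letter to one-letter map (including OTHER mapping to X), and the 
restriction to an unspliced forward-strand CDS are all assumptions made for 
illustration.

```python
import re

# Assumed three-letter -> one-letter map for a few residues; extend as needed.
AA_ONE_LETTER = {"Sec": "U", "Pyl": "O", "Trp": "W", "Met": "M",
                 "TERM": "*", "OTHER": "X"}

# Matches "(pos:213..215,aa:Sec)" and "(pos:complement(4..6),aa:Trp)".
_TRANSL_EXCEPT = re.compile(
    r"\(pos:(?P<comp>complement\()?(?P<start>\d+)(?:\.\.(?P<end>\d+))?\)?"
    r",\s*aa:(?P<aa>\w+)\)"
)

def parse_transl_except(value):
    """Return start, end (entry coordinates), strand flag and amino acid."""
    m = _TRANSL_EXCEPT.fullmatch(value.strip())
    if m is None:
        raise ValueError("unrecognised transl_except value: %r" % value)
    start = int(m.group("start"))
    end = int(m.group("end") or start)  # a single-base position has no end
    aa = m.group("aa")
    return {
        "start": start,
        "end": end,
        "complement": m.group("comp") is not None,
        "aa": aa,
        "one_letter": AA_ONE_LETTER.get(aa, "X"),  # unknown names fall back to X
    }

def codon_index(exc_start, cds_start):
    """0-based codon index, assuming an unspliced forward-strand CDS."""
    return (exc_start - cds_start) // 3
```

For a CDS starting at entry position 201, codon_index(213, 201) gives 4, i.e. 
the fifth codon. Spliced or reverse-strand features would need the positions 
mapped through the feature location first.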

The bottom line is, we should not touch the current translation implementation 
in Bioperl. If you want to have a go at incorporating alternative 
translations that implement some of the above or the hack I suggested 
earlier, please put them into Bio::SeqUtils.

Why don't you try your hand at writing a translation function that takes a 
Bio::Seq::RichSeq object from the Bio::SeqIO::[embl|genbank] parser as an 
argument, extracts the CDS (by name/id/order, or all of them), checks for 
exceptions AND tries to take them into account, and outputs the translation 
as a sequence object? At the same time it should check for the transl_table 
qualifier and use that to call up the right codon table.
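
A rough sketch of the core of such a function, in plain Python rather than 
Bioperl so that it stands alone. The name translate_cds, the exceptions 
argument (a map from 0-based codon index to one-letter code) and the choice 
to carry only tables 1 and 2 are assumptions for illustration; a real version 
would take the CDS sequence, the transl_table id and the exceptions from the 
feature's qualifiers.

```python
BASES = "TCAG"  # codon order used by the NCBI genetic code tables

# Amino-acid strings keyed by transl_table id; only two are carried here.
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}

def translate_cds(cds, transl_table=1, exceptions=None):
    """Translate a CDS, letting per-codon exceptions override the table."""
    table = TABLES[transl_table]
    exceptions = exceptions or {}
    cds = cds.upper().replace("U", "T")
    protein = []
    # Ignore any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        index = i // 3
        if index in exceptions:
            protein.append(exceptions[index])  # e.g. "U" for selenocysteine
        elif all(base in BASES for base in codon):
            n = (16 * BASES.index(codon[0])
                 + 4 * BASES.index(codon[1])
                 + BASES.index(codon[2]))
            protein.append(table[n])
        else:
            protein.append("X")  # ambiguous bases are not resolved here
    return "".join(protein)

print(translate_cds("ATGTGAGGTTAA"))                       # M*G*
print(translate_cds("ATGTGAGGTTAA", exceptions={1: "U"}))  # MUG*
print(translate_cds("ATGTGAGGTTAA", transl_table=2))       # MWG*
```

The point of the design is that the codon tables stay untouched: the 
exception is applied per position, so only the annotated TGA becomes U.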

Like you said, there should be code that can be reused in Ensembl.

	-Heikki

On Friday 18 February 2005 14:02, Albert Vilella wrote:
> On Fri, 2005-02-18 at 11:28 +0000, Heikki Lehvaslaiho wrote:
> > Albert,
> >
> > The best way to deal with this would be to have genetic code that
> > correctly translates into selenocysteine. Unfortunately I could not find
> > anything on the topic on Taxonomy Genetic codes home page:
> > <http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>.
> > I guess I should ask around if there are plans to deal with this.
> > Are those CDSs from EMBL or GenBank? If so, could you send me a few
> > accession numbers to check?
>
> from Genbank:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=57016379
>
> > The translate method already has too many optional arguments, so I would
> > rather not put in any more solely for dealing with selenocysteine.
>
> True.
>
> > Could you put together (and send to me) data lines for @NAMES, @TABLES
> > and @STARTS in Bio::Tools::CodonTable and call it tentatively "Standard
> > with selenocysteine" and use id 20, which has been merged with existing
> > codes and is not currently in use. That should provide a working code for
> > your purposes while I try to find a consensus on this.
>
> I have added a "Standard with selenocysteine" in 20.
> I have also added a "Bacterial with selenocysteine" in 19.
>
> Now it is not apparent that 20 and 19 are only for in-frame TGAs, not stop
> codons in CDSs.
>
> I've seen an email from Ewan in 2004-July bioperl-ml that they solved
> that problem in ensembl, but I haven't found how they did it in their
> code:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2004-July/016363.html
>
>     Albert.
>
> **************
>
>     @NAMES =			#id
> 	(
> 	 'Standard',		#1
> 	 'Vertebrate Mitochondrial',#2
> 	 'Yeast Mitochondrial',# 3
> 	 'Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma',#4
> 	 'Invertebrate Mitochondrial',#5
> 	 'Ciliate, Dasycladacean and Hexamita Nuclear',# 6
> 	 '', '',
> 	 'Echinoderm Mitochondrial',#9
> 	 'Euplotid Nuclear',#10
> 	 '"Bacterial"',# 11
> 	 'Alternative Yeast Nuclear',# 12
> 	 'Ascidian Mitochondrial',# 13
> 	 'Flatworm Mitochondrial',# 14
> 	 'Blepharisma Nuclear',# 15
> 	 'Chlorophycean Mitochondrial',# 16
> 	 '', '',	# 17 and 18 unused
> 	 'Bacterial with selenocysteine', # 19
> 	 'Standard with selenocysteine', # 20
> 	 'Trematode Mitochondrial',# 21
> 	 'Scenedesmus obliquus Mitochondrial', #22
> 	 'Thraustochytrium Mitochondrial' #23
> 	 );
>
>     @TABLES =
> 	qw(
> 	   FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
> 	   FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   '' ''
> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
> 	   FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   '' ''
> 	   FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG
> 	   FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> 	   );
>
>
>     @STARTS =
> 	qw(
> 	   ---M---------------M---------------M----------------------------
> 	   --------------------------------MMMM---------------M------------
> 	   ----------------------------------MM----------------------------
> 	   --MM---------------M------------MMMM---------------M------------
> 	   ---M----------------------------MMMM---------------M------------
> 	   -----------------------------------M----------------------------
> 	   '' ''
> 	   -----------------------------------M----------------------------
> 	   -----------------------------------M----------------------------
> 	   ---M---------------M------------MMMM---------------M------------
> 	   -------------------M---------------M----------------------------
> 	   -----------------------------------M----------------------------
> 	   -----------------------------------M----------------------------
> 	   -----------------------------------M----------------------------
> 	   -----------------------------------M----------------------------
> 	   '' ''
> 	   ---M---------------M------------MMMM---------------M------------
> 	   ---M---------------M---------------M----------------------------
> 	   -----------------------------------M---------------M------------
> 	   -----------------------------------M----------------------------
> 	   --------------------------------M--M---------------M------------
> 	   );
>
> **************
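
The concern raised in the quoted message about tables 19 and 20 can be shown 
with a small stand-alone sketch (plain Python, not Bioperl). The two strings 
are the standard table and the proposed table 20 from the listing above; the 
helper name translate and the toy sequence are made up for illustration.

```python
from itertools import product

BASES = "TCAG"  # codon order used by the table strings
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
WITH_SEC = "FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(seq, table):
    """Translate complete codons of seq using a 64-letter table string."""
    lookup = dict(zip(CODONS, table))
    usable = len(seq) - len(seq) % 3
    return "".join(lookup[seq[i:i + 3]] for i in range(0, usable, 3))

# The two tables differ in exactly one codon.
changed = [c for c, a, b in zip(CODONS, STANDARD, WITH_SEC) if a != b]
print(changed)  # ['TGA']

# A toy CDS with an internal TGA and a terminating TGA.
cds = "ATGTGAGGTTGA"
print(translate(cds, STANDARD))  # M*G*
print(translate(cds, WITH_SEC))  # MUGU
```

Because the table is applied to every codon, the terminating TGA is turned 
into U as well, which is why a table alone cannot tell the two cases apart.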

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________