[Bioperl-l] TGA as U in selenocystine fullCDS

Fri Feb 18 12:08:53 EST 2005

Hey there,

I just noticed that Ensembl added selenocysteines in the latest release.
Do you know how they modelled them internally?

cheers,

Elia

On 19 Feb 2005, at 02:21, Heikki Lehvaslaiho wrote:

> Albert,
>
> I refreshed my memory (with help from Tamara Kulikova @ EBI) on how
> selenocysteine and other exceptions are handled in EMBL/GenBank:
>
> I am afraid it is a mess - partly because awareness of these cases is
> quite recent and partly because the biology itself is messy.
>
> You really need to extract the whole CDS feature from the feature table
> and look for the following three qualifiers:
>
> 1. transl_except
> http://www.ebi.ac.uk/embl/WebFeat/qualifiers/transl_except.html
>
>    which tells you, in entry coordinates, where the exception is. If the
>    amino acid is not one of the known ones with an abbreviation, it is
>    named "OTHER", and there is a note qualifier with the correct name.
>
>
> 2. codon
> http://www.ebi.ac.uk/embl/WebFeat/qualifiers/codon.html
>
>     Every occurrence of the stated codon in this CDS is translated to
>     the stated amino acid.
>
> 3. exception
> http://www.ebi.ac.uk/embl/WebFeat/qualifiers/exception.html
>
>     If RNA editing messes up the translation so badly that the previous
>     qualifiers are not enough, you can state that this range is to be
>     replaced with these amino acids.
>
>
> (one-letter codes used in the translation are here:
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.5.3)
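
As a rough illustration of the first qualifier, a transl_except value
such as (pos:213..215,aa:Sec) could be parsed and applied to an already
translated protein along the following lines. This is only a sketch in
plain Perl, not Bioperl code: the sub names, the coordinates and the
sequence are made up, and it assumes an unspliced CDS on the forward
strand whose first base is given in entry coordinates. Mapping anything
the sketch does not recognise (such as "OTHER") to X is also an
assumption.

```perl
use strict;
use warnings;

# Three-letter names mapped to one-letter codes. Only a few
# illustrative entries; TERM marks a stop.
my %ONE_LETTER = (Sec => 'U', Pyl => 'O', Trp => 'W', TERM => '*');

# Parse a transl_except value such as "(pos:213..215,aa:Sec)".
# Returns (start, end, amino_acid), or an empty list when the value is
# not in this simple form (e.g. a complement(...) location).
sub parse_transl_except {
    my ($value) = @_;
    return unless $value =~ /^\(pos:(\d+)\.\.(\d+),aa:(\w+)\)$/;
    return ($1, $2, $3);
}

# Overwrite one residue of $protein according to a transl_except value.
# $cds_start is the entry coordinate of the first base of the CDS.
# Hypothetical simplification: unspliced, forward-strand CDS only.
sub apply_transl_except {
    my ($protein, $cds_start, $value) = @_;
    my ($start, $end, $aa) = parse_transl_except($value);
    return $protein unless defined $start;      # unparsed: leave as is
    return $protein unless $end - $start == 2;  # full codons only

    my $offset = $start - $cds_start;           # 0-based base offset
    return $protein if $offset < 0 || $offset % 3;  # not on a codon
    my $index = $offset / 3;                    # 0-based residue index
    return $protein if $index >= length $protein;

    # Names without an entry above (e.g. OTHER) become X here.
    substr($protein, $index, 1) = $ONE_LETTER{$aa} // 'X';
    return $protein;
}

# Made-up example: CDS starts at base 101, so the third codon
# (a TGA translated as '*') covers bases 107..109.
print apply_transl_except('MK*LV', 101, '(pos:107..109,aa:Sec)'), "\n";
# prints MKULV
```

A real implementation would also have to map entry coordinates through
split (joined) and reverse-strand locations, which this sketch
deliberately leaves out.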
>
>
>
>
> The bottom line is, we should not touch the current translation
> implementation in Bioperl. If you want to have a go at incorporating
> alternative translations that implement some of the above or the hack I
> suggested earlier, please put them into Bio::SeqUtils.
>
> Why don't you try your hand at writing a translation function that takes
> a Bio::Seq::RichSeq object from the Bio::SeqIO::[embl|genbank] parser as
> an argument, extracts the CDS (by name/id/order, or all of them), checks
> for exceptions AND tries to take them into account, and outputs the
> translation as a sequence object? At the same time it should check for
> the transl_table qualifier and use that to call up the right table.
>
> Like you said there should be code that can be reused in Ensembl.
>
>
> 	-Heikki
>
>
>
>
>
> On Friday 18 February 2005 14:02, Albert Vilella wrote:
>> On Fri, 2005-02-18 at 11:28 +0000, Heikki Lehvaslaiho wrote:
>>> Albert,
>>>
>>> The best way to deal with this would be to have a genetic code that
>>> correctly translates into selenocysteine. Unfortunately I could not
>>> find anything on the topic on the Taxonomy Genetic Codes home page:
>>> <http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>.
>>> I guess I should ask around if there are plans to deal with this.
>>> Are those CDSs from EMBL or GenBank? If so, could you send me a few
>>> accession numbers to check?
>>
>> from Genbank:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=57016379
>>
>>> The translate method already has too many optional arguments, so I
>>> would rather not put in any more solely for dealing with selenocysteine.
>>
>> True.
>>
>>> Could you put together (and send to me) data lines for @NAMES, @TABLES
>>> and @STARTS in Bio::Tools::CodonTable, call it tentatively "Standard
>>> with selenocysteine" and use id 20, which has been merged with existing
>>> codes and is not currently in use? That should provide a working code
>>> for your purposes while I try to find a consensus on this.
>>
>> I have added a "Standard with selenocysteine" in 20.
>> I have also added a "Bacterial with selenocysteine" in 19.
>>
>> Note that it is not apparent that 20 and 19 are only meant for in-frame
>> TGAs, not for the stop codons that terminate CDSs.
>>
>> I've seen an email from Ewan on bioperl-l from July 2004 saying that
>> they solved that problem in Ensembl, but I haven't found how they did
>> it in their code:
>>
>> http://portal.open-bio.org/pipermail/bioperl-l/2004-July/016363.html
>>
>>     Albert.
>>
>> **************
>>
>>     @NAMES =			#id
>> 	(
>> 	 'Standard',		#1
>> 	 'Vertebrate Mitochondrial',#2
>> 	 'Yeast Mitochondrial',# 3
>> 	 'Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma',#4
>> 	 'Invertebrate Mitochondrial',#5
>> 	 'Ciliate, Dasycladacean and Hexamita Nuclear',# 6
>> 	 '', '',
>> 	 'Echinoderm Mitochondrial',#9
>> 	 'Euplotid Nuclear',#10
>> 	 '"Bacterial"',# 11
>> 	 'Alternative Yeast Nuclear',# 12
>> 	 'Ascidian Mitochondrial',# 13
>> 	 'Flatworm Mitochondrial',# 14
>> 	 'Blepharisma Nuclear',# 15
>> 	 'Chlorophycean Mitochondrial',# 16
>> 	 '', '',		# 17, 18 (unused)
>>          'Bacterial with selenocysteine', # 19
>>          'Standard with selenocysteine', # 20
>> 	 'Trematode Mitochondrial',# 21
>> 	 'Scenedesmus obliquus Mitochondrial', #22
>> 	 'Thraustochytrium Mitochondrial' #23
>> 	 );
>>
>>     @TABLES =
>> 	qw(
>> 	   FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   '' ''
>> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   '' ''
>> 	   FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG
>> 	   FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> 	   );
>>
>>
>>     @STARTS =
>> 	qw(
>> 	   ---M---------------M---------------M----------------------------
>> 	   --------------------------------MMMM---------------M------------
>> 	   ----------------------------------MM----------------------------
>> 	   --MM---------------M------------MMMM---------------M------------
>> 	   ---M----------------------------MMMM---------------M------------
>> 	   -----------------------------------M----------------------------
>> 	   '' ''
>> 	   -----------------------------------M----------------------------
>> 	   -----------------------------------M----------------------------
>> 	   ---M---------------M------------MMMM---------------M------------
>> 	   -------------------M---------------M----------------------------
>> 	   -----------------------------------M----------------------------
>> 	   -----------------------------------M----------------------------
>> 	   -----------------------------------M----------------------------
>> 	   -----------------------------------M----------------------------
>> 	   '' ''
>> 	   ---M---------------M------------MMMM---------------M------------
>> 	   ---M---------------M---------------M----------------------------
>> 	   -----------------------------------M---------------M------------
>> 	   -----------------------------------M----------------------------
>> 	   --------------------------------M--M---------------M------------
>> 	   );
>>
>> **************
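
The caveat about in-frame TGAs versus terminating ones follows from how
the @TABLES strings are read: each is indexed by codon, with the bases
ordered T, C, A, G and the first base varying slowest, so TGA is always
position 14 regardless of where the codon sits in the CDS. A minimal
sketch in plain Perl (not the Bio::Tools::CodonTable API; the sub names
and the test sequence are made up, and only unambiguous upper- or
lower-case T/C/A/G input is assumed):

```perl
use strict;
use warnings;

# Base order used by the 64-character table strings.
my %BASE = (T => 0, C => 1, A => 2, G => 3);

# Table 1 and the tentative table 20 from the listing above.
my $STANDARD = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my $WITH_SEC = 'FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Position of a codon in a table string: 16*first + 4*second + third.
sub codon_index {
    my ($codon) = @_;
    my ($b1, $b2, $b3) = map { $BASE{$_} } split //, uc $codon;
    return 16 * $b1 + 4 * $b2 + $b3;
}

# Translate complete codons of $dna by looking each one up in $table.
sub translate_with {
    my ($table, $dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        $protein .= substr($table, codon_index(substr($dna, $i, 3)), 1);
    }
    return $protein;
}

# Made-up CDS: ATG AAA TGA CTG TGA
# (one in-frame TGA followed by a terminating TGA)
my $cds = 'ATGAAATGACTGTGA';

print codon_index('TGA'), "\n";               # prints 14
print translate_with($STANDARD, $cds), "\n";  # prints MK*L*
print translate_with($WITH_SEC, $cds), "\n";  # prints MKULU
```

With the modified table the terminating TGA also comes out as U, so a
table-wide change on its own cannot express "selenocysteine here, stop
there"; that needs position-specific information such as the
transl_except qualifier.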
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/                      http://www.ebi.ac.uk/mutations/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
>     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
>    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
>   _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
>      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l