[Bioperl-l] TGA as U in selenocystine fullCDS
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Fri Feb 18 10:21:13 EST 2005
Albert,
I refreshed my memory (with help from Tamara Kulikova @ EBI) on how
selenocysteine and other exceptions are handled in EMBL/Genbank:
I am afraid it is a mess, partly because the awareness of these cases is quite
recent and partly because the biology itself is messy.
You really need to extract the whole CDS feature from the feature table and
look for the following three qualifiers:
1. transl_except
http://www.ebi.ac.uk/embl/WebFeat/qualifiers/transl_except.html
which tells you, in entry coordinates, where the exception is. If the amino
acid is not one of the known ones with an abbreviation, it is named "OTHER",
and there is a note qualifier with the correct name.
2. codon
http://www.ebi.ac.uk/embl/WebFeat/qualifiers/codon.html
All occurrences of the given codon in this CDS are translated to the stated
amino acid.
3. exception
http://www.ebi.ac.uk/embl/WebFeat/qualifiers/exception.html
If RNA editing messes up translation so badly that the previous qualifiers are
not enough, you can state that a given range is to be replaced with the given
amino acids.
(one-letter codes used in the translation are here:
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.5.3)
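To make the first qualifier concrete, here is a minimal pure-Perl sketch (not
BioPerl code) of reading one transl_except value and patching an existing
translation. The helper names parse_transl_except and apply_transl_except are
hypothetical, and the sketch assumes a single-exon CDS on the forward strand
with a value of the simple form (pos:N..M,aa:Xxx). Mapping "OTHER" to X is
also an assumption; the real name would have to come from the note qualifier.

```perl
use strict;
use warnings;

# Three-letter names that can appear after "aa:", mapped to one-letter codes.
# Only a few are listed here; anything unknown (including OTHER) becomes X.
my %ONE_LETTER = (Sec => 'U', Pyl => 'O', TERM => '*');

# Hypothetical helper: parse "(pos:107..109,aa:Sec)" into a hash reference.
# Returns nothing if the value is not of this simple form.
sub parse_transl_except {
    my ($value) = @_;
    if ($value =~ /^\(pos:(\d+)\.\.(\d+),aa:([A-Za-z]+)\)$/) {
        return { start => $1, end => $2, aa => $3 };
    }
    return;
}

# Hypothetical helper: replace the residue encoded at the exception position.
# $cds_start is the entry coordinate of the first base of the CDS
# (assumption: one exon, forward strand).
sub apply_transl_except {
    my ($protein, $cds_start, $value) = @_;
    my $exc = parse_transl_except($value) or return $protein;
    my $letter = $ONE_LETTER{ $exc->{aa} } // 'X';
    my $offset = $exc->{start} - $cds_start;
    return $protein if $offset < 0 || $offset % 3;   # not on a codon boundary
    my $index = $offset / 3;
    return $protein if $index >= length $protein;    # outside the translation
    substr($protein, $index, 1) = $letter;
    return $protein;
}

# CDS starts at 101, so bases 107..109 form the third codon.
print apply_transl_except('MA*K*', 101, '(pos:107..109,aa:Sec)'), "\n";  # prints MAUK*
```

Multi-exon or complement locations would need the position mapped through the
CDS location first, which this sketch does not attempt.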
The bottom line is, we should not touch the current translation implementation
in Bioperl. If you want to have a go at incorporating alternative
translations that implement some of the above or the hack I suggested
earlier, please put them into Bio::SeqUtils.
Why don't you try your hand at writing a translation function that takes a
Bio::Seq::RichSeq object from the Bio::SeqIO::[embl|genbank] parser as an
argument, extracts the CDS (by name/id/order, or all of them), checks for
exceptions AND tries to take them into account, and outputs the translation
as a sequence object? At the same time it should check for the transl_table
qualifier and use that to call up the right codon table.
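As a possible starting point for the translation step of such a function, here
is a minimal pure-Perl sketch. It is not the Bioperl implementation: codon_map
and translate_cds are hypothetical names, and the table string is the
"Standard with selenocysteine" line proposed for id 20 in the quoted message
below. The sketch assumes that a TGA in the final codon position is a real
stop, and that every other TGA in frame encodes selenocysteine.

```perl
use strict;
use warnings;

# Proposed table 20: standard code with TGA => U (taken from the quoted message).
my $TABLE_20 = 'FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(T C A G);

# Build a codon => residue map. Table strings list codons in the order
# TTT, TTC, TTA, TTG, TCT, ... (bases cycle T, C, A, G; third base fastest).
sub codon_map {
    my ($table) = @_;
    my %map;
    my $i = 0;
    for my $b1 (@BASES) {
        for my $b2 (@BASES) {
            for my $b3 (@BASES) {
                $map{"$b1$b2$b3"} = substr($table, $i++, 1);
            }
        }
    }
    return \%map;
}

# Translate a CDS in frame 0. Codons not in the table (e.g. containing N)
# become X; an incomplete trailing codon is dropped.
sub translate_cds {
    my ($dna, $table) = @_;
    my $map    = codon_map($table);
    my @codons = grep { length($_) == 3 } unpack '(A3)*', uc $dna;
    my $protein = join '', map { $map->{$_} // 'X' } @codons;
    # Assumption: a TGA in the last codon position terminates the CDS.
    $protein =~ s/U\z/*/ if @codons && $codons[-1] eq 'TGA';
    return $protein;
}

print translate_cds('ATGTGAAAATGA', $TABLE_20), "\n";  # prints MUK*
```

A table alone cannot tell an in-frame TGA from a terminating one, which is why
the last codon is handled separately here; using the transl_except positions
from the feature table would be the more reliable signal.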
Like you said, there should be code that can be reused in Ensembl.
-Heikki
On Friday 18 February 2005 14:02, Albert Vilella wrote:
> On Fri, 2005-02-18 at 11:28 +0000, Heikki Lehvaslaiho wrote:
> > Albert,
> >
> > The best way to deal with this would be to have a genetic code that
> > correctly translates into selenocysteine. Unfortunately I could not find
> > anything on the topic on Taxonomy Genetic codes home page:
> > <http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>.
> > I guess I should ask around if there are plans to deal with this.
> > Are those CDSs from EMBL or Genbank? If so, could you send me a few
> > accession numbers to check?
>
> from Genbank:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=57016379
>
> > The translate method already has too many optional arguments, so I would
> > rather not put in any more solely for dealing with selenocysteine.
>
> True.
>
> > Could you put together (and send to me) data lines for @NAMES, @TABLES
> > and @STARTS in Bio::Tools::CodonTables and call it tentatively "Standard
> > with selenocysteine" and use id 20, which has been merged with existing
> > codes and is not currently in use. That should provide a working code for
> > your purposes while I try to find a consensus on this.
>
> I have added a "Standard with selenocysteine" in 20.
> I have also added a "Bacterial with selenocysteine" in 19.
>
> Note that it is not apparent that 20 and 19 are only for in-frame TGAs, not
> for stop codons in CDSs.
>
> I've seen an email from Ewan in 2004-July bioperl-ml that they solved
> that problem in ensembl, but I haven't found how they did it in their
> code:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2004-July/016363.html
>
> Albert.
>
> **************
>
> @NAMES = #id
> (
> 'Standard', #1
> 'Vertebrate Mitochondrial',#2
> 'Yeast Mitochondrial',# 3
> 'Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma',#4
> 'Invertebrate Mitochondrial',#5
> 'Ciliate, Dasycladacean and Hexamita Nuclear',# 6
> '', '',
> 'Echinoderm Mitochondrial',#9
> 'Euplotid Nuclear',#10
> '"Bacterial"',# 11
> 'Alternative Yeast Nuclear',# 12
> 'Ascidian Mitochondrial',# 13
> 'Flatworm Mitochondrial',# 14
> 'Blepharisma Nuclear',# 15
> 'Chlorophycean Mitochondrial',# 16
> '', '', # 17, 18 unused
> 'Bacterial with selenocysteine', # 19
> 'Standard with selenocysteine', # 20
> 'Trematode Mitochondrial',# 21
> 'Scenedesmus obliquus Mitochondrial', #22
> 'Thraustochytrium Mitochondrial' #23
> );
>
> @TABLES =
> qw(
> FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
> FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
> FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> '' ''
> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
> FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
> FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> '' ''
> FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG
> FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
> );
>
>
> @STARTS =
> qw(
> ---M---------------M---------------M----------------------------
> --------------------------------MMMM---------------M------------
> ----------------------------------MM----------------------------
> --MM---------------M------------MMMM---------------M------------
> ---M----------------------------MMMM---------------M------------
> -----------------------------------M----------------------------
> '' ''
> -----------------------------------M----------------------------
> -----------------------------------M----------------------------
> ---M---------------M------------MMMM---------------M------------
> -------------------M---------------M----------------------------
> -----------------------------------M----------------------------
> -----------------------------------M----------------------------
> -----------------------------------M----------------------------
> -----------------------------------M----------------------------
> '' ''
> ---M---------------M------------MMMM---------------M------------
> ---M---------------M---------------M----------------------------
> -----------------------------------M---------------M------------
> -----------------------------------M----------------------------
> --------------------------------M--M---------------M------------
> );
>
> **************
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________