[Bioperl-l] Re: translation using Bioperl
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Wed, 26 Jul 2000 10:18:44 +0100
Heikki Lehvaslaiho wrote:
>
> Dear Jonathan,
>
> Thanks for persisting. I learned something new. (I had to check it
> with a couple of experts before I believed!)
>
> To summarize: The amino acid that starts a polypeptide in the translation
> process is always methionine, even if an alternative initiator codon is
> used. That means that even when using the standard codon table, the
> correct translation of the sequence 'AGT TGA ...' is 'MV...'.
>
> I've fixed this in bioperl live and 06 CVS branches.
>
> If you want to catch those cases where translation is NOT started by a
> valid initiator codon, you can now set ->verbose to true in your
> PrimarySeq object and it will warn you.
>
> -Heikki
>
Looks like I opened a real can of worms. Thanks for the input, everyone.
I'll try to summarize the opinions and comments so far.
(Jonathan, hold on. This is a more complicated issue than I
thought.)
Ewan:
> Can we watch the ->verbose attribute. This will have to be in on
> Bio::PrimarySeqI --- and i would prefer we did not burden other
> implementations with having to set this.
I will remove the ->verbose call.
This is a separate discussion, but I'd like to use a global verbose to
determine how many warnings classes produce. The code is there in the
Bio::Root modules, but dispersed across various modules, which is not good.
Will Fischer:
> Errrr... How's that again? I'm guessing you mean 'AGT GTA', but even
> so, _only_ valid start codons encode the initial methionine. A start
> codon _is_ a start codon because there is a special tRNA (loaded with
> a modified methionine) that matches it.
Sorry, I copied a wrong string. In the test file (t/PrimarySeq.t), I
have 'TTGGTGGCGTCAAC' which translates to 'MVAST'.
Really, although no one knows quite what happens, if you have an
alternative start codon that is recognized by ribosomes, a methionine
is put in. There must be some additional signals in the mRNA that cause
a codon mismatch, but this is plain guessing.
> Correct behavior (IMHO) would be to check whether the first codon
> matches a valid start (in the genetic code being used): if yes,
> put in Met; otherwise, put in the default amino-acid and
> (perhaps) complain.
That is what I did, except for the complaining part.
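
To make the behaviour discussed above concrete, here is a minimal sketch in
Python (not the Bioperl code). The function name and the set of start codons
are assumptions for illustration: the starts are taken from the NCBI standard
genetic code (table 1). The sketch ignores a trailing incomplete codon, so for
the test string it gives 'MVAS'; the final T of 'MVAST' presumably comes from
the incomplete last codon 'AC', which is not handled here.

```python
from itertools import product
import warnings

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: initiator codons of the NCBI standard table.
START_CODONS = {"ATG", "CTG", "TTG"}


def translate_with_start_check(dna):
    """Translate complete codons; force Met for a valid start codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    protein = [CODON_TABLE[c] for c in codons]
    if codons:
        if codons[0] in START_CODONS:
            protein[0] = "M"  # initiator codon always gives methionine
        else:
            # Keep the default amino acid and complain.
            warnings.warn("first codon %s is not a valid initiator" % codons[0])
    return "".join(protein)


print(translate_with_start_check("TTGGTGGCGTCAAC"))  # MVAS
```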
Andrew Dalke:
>
> Short answer, I agree except that it's impossible for bioperl, as
> a library, to do this. It is the responsibility of users of library
> to decide what to do when the first codon isn't a start codon.
> The difficulty is knowing the "perhaps" part. Detail below.
> ...
> For relevance to this topic, I agree that it's the relevant behaviour,
> but not of the bioperl or biopython tools. If you consider those codes
> as a library, it's the responsibility of the person using the library to
> make the call on what to do. The library should merrily translate away
> and the *calling* code detect "hmm, this doesn't start off with an M, I
> think I'll complain."
I think I have to disagree here. In my opinion, these libraries have
two equally important roles:
1. To give default 'computational' behaviour
(e.g. blindly translate any nucleotide sequence).
2. Have enough biological sense to give results identical to nucleotide
sequence repositories (EMBL, GenBank, DDBJ)
(e.g. translate a valid CDS correctly).
(3. A third role is to do everything better than the sequence
databases...)
To achieve level 2 for translations, this is what I suggest
implementing:
Add one more optional, boolean argument, $fullCDS, to the method
translate.
If it is true:
1. Check and replace the initial amino acid.
2. Remove the trailing stop character
Note that this is the default behaviour now. In my opinion,
the trailing stops should be left there by default.
3. Warn if a) first codon is not a valid initiator
b) last codon is not a stop
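
The three steps above could look roughly like this. It is a sketch in Python
rather than the actual Bioperl method: the name translate_cds, the full_cds
keyword (standing in for $fullCDS) and the start codon set (NCBI standard
table) are assumptions for illustration. Following the suggestion, trailing
stops are kept unless full_cds is true.

```python
from itertools import product
import warnings

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: initiator codons of the NCBI standard table.
START_CODONS = {"ATG", "CTG", "TTG"}


def translate_cds(dna, full_cds=False):
    """Blind translation by default; CDS-aware when full_cds is true."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    protein = "".join(CODON_TABLE[c] for c in codons)
    if not full_cds or not codons:
        return protein  # role 1: plain 'computational' translation

    # 1. Check and replace the initial amino acid (warn if not an initiator).
    if codons[0] in START_CODONS:
        protein = "M" + protein[1:]
    else:
        warnings.warn("first codon %s is not a valid initiator" % codons[0])

    # 2. Remove the trailing stop character (warn if there is none).
    if protein.endswith("*"):
        protein = protein[:-1]
    else:
        warnings.warn("last codon %s is not a stop" % codons[-1])
    return protein


print(translate_cds("TTGGTGGCGTCATAA"))                 # LVAS*
print(translate_cds("TTGGTGGCGTCATAA", full_cds=True))  # MVAS
```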
Comments anyone?
-Heikki