[Biojava-l] issue with translating codons with N

LAW Andy andy.law at roslin.ed.ac.uk
Fri Sep 20 14:51:04 UTC 2013


Looking at the multipliers, I would hazard a guess that the *intent* is to multiply the numbers 0,1,2,3 (ACGT) rather than the ASCII codes. Are you sure the code uses ASCII values?
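
The arithmetic in the quoted message below can be checked with a small standalone sketch. It does not call BioJava and is not the library's actual code: the class and method names (CodonHashDemo, asciiHash, indexHash) are hypothetical. asciiHash follows the scheme as described in the quoted message, and indexHash is only one possible reading of the "intent" guessed at above, with N given the index 4 purely as an assumption.

```java
public class CodonHashDemo {

    // Scheme as described in the quoted message: weight the ASCII codes of
    // the three bases by 16, 4 and 1 and add them up.
    static int asciiHash(String codon) {
        return codon.charAt(0) * 16 + codon.charAt(1) * 4 + codon.charAt(2);
    }

    // Assumed alternative: map each base to a small index and use the
    // alphabet size as the radix, so every codon gets a distinct value.
    static int indexHash(String codon) {
        final String alphabet = "ACGUN"; // N = 4 is an assumption for illustration
        int value = 0;
        for (int i = 0; i < 3; i++) {
            int digit = alphabet.indexOf(codon.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Unexpected base in " + codon);
            }
            value = value * alphabet.length() + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        // Print both values for the codons involved in the reported collisions.
        for (String codon : new String[] {"ANA", "CAU", "ANG", "CGC"}) {
            System.out.println(codon + " ascii=" + asciiHash(codon)
                    + " index=" + indexHash(codon));
        }
    }
}
```

With ASCII codes, ANA and CAU both come out as 1417, and ANG and CGC both come out as 1423, which would be consistent with the reported "HR" result. Note that indices 0 to 3 with multipliers 16, 4, 1 are only collision-free for the four-letter alphabet; once N is a fifth symbol the radix has to grow to 5, which is why the sketch multiplies by the alphabet length.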


On 20 Sep 2013, at 15:16, Nick England <nickengland at gmail.com> wrote:

> Everyone,
> 
> I've stepped through with a debugger, and this is a bad bug.
> 
> The code to translate from RNA->Protein does the following:
> - Take the ASCII value of each of the 3 RNA bases, multiply the first
> position by 16, the second by 4 and the third by 1, and add them up.
> - Assume there won't be any collisions.
> 
> Here are the values which it then uses:
> 
> A:65
> G:71
> C:67
> U:85
> N:78
> ANA: 1417
> CAU: 1417
> ANG: 1423
> CGC: 1423
> 
> Notice any hash collisions?
> 
> I don't get why this wasn't done with a standard Java HashMap, which would
> ensure that any collisions were resolved. This is a pretty critical bug for
> a biology informatics package.
> 
> Nick
> 
> 
> On 20 September 2013 13:45, Nick England <nickengland at gmail.com> wrote:
> 
>> Hara,
>> 
>> Hmm this is rather odd. I get the same issue with that sequence with a
>> custom engine as well.
>> 
>> My code has:
>>    TranscriptionEngine.Builder builder = new TranscriptionEngine.Builder();
>>    builder.initMet(false);
>>    builder.translateNCodons(true);
>>    builder.trimStop(false);
>>    TranscriptionEngine engine = builder.build();
>>    Sequence<AminoAcidCompound> seq = engine.translate(new DNASequence("GTNTGTTAGTGT"));
>>    assertEquals("XC*C", seq.toString());
>>    Sequence<AminoAcidCompound> seq2 = engine.translate(new DNASequence("ANAANG"));
>>    System.out.println(seq2);
>> The first sequence translates as expected, but your sequence is
>> translating as HR, when it should be XX. This looks like a pretty bad bug!
>> 
>> Nick
>> 
>> 
>> On 19 September 2013 19:59, Hara Dilley <hdilley at sutrobio.com> wrote:
>> 
>>> Hi,
>>> 
>>> Is there an issue with the DNA Translation in biojava3.core?
>>> It appears to translate codons containing "N" in certain cases.
>>> Executing:
>>> new DNASequence("ANAANG").getRNASequence().getProteinSequence().getSequenceAsString();
>>> produces the amino acids HR.
>>> 
>>> thanks
>>> Hara
>>> 
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>> 
>> 
>> 

Later,

Andy
--------
Yada, yada, yada...
