[Biojava-l] issue with translating codons with N

Andy Yates ayates at ebi.ac.uk
Fri Sep 20 14:36:19 UTC 2013


Hi Nick,

I feel I should come in on this issue as I was the original author. The code was copied from another location [1] and seemed to work at the time (and was a *lot* faster than using a map). I do apologise for the bug, but it's a simple fix. I populate a case-insensitive codon map during the class's construction. The code you're referring to does:

int arrayIndex = triplet.intValue();
//So long as we're within range then access
if(arrayIndex > -1 && arrayIndex < codonArray.length) {
    target = codonArray[arrayIndex];
    if (target != null) {
        aminoAcid = target.getAminoAcid();
    }
}
//Otherwise we have to use the Map
else {
    target = quickLookup.get(triplet);
    aminoAcid = target.getAminoAcid();
}
if(aminoAcid == null && translateNCodons()) {
    aminoAcid = unknownAminoAcidCompound;
}

Reduce this to:

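target = quickLookup.get(triplet);
//get() returns null for any triplet that is not in the map, so guard against it
if(target != null) {
  aminoAcid = target.getAminoAcid();
}
if(aminoAcid == null && translateNCodons()) {
  aminoAcid = unknownAminoAcidCompound;
}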

That way we never use the array for lookups. The array-based lookup is faster, but if it isn't safe then it should be removed.
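
To illustrate why an array indexed by a hash value can be unsafe while a map is not, here is a minimal, self-contained sketch. It is not the BioJava code: the class name CodonLookupSketch and the toyIndex() packing are hypothetical, chosen only because this particular 2-bit packing happens to give N the same value as G. The real triplet hash may collide differently.

```java
import java.util.HashMap;
import java.util.Map;

public class CodonLookupSketch {

    // Hypothetical toy hash: packs each base into 2 bits using (char >> 1) & 3.
    // This gives A=0, C=1, T=2, G=3, but 'N' also yields 3, so N collides with G.
    static int toyIndex(String triplet) {
        int index = 0;
        for (char base : triplet.toUpperCase().toCharArray()) {
            index = (index << 2) | ((base >> 1) & 3);
        }
        return index;
    }

    public static void main(String[] args) {
        String[] codonArray = new String[64];
        Map<String, String> quickLookup = new HashMap<>();

        // Register one real codon in both structures: AGA encodes arginine (R).
        codonArray[toyIndex("AGA")] = "R";
        quickLookup.put("AGA", "R");

        // Array lookup trusts the index alone, so ANA lands on AGA's slot.
        String viaArray = codonArray[toyIndex("ANA")];

        // Map lookup also compares keys with equals(), so ANA is simply absent.
        String viaMap = quickLookup.get("ANA");
        if (viaMap == null) {
            viaMap = "X"; // stand-in for the unknown amino acid compound
        }

        System.out.println("array lookup of ANA: " + viaArray);
        System.out.println("map lookup of ANA:   " + viaMap);
    }
}
```

In this sketch the array returns R for ANA because nothing checks that the stored codon really matches the requested triplet, whereas the map falls through to X. That is the trade-off: the array skips the equality check, which is where both the speed and the bug come from.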

Can someone with write access do the following:

* Add this hash collision as a test case to https://github.com/sbliven/biojava/blob/master/biojava3-core/src/test/java/org/biojava3/core/sequence/TranslationTest.java
* Confirm the bug
* Remove all mention of the codonArray
* Confirm the bug's removal & commit/push

Thanks & once again sorry

Andy

[1] The exact place escapes me, but something tells me it was an EMBOSS package.

On 20 Sep 2013, at 13:45, Nick England <nickengland at gmail.com> wrote:

> Hara,
> 
> Hmm this is rather odd. I get the same issue with that sequence with a
> custom engine as well.
> 
> My code has:
>    Builder builder = new TranscriptionEngine.Builder();
>    builder.initMet(false);
>    builder.translateNCodons(true);
>    builder.trimStop(false);
>    TranscriptionEngine engine = builder.build();
>    Sequence<AminoAcidCompound> seq = engine.translate(new DNASequence("GTNTGTTAGTGT"));
>    assertEquals("XC*C", seq.toString());
>    Sequence<AminoAcidCompound> seq2 = engine.translate(new DNASequence("ANAANG"));
>    System.out.println(seq2);
> The first sequence translates as expected, but your sequence translates
> as HR when it should be XX. This looks like a pretty bad bug!
> 
> Nick
> 
> 
> On 19 September 2013 19:59, Hara Dilley <hdilley at sutrobio.com> wrote:
> 
>> Hi,
>> 
>> Is there an issue with the DNA Translation in biojava3.core?
>> It appears that it translates "N" in certain cases.
>> Executing:
>> new DNASequence("ANAANG").getRNASequence().getProteinSequence().getSequenceAsString();
>> will produce the amino acids HR.
>> 
>> thanks
>> Hara
>> 
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> 
