[Biojava-l] Codon count

Andy Yates ayates at ebi.ac.uk
Thu Apr 21 12:06:35 UTC 2011


Translating first will carry a performance hit, but counting codons directly means rewriting the translation code, so the speed gain may not be worth the recoding effort. Benchmark it before recoding. I can't remember the exact speed, but it isn't too slow.

Andy
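
For anyone who wants something to benchmark against the translation route, here is a minimal sketch of direct codon counting on both strands in a single pass per strand. It is plain Java and does not use any BioJava class; the class and method names (CodonHistogram, countBothStrands, index) are hypothetical, and it assumes frame 1 on each strand and skips any codon containing a base other than A, C, G or T.

```java
final class CodonHistogram {
    /** Two-bit code per base: A=0, C=1, G=2, T=3; -1 for anything else. */
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /** Slot of a codon in the 64-entry table, or -1 if it is not three unambiguous bases. */
    static int index(String codon) {
        if (codon.length() != 3) return -1;
        int a = code(codon.charAt(0)), b = code(codon.charAt(1)), c = code(codon.charAt(2));
        return (a < 0 || b < 0 || c < 0) ? -1 : 16 * a + 4 * b + c;
    }

    /** Counts frame-1 codons on the forward strand and on the reverse complement. */
    static int[] countBothStrands(String dna) {
        int[] counts = new int[64];
        int n = dna.length();
        // Forward strand: non-overlapping triplets from the 5' end.
        for (int i = 0; i + 3 <= n; i += 3) {
            int a = code(dna.charAt(i)), b = code(dna.charAt(i + 1)), c = code(dna.charAt(i + 2));
            if (a < 0 || b < 0 || c < 0) continue; // ambiguous codon, skipped
            counts[16 * a + 4 * b + c]++;
        }
        // Reverse strand: walk back from the 3' end; with this encoding the
        // complement of a base code x is 3 - x, so no second string is built.
        for (int p = n - 1; p >= 2; p -= 3) {
            int a = code(dna.charAt(p)), b = code(dna.charAt(p - 1)), c = code(dna.charAt(p - 2));
            if (a < 0 || b < 0 || c < 0) continue;
            counts[16 * (3 - a) + 4 * (3 - b) + (3 - c)]++;
        }
        return counts;
    }
}
```

Usage would look like `int atg = CodonHistogram.countBothStrands(seq)[CodonHistogram.index("ATG")];`. Whether this actually beats translation plus AA counting on 600,000 ORFs is exactly what the benchmark should tell you.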

Khalil El Mazouari <khalil.elmazouari at gmail.com> wrote:

>Hi Andy,
>
>I am currently counting codons via 6-ORF translations. I am working on
>±100,000 seq/run => 600,000 ORFs to check. So, performance is an issue
>for my job.
>
>I am just wondering whether counting codons directly on the NT seq (both
>strands) would be faster than translation + AA counting.
>
>Regards,
>
>khalil
>
>
>On 21 Apr 2011, at 13:40, Andy Yates wrote:
>
>> Hi Khalil,
>> 
>> Then I think windowed sequence is the only way to go. Actually one
>> particularly "interesting" idea has just sprung to mind. What if you
>> translated the entire sequence in frame 1 forward & reverse? Then
>> finding the number of correct codons is a case of looking for amino
>> acids which are not a stop or unknown amino acid.
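
The translate-then-count idea above can be sketched roughly as follows. This is plain Java, not the BioJava translation engine; the class and method names (FrameOneCoding, translateForward, translateReverse, codingCodonsBothStrands) are hypothetical. It assumes the standard genetic code, writes stops as '*', and writes any codon containing a non-ACGT base as 'X'.

```java
final class FrameOneCoding {
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
    private static final String BASES = "TCAG";
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N'; // anything ambiguous stays unknown
        }
    }

    /** One codon to one amino acid; 'X' when any base is not A, C, G or T. */
    static char translateCodon(char b1, char b2, char b3) {
        int i = BASES.indexOf(Character.toUpperCase(b1));
        int j = BASES.indexOf(Character.toUpperCase(b2));
        int k = BASES.indexOf(Character.toUpperCase(b3));
        return (i < 0 || j < 0 || k < 0) ? 'X' : AMINO.charAt(16 * i + 4 * j + k);
    }

    /** Frame 1 of the forward strand. */
    static String translateForward(String dna) {
        StringBuilder sb = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            sb.append(translateCodon(dna.charAt(i), dna.charAt(i + 1), dna.charAt(i + 2)));
        }
        return sb.toString();
    }

    /** Frame 1 of the reverse complement, read back from the 3' end. */
    static String translateReverse(String dna) {
        StringBuilder sb = new StringBuilder(dna.length() / 3);
        for (int p = dna.length() - 1; p >= 2; p -= 3) {
            sb.append(translateCodon(complement(dna.charAt(p)),
                                     complement(dna.charAt(p - 1)),
                                     complement(dna.charAt(p - 2))));
        }
        return sb.toString();
    }

    /** Amino acids that are neither a stop ('*') nor unknown ('X'). */
    static int countCoding(String protein) {
        int n = 0;
        for (int i = 0; i < protein.length(); i++) {
            char aa = protein.charAt(i);
            if (aa != '*' && aa != 'X') n++;
        }
        return n;
    }

    static int codingCodonsBothStrands(String dna) {
        return countCoding(translateForward(dna)) + countCoding(translateReverse(dna));
    }
}
```

For "ATGTAANNN" the forward frame reads M, stop, unknown and the reverse frame reads unknown, L, H, so three codons count as coding. Counting one particular amino acid instead is a matter of comparing against that letter in countCoding.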
>> 
>> Andy
>> 
>> On 21 Apr 2011, at 12:37, Khalil El Mazouari wrote:
>> 
>>> Thanks Andy,
>>> it's the second option I am looking for.
>>> 
>>> Regards,
>>> khalil
>>> 
>>> 
>>> 
>>> On 21 Apr 2011, at 13:23, Andy Yates wrote:
>>> 
>>>> Hi Khalil,
>>>> 
>>>> I'm not 100% sure what you want here. If you just want to know the
>>>> potential number of codons on both strands of DNA then it would be
>>>> (length / 3)*2. If what you are actually asking for is how many codons
>>>> code for an amino acid then you would have to perform work similar to
>>>> the transcription engine in BJ3. All codon tables are available from
>>>> the IUPACParser class & then it would be up to you to use a
>>>> WindowedSequence over the top of your NT sequence to get the windows or
>>>> SequenceMixin.nonOverlappingKmers() which shortcuts the creation of the
>>>> WindowedSequence.
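
To make the two options above concrete, here is a rough stand-alone sketch: the (length / 3)*2 upper bound, and a count of one specific codon over non-overlapping 3-base windows on both strands. It mimics the windowing idea with plain strings and does not call WindowedSequence or SequenceMixin; the names CodonWindows, potentialCodons, nonOverlappingTriplets and countCodon are hypothetical, and only frame 1 of each strand is read.

```java
import java.util.ArrayList;
import java.util.List;

final class CodonWindows {
    /** Potential codons in one frame on both strands: (length / 3) * 2. */
    static int potentialCodons(String dna) {
        return (dna.length() / 3) * 2;
    }

    /** Reverse complement; characters other than A, C, G, T become N. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Non-overlapping 3-mers from offset 0; a trailing 1 or 2 bases are dropped. */
    static List<String> nonOverlappingTriplets(String dna) {
        List<String> windows = new ArrayList<>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            windows.add(dna.substring(i, i + 3).toUpperCase());
        }
        return windows;
    }

    /** Occurrences of one codon in frame 1 of the forward and reverse strands. */
    static int countCodon(String dna, String codon) {
        String target = codon.toUpperCase();
        int n = 0;
        for (String w : nonOverlappingTriplets(dna)) {
            if (w.equals(target)) n++;
        }
        for (String w : nonOverlappingTriplets(reverseComplement(dna))) {
            if (w.equals(target)) n++;
        }
        return n;
    }
}
```

To count all codons for one amino acid, call countCodon once per synonymous codon taken from the codon table and add the results.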
>>>> 
>>>> Regards,
>>>> 
>>>> Andy
>>>> 
>>>> On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am looking for a simple method or class to count the occurrences of a
>>>>> specific AA codon on an NT seq, counting on both strands.
>>>>> 
>>>>> Any suggestion is welcome. 
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> khalil
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>> 
>>>> -- 
>>>> Andrew Yates                   Ensembl Genomes Engineer
>>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 