[Biojava-l] Codon count

Khalil El Mazouari khalil.elmazouari at gmail.com
Thu Apr 21 11:54:23 UTC 2011


Hi Andy,

I am currently counting codons via six-frame translation (6 ORFs per sequence). I am working on roughly 100,000 sequences per run, i.e. about 600,000 ORFs to check, so performance is an issue for my job.

I am just wondering whether counting codons directly on the NT sequence (both strands) would be faster than translating and then counting AAs.
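A minimal plain-Java sketch of what "counting codons directly on the NT sequence" could look like, assuming an uppercase DNA string. It is not BioJava API: the class and method names are hypothetical, and the hard-coded string is the standard genetic code (NCBI table 1) in T, C, A, G base order, standing in for a BioJava codon table. Each of the six frames is read once into a 64-slot codon tally, with no protein sequence built.

```java
/**
 * Hypothetical sketch (not BioJava API): tallies codons in all six reading
 * frames directly on the nucleotide string. Assumes uppercase DNA; codons
 * containing a base other than A, C, G or T are skipped.
 */
class SixFrameCodonCounter {

    // Standard genetic code (NCBI table 1), indexed by 16*b1 + 4*b2 + b3
    // with base order T=0, C=1, A=2, G=3. '*' marks stop codons.
    static final String CODE =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static int baseIndex(char b) {
        switch (b) {
            case 'T': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default:  return -1; // N or another ambiguity code
        }
    }

    static char complement(char b) {
        switch (b) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    /** Adds the codons of frames 1, 2 and 3 of one strand into counts. */
    static void tallyStrand(String strand, int[] counts) {
        for (int frame = 0; frame < 3; frame++) {
            for (int i = frame; i + 3 <= strand.length(); i += 3) {
                int a = baseIndex(strand.charAt(i));
                int b = baseIndex(strand.charAt(i + 1));
                int c = baseIndex(strand.charAt(i + 2));
                if (a >= 0 && b >= 0 && c >= 0) {
                    counts[16 * a + 4 * b + c]++;
                }
            }
        }
    }

    /** Returns a 64-slot codon tally over all six frames (both strands). */
    static int[] sixFrameCodonCounts(String seq) {
        int[] counts = new int[64];
        tallyStrand(seq, counts);
        tallyStrand(reverseComplement(seq), counts);
        return counts;
    }

    /** Sums the tally over every codon that encodes the given amino acid. */
    static int countCodonsFor(char aminoAcid, int[] counts) {
        int total = 0;
        for (int i = 0; i < 64; i++) {
            if (CODE.charAt(i) == aminoAcid) {
                total += counts[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] counts = sixFrameCodonCounts("ATGAAACCC");
        System.out.println("Lys codons: " + countCodonsFor('K', counts)); // Lys codons: 1
    }
}
```

Once the tally exists, the count for any amino acid is a 64-entry sum, so several amino acids can be queried without re-reading the sequence. Whether this actually beats translation plus AA counting for ~600,000 ORFs is untested here and would need to be measured.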

Regards,

khalil


On 21 Apr 2011, at 13:40, Andy Yates wrote:

> Hi Khalil,
> 
> Then I think a windowed sequence is the only way to go. Actually, one particularly "interesting" idea has just sprung to mind: what if you translated the entire sequence in frame 1, forward & reverse? Then finding the number of correct codons is a matter of counting the amino acids that are neither a stop nor an unknown amino acid.
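The frame-1 trick can be sketched in plain Java as follows. This is an illustration under stated assumptions, not BioJava's translation engine: the class and methods are hypothetical, the table is the standard genetic code (NCBI table 1), stops are written as '*', and any codon containing a non-ACGT base is treated as the unknown amino acid 'X'.

```java
/**
 * Hypothetical sketch: translate frame 1 of both strands, then count
 * residues that are neither a stop ('*') nor unknown ('X').
 */
class FrameOneSenseCodons {

    // Standard genetic code (NCBI table 1), indexed by 16*b1 + 4*b2 + b3.
    static final String CODE =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    static final String BASES = "TCAG"; // base order used to index CODE

    /** Translates frame 1; codons with any non-ACGT base become 'X'. */
    static String translateFrame1(String seq) {
        StringBuilder protein = new StringBuilder(seq.length() / 3);
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int a = BASES.indexOf(seq.charAt(i));
            int b = BASES.indexOf(seq.charAt(i + 1));
            int c = BASES.indexOf(seq.charAt(i + 2));
            protein.append(a < 0 || b < 0 || c < 0 ? 'X' : CODE.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    /** Reverse complement; anything other than A, C, G, T becomes 'N'. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char b = seq.charAt(i);
            sb.append(b == 'A' ? 'T' : b == 'T' ? 'A' : b == 'C' ? 'G' : b == 'G' ? 'C' : 'N');
        }
        return sb.toString();
    }

    /** Counts residues that are neither a stop nor unknown. */
    static int countSense(String protein) {
        int n = 0;
        for (int i = 0; i < protein.length(); i++) {
            char aa = protein.charAt(i);
            if (aa != '*' && aa != 'X') {
                n++;
            }
        }
        return n;
    }

    /** Frame 1 forward plus frame 1 of the reverse complement. */
    static int senseCodonsBothStrands(String seq) {
        return countSense(translateFrame1(seq))
             + countSense(translateFrame1(reverseComplement(seq)));
    }

    public static void main(String[] args) {
        String seq = "ATGTAANNNAAA";
        System.out.println(translateFrame1(seq));        // M*XK
        System.out.println(senseCodonsBothStrands(seq)); // 5 (M, K forward; F, L, H reverse)
    }
}
```

Frames 2 and 3 could be covered the same way by calling the methods on `seq.substring(1)` and `seq.substring(2)`, and likewise on the reverse complement.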
> 
> Andy
> 
> On 21 Apr 2011, at 12:37, Khalil El Mazouari wrote:
> 
>> Thanks Andy,
>> it's the second option I am looking for.
>> 
>> Regards,
>> khalil
>> 
>> 
>> 
>> On 21 Apr 2011, at 13:23, Andy Yates wrote:
>> 
>>> Hi Khalil,
>>> 
>>> I'm not 100% sure what you want here. If you just want to know the potential number of codons on both strands of DNA, then it would be (length / 3)*2. If what you are actually asking for is how many codons code for an amino acid, then you would have to perform work similar to the transcription engine in BJ3. All codon tables are available from the IUPACParser class; it would then be up to you to use a WindowedSequence over the top of your NT sequence to get the windows, or SequenceMixin.nonOverlappingKmers(), which shortcuts the creation of the WindowedSequence.
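The two points in the reply above, the (length / 3)*2 estimate and stepping over non-overlapping 3-mer windows, can be illustrated on a plain String. This is only a stand-in for the idea: it is not BioJava's WindowedSequence or SequenceMixin.nonOverlappingKmers(), and the class name, method names and the lysine codon set are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical plain-Java stand-in for non-overlapping codon windows. */
class CodonWindows {

    /** Potential codons in one frame per strand: (length / 3) * 2. */
    static int potentialCodons(int length) {
        // Integer division drops any trailing partial codon.
        return (length / 3) * 2;
    }

    /** Splits seq into non-overlapping 3-mers, dropping any trailing remainder. */
    static List<String> nonOverlappingCodons(String seq) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            codons.add(seq.substring(i, i + 3));
        }
        return codons;
    }

    /** Counts windows whose codon is in codonSet (e.g. AAA and AAG for lysine). */
    static long countMatching(List<String> codons, Set<String> codonSet) {
        return codons.stream().filter(codonSet::contains).count();
    }

    public static void main(String[] args) {
        String seq = "ATGAAAGCCAAGTA"; // 14 nt: four full codons plus "TA"
        List<String> codons = nonOverlappingCodons(seq);
        System.out.println(codons);                                       // [ATG, AAA, GCC, AAG]
        System.out.println(countMatching(codons, Set.of("AAA", "AAG")));  // 2
        System.out.println(potentialCodons(seq.length()));                // 8
    }
}
```

In BioJava itself the codon set for an amino acid would come from one of the codon tables mentioned above rather than being typed in by hand.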
>>> 
>>> Regards,
>>> 
>>> Andy
>>> 
>>> On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am looking for a simple method or class to count the number of codons for a specific AA in an NT sequence, counting on both strands.
>>>> 
>>>> Any suggestion is welcome. 
>>>> 
>>>> Regards,
>>>> 
>>>> khalil
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>> 
>>> -- 
>>> Andrew Yates                   Ensembl Genomes Engineer
>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>> 
>>> 
>>> 
>>> 
>> 
> 




