[Bioperl-l] Quickest Codon Based MSA?

Fri Jan 25 02:17:02 UTC 2008

I don't know whether it is faster or slower than what you have tried, but
the aa_to_dna_aln function in Bio::Align::Utilities maps a protein
alignment back onto the corresponding CDS sequences. You can see example
code using it in the pairwise_kaks script,
scripts/utilities/pairwise_kaks.PLS
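
To make the back-translation step concrete, here is a minimal Python sketch of the general idea: walk along each aligned protein and emit one codon per residue and three gap characters per alignment gap. This is not BioPerl's implementation, and the function and variable names are hypothetical. It assumes every CDS is ungapped, in frame, and translates to the corresponding aligned protein, with at most one trailing stop codon that the protein lacks.

```python
def thread_cds_onto_protein_alignment(aligned_proteins, cds_by_id, gap="-"):
    """Build a codon alignment from a protein alignment and the source CDSs.

    aligned_proteins: dict of sequence id -> gapped protein string
    cds_by_id:        dict of sequence id -> ungapped, in-frame CDS string
    Returns a dict of sequence id -> gapped nucleotide string.
    """
    codon_alignment = {}
    for seq_id, aligned_protein in aligned_proteins.items():
        cds = cds_by_id[seq_id]
        pieces = []
        position = 0  # offset into the ungapped CDS
        for residue in aligned_protein:
            if residue == gap:
                # One protein gap column becomes three nucleotide gap columns.
                pieces.append(gap * 3)
            else:
                # One residue consumes the next codon of the CDS.
                pieces.append(cds[position:position + 3])
                position += 3
        # Accept an exact match, or a single leftover (stop) codon.
        leftover = len(cds) - position
        if leftover not in (0, 3):
            raise ValueError(
                f"{seq_id}: CDS length does not match the aligned protein"
            )
        codon_alignment[seq_id] = "".join(pieces)
    return codon_alignment


if __name__ == "__main__":
    # Toy input: seq1 lacks the third residue that seq2 has.
    proteins = {"seq1": "MK-L", "seq2": "MKAL"}
    cds = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGCTCTG"}
    for name, row in thread_cds_onto_protein_alignment(proteins, cds).items():
        print(name, row)
```

Because this step is linear in the alignment length, the protein alignment itself is normally what dominates the run time in a workflow like this.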

-jason

On Jan 24, 2008, at 2:33 PM, Johan Nilsson wrote:

> Hello,
>
> I have a question which might not necessarily be related to  
> Bioperl, although I do believe the expertise is available here. I  
> have a couple of thousand FASTA files, each containing 20 CDS  
> sequence orthologues of rather high sequence similarity. I would  
> like to create a codon-based multiple sequence alignment for each  
> of these FASTA files (i.e. a nucleotide sequence alignment inferred  
> from alignment of the translated peptide sequences, to assure that  
> no frame shifts will occur). I first tried running Dialign2, which  
> can perform the translation/back-translation in one go, but this  
> turned out to be far too slow. I next tried to build protein  
> alignments using ClustalW and subsequently built the coding region  
> alignment using EMBOSS 'tranalign', but this also was too slow.
>
> Is there any method available which significantly speeds up the  
> codon-preserving alignment? As I mentioned, the sequences to be  
> aligned are in general very conserved, so any heuristic taking  
> advantage of the low divergence would be very helpful! Also, is  
> there any adjustable parameter in dialign2/dialign-T that might  
> speed up the program when looking at highly similar sequences?
>
> Best regards
> /Johan Nilsson