[Biojava-l] amino acid to nucleic acid alignment

Alex Golubev alexg at compugen.co.il
Wed Jan 4 13:06:41 EST 2006


Hi,

I'm trying to align amino acids to nucleic acids. I'm using gapped sequences for both the protein and the DNA. I have several problems and I would very much appreciate it if someone could help.
1. How can I parse DNA and get codons? I would like to start with DNA that looks like this "ATGTAT" and get a protein that looks like this "MY". I'm using "Alphabet alpha = DNATools.getCodonAlphabet();" but I can't find a tokenization that parses the DNA string (does this make any sense?).
2. My other problem is that there are frame shifts, so my gapped DNA actually looks like this "AT-G-TAT". Is there any way to translate locations between the codon symbol list and the DNA symbol list, in both directions?
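
To make the two questions concrete, here is a minimal plain-Java sketch of the behaviour being asked for. It deliberately uses no BioJava calls, so it is not the library's API: the class name CodonSketch and the methods translate, residueColumns and codonColumns are hypothetical names chosen for illustration. It assumes the standard genetic code, treats '-' as the only gap character, reads codons in frame from the first ungapped base, and writes 'X' for any codon containing a base other than A, C, G or T. Columns are 0-based positions in the gapped string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class CodonSketch {
    // Standard genetic code, codons enumerated in TCAG order for each position.
    private static final String BASES = "TCAG";
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // For each ungapped base, the column it occupies in the gapped string.
    static int[] residueColumns(String gappedDna) {
        List<Integer> cols = new ArrayList<>();
        for (int i = 0; i < gappedDna.length(); i++) {
            if (gappedDna.charAt(i) != '-') {
                cols.add(i);
            }
        }
        int[] out = new int[cols.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = cols.get(i);
        }
        return out;
    }

    // Gapped columns of the three bases that make up codon number codonIndex.
    static int[] codonColumns(String gappedDna, int codonIndex) {
        int[] cols = residueColumns(gappedDna);
        int start = 3 * codonIndex;
        return new int[] { cols[start], cols[start + 1], cols[start + 2] };
    }

    // Remove gaps, then translate complete codons; a trailing partial codon is ignored.
    static String translate(String gappedDna) {
        String dna = gappedDna.replace("-", "").toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            int b1 = BASES.indexOf(dna.charAt(i));
            int b2 = BASES.indexOf(dna.charAt(i + 1));
            int b3 = BASES.indexOf(dna.charAt(i + 2));
            boolean known = b1 >= 0 && b2 >= 0 && b3 >= 0;
            protein.append(known ? AMINO.charAt(16 * b1 + 4 * b2 + b3) : 'X');
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGTAT"));                            // MY
        System.out.println(translate("AT-G-TAT"));                          // MY
        System.out.println(Arrays.toString(codonColumns("AT-G-TAT", 1)));   // [5, 6, 7]
    }
}
```

Going the other way (from a gapped DNA column to a codon) is a lookup in the same residueColumns array: find the ungapped index of the column and divide by 3. Note that this sketch treats gaps purely as alignment padding; a true frame shift, where a base is inserted or deleted relative to the reading frame, would need the frame to be adjusted rather than simply skipping the gap.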

I would appreciate any hint as to whether all of this makes sense.

Thanks,
Alex Golubev.


