[EMBOSS] Tranalign relaxation?

Justin Havird jchavird at gmail.com
Wed May 26 18:50:05 UTC 2010


Hi,

I am trying to align nucleic acid sequences based on amino acid alignments
using the program tranalign. The program normally works fine for me, but
lately I have been using mitochondrial genes and am beginning to run into
problems.

These occur when the nucleotide sequence does not match the amino acid
translation exactly. For example, in the prawn M. japonicus, the first amino
acid (MET) in the COX1 gene is encoded by the codon "ACG" rather than the
typical "ATG". Tranalign doesn't recognize ACG as encoding MET, so it throws
up this message:

Error: Guide protein sequence M. japonicus not found in nucleic sequence M.
japonicus

These errors occur on a taxon-by-taxon basis and are usually caused by the
first codon. However, the error also occurs when the nucleotide sequence has
an ambiguous nucleotide (e.g., Y), even if the ambiguous nucleotide position
doesn't affect the translation (e.g., both GTC and GTT = VAL). I can usually
pinpoint the error to a specific nucleotide/codon like in these examples.
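
To make the two cases above concrete, here is a minimal Python sketch of how I
track down the offending codon. It is not part of EMBOSS and is not how
tranalign works internally; the names translate_codon and find_mismatches are
made up for illustration, and it assumes the standard genetic code (a
mitochondrial table would need to be swapped in for real mitochondrial data)
and a nucleotide sequence that starts in frame at codon 1.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: swap in
# a mitochondrial table for real mitochondrial genes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; an ambiguous codon still yields a residue
    when every possible expansion encodes the same amino acid."""
    options = [IUPAC.get(base, "") for base in codon.upper()]
    if len(options) != 3 or not all(options):
        return "X"  # incomplete codon or unknown character
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def find_mismatches(protein, nucleotide):
    """Return (1-based position, codon, translated, expected) for every
    residue whose codon does not translate to the given amino acid."""
    mismatches = []
    for i, expected in enumerate(protein.upper()):
        codon = nucleotide[3 * i:3 * i + 3].upper()
        observed = translate_codon(codon)
        if observed != expected:
            mismatches.append((i + 1, codon, observed, expected))
    return mismatches
```

With a protein starting "MV" and a nucleotide sequence starting "ACGGTY", this
reports only the first codon (ACG translates to T, not M); GTY is accepted
because both GTC and GTT encode VAL.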

These errors are relatively rare, but happen more frequently in some groups
(mostly invertebrates and fishes).

So, does anyone know a way to "relax" the tranalign translation rules to
circumvent this problem? Or have another program/solution?
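
For what it's worth, the kind of "relaxed" behaviour I have in mind could be
sketched roughly as follows: place codons under the aligned residues purely by
position, without checking the translation at all. This is only an
illustration under my own assumptions (the function name thread_codons is
hypothetical, the nucleotide sequence is ungapped and starts in frame at the
first residue, and "-" is the gap character), not a replacement for tranalign.

```python
def thread_codons(aligned_protein, nucleotide, gap="-"):
    """Thread an ungapped nucleotide sequence onto an aligned protein,
    three bases per residue, inserting three gap characters for every
    gap in the protein alignment. No translation check is performed."""
    codons = [nucleotide[i:i + 3] for i in range(0, len(nucleotide), 3)]
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(codons) < n_residues or any(len(c) != 3 for c in codons[:n_residues]):
        raise ValueError("nucleotide sequence too short for the protein")

    remaining = iter(codons)
    aligned = []
    for ch in aligned_protein:
        # A gap in the protein becomes a codon-sized gap in the nucleotides;
        # any other character consumes the next codon, whatever it encodes.
        aligned.append(gap * 3 if ch == gap else next(remaining))
    return "".join(aligned)
```

For example, thread_codons("M-V", "ACGGTY") gives "ACG---GTY". Trailing codons
beyond the last residue (such as a stop codon) are simply ignored. The obvious
drawback is that a genuine frame or length error goes undetected, so the
lengths should be checked beforehand.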

I think the user from the message below had a similar problem, but I see no
answer. :(

Thanks!

Justin


From Nov 12, 2006:
> Hello - I'm trying to use tranalign to align DNA sequences but it
> keeps throwing errors.  I tested it on the example input files from
> the documentation web pages and those work fine.
>
> Error: Guide protein sequence SS1G01814 not found in nucleic sequence
> SS1G01814
> <snip>
> it throws the same error for every pair of proteins in the file
>
> here are the sequence names in the files.  I can supply the full
> files if anyone thinks they can help.

Yes, please send the full files to emboss-bug at emboss.open-bio.org

The message is not an ID mismatch - it says the protein sequence did not
match the DNA sequence.

regards,

Peter Rice


**********************************************************************
Justin C. Havird
Department of Biological Sciences &
Cellular and Molecular Biosciences Program
Auburn University
101 Life Science Building
Auburn, AL 36849
Tele # (334) 844-3223
Fax # (334) 844-1645
Email: jhavird at auburn.edu
Lab Website: http://www.auburn.edu/~santosr/
**********************************************************************


