[EMBOSS] EMBOSS eprimer3 and ambiguous DNA

Peter biopython at maubp.freeserve.co.uk
Tue Apr 13 12:44:28 UTC 2010


Hello again,

I just ran eprimer3 on a multiple FASTA file (using published genome
sequences), and noticed a couple of messages:

"Error: Unrecognized base in input sequence"

Additionally, for two of the sequences there were no primer pairs (just
some blank lines instead). These appear to correspond to two of the
sequences in my input which had IUPAC ambiguous characters in the
sequence (e.g. R, W, Y, N). The eprimer3 documentation does say
explicitly that such characters are converted into N for some input
files (options -mispriminglibraryfile and -mishyblibraryfile).

What is supposed to happen if a sequence in the main input file has
such characters?

I would expect to still get back a candidate set of primers (even if they
do not cover the regions with ambiguous letters).

As an experiment I added an N character to the end of an unambiguous
sequence, and eprimer3 seemed happy. So, as a workaround, I've simply
replaced all the other ambiguous characters (like R, W and Y) with N,
and it seems to work. Maybe eprimer3 could do this for me, or at least
have this limitation mentioned in the documentation?
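
For anyone hitting the same thing, the workaround can be sketched in
Python roughly as follows. This is only a minimal sketch, not anything
eprimer3 or EMBOSS provides: the names mask_ambiguous and mask_fasta are
made up for illustration, and it assumes plain FASTA input where every
line not starting with ">" is sequence data. It only touches the ten
IUPAC ambiguity codes other than N and leaves everything else alone.

```python
# IUPAC nucleotide ambiguity codes other than N itself.
AMBIGUOUS = "RYSWKMBDHV"

# Map upper-case codes to "N" and lower-case codes to "n" so that any
# case-based soft masking in the input is preserved.
_TABLE = str.maketrans(
    AMBIGUOUS + AMBIGUOUS.lower(),
    "N" * len(AMBIGUOUS) + "n" * len(AMBIGUOUS),
)


def mask_ambiguous(seq):
    """Return seq with IUPAC ambiguity codes replaced by N (or n)."""
    return seq.translate(_TABLE)


def mask_fasta(lines):
    """Yield FASTA lines with sequence lines masked, headers untouched."""
    for line in lines:
        if line.startswith(">"):
            # Header line: may legitimately contain letters like R or Y.
            yield line
        else:
            yield mask_ambiguous(line)
```

To use it, something like
out.writelines(mask_fasta(open("input.fasta"))) would write a masked
copy to an already opened output file (the file name here is just a
placeholder), and that copy can then be given to eprimer3.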

Thanks,

Peter C.


