[EMBOSS] needle question!

pmr at ebi.ac.uk pmr at ebi.ac.uk
Thu Feb 8 23:15:32 UTC 2007


Dear Karen,

>  I am currently using needle to generate an alignment between two
> sequences which contain non-informative bases (ie, identified low
> quality bases (phred scores) and have been changed to  "N").
> Presently,  these bases are penalized as any other non-matching
> character.  Is there any way to change needle to "overlook" these
> bases when generating the best scoring alignment (or, do I need to
> write my own version of needle?)

There are two matrix files for nucleotide comparisons. The default is
EDNAFULL which counts N as an average of all possible scores (1 match
against 3 possible mismatches).
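
As a rough illustration of that averaging, here is a minimal sketch. The +5 match and -4 mismatch values are assumptions for the example only (check the copy of EDNAFULL that embossdata fetches for the real numbers), and average_score is a hypothetical helper, not part of EMBOSS.

```python
def average_score(match, mismatch, n_alternatives=4):
    """Average score of an N against one specific base.

    The N could stand for any of n_alternatives bases; exactly one of
    them matches and the rest are mismatches.
    """
    return (match + (n_alternatives - 1) * mismatch) / n_alternatives


# Assumed example values: +5 for a match, -4 for a mismatch.
print(average_score(5, -4))  # -1.75
```

With those assumed values an N is still penalized, only less heavily than a definite mismatch.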

The alternative is EDNAMAT which only scores exact matches like blastn
(use -data EDNAMAT on the command line to see the difference).

But you can also copy EDNAFULL to your local directory with

embossdata EDNAFULL -fetch
mv EDNAFULL EDNAPHRED
(best to do this rename or you will accidentally be using this file by
default for other needle runs in the same directory)

Edit EDNAPHRED to have the scores you want for N (perhaps +1 for a small
match to ACGTU, +2 for a match to a 2-base code RYSWKM, +3 for a match to
a 3-base code BDHV and +4 for a match to another N).
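
If you would rather not edit the file by hand, something like the following sketch could rewrite the N row and column. It assumes the usual EMBOSS matrix layout ('#' comment lines, one header row of column symbols, then one row per symbol with whitespace-separated integer scores). The names rescore_n, CODE_SIZE and N_SCORE_BY_SIZE are made up for this example, and the scores are just the ones suggested above.

```python
# Number of bases each IUPAC code stands for.
CODE_SIZE = {c: 1 for c in "ACGTU"}
CODE_SIZE.update({c: 2 for c in "RYSWKM"})
CODE_SIZE.update({c: 3 for c in "BDHV"})
CODE_SIZE["N"] = 4

# Assumed scoring: N versus a code covering k bases scores +k.
N_SCORE_BY_SIZE = {1: 1, 2: 2, 3: 3, 4: 4}


def rescore_n(matrix_text):
    """Return the matrix text with every score involving N replaced."""
    out = []
    columns = None
    for line in matrix_text.splitlines():
        fields = line.split()
        if not fields or line.startswith("#"):
            out.append(line)          # keep comments and blank lines
            continue
        if columns is None:
            columns = fields          # first data line is the header row
            out.append(line)
            continue
        row, scores = fields[0], fields[1:]
        for i, col in enumerate(columns):
            if row == "N" and col in CODE_SIZE:
                scores[i] = str(N_SCORE_BY_SIZE[CODE_SIZE[col]])
            elif col == "N" and row in CODE_SIZE:
                scores[i] = str(N_SCORE_BY_SIZE[CODE_SIZE[row]])
        # Re-emit the row with 4-character-wide, right-aligned columns.
        out.append(row + "".join(f"{s:>4}" for s in scores))
    return "\n".join(out) + "\n"


# Possible use on the renamed local copy:
#   from pathlib import Path
#   p = Path("EDNAPHRED")
#   p.write_text(rescore_n(p.read_text()))
```

Both the N row and the N column are changed so that the matrix stays symmetric. All other scores are left as they were.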

Then run with:

needle -data EDNAPHRED

If enough users think this is a meaningful scoring system we could add
such a matrix to the distribution. Let us know if it really gives you more
useful scores. My natural prejudice is to trust EDNAFULL. I guess you
expect that the base in the other sequence will often be the one phred
originally called, which would indeed bias the scoring.

Hope this helps,

Peter




