[EMBOSS] look for 'Ns' in a sequence using fuzznuc?

Peter Rice ricepeterm at yahoo.co.uk
Fri Jun 14 19:01:02 UTC 2013


Hi Fernan,

On 14/06/2013 18:09, Fernan Aguero wrote:
> I guess I came across a problem ... I'm trying to rapidly find runs of Ns
> in a nucleotide sequence, and produce the corresponding 'assembly_gap'
> annotations in GFF format. This is all derived from scaffolded contigs.
>
> I've tried fuzznuc first because it's easy to specify a pattern, and get a
> list of locations in GFF format. However, fuzznuc uses N to mean any base.
>
> Is there a way to subvert fuzznuc to use another character for this purpose?

Already subverted for the next release in July. EMBOSS 6.6 will let you 
escape the N with a backslash in a pattern file (or two backslashes on 
the command line) to cancel the conversion of N to any base.

> Or maybe there's another emboss program to do this?

dreg uses regular expressions and so will find the Ns (but I see a bug 
in some of the reported positions if you use wildcards in the pattern 
... to be fixed in the next release!)
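
As a cross-check on positions, the same regular-expression search can be
sketched outside EMBOSS. The Python below is only an illustration and is
not what dreg or fuzznuc do internally. The function name n_runs_to_gff,
the "n_run_scan" source label, the min_len cutoff and the
estimated_length attribute are all assumptions made up for this example;
only the 'assembly_gap' feature type comes from the question above.

```python
import re


def n_runs_to_gff(seq_id, sequence, min_len=1, source="n_run_scan"):
    """Return one GFF-style line per run of N (or n) of at least min_len bases.

    Hypothetical helper for illustration; column layout follows the usual
    nine tab-separated GFF fields.
    """
    # Literal N/n only: in a regular expression N is not an ambiguity code.
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    lines = []
    for match in pattern.finditer(sequence):
        start = match.start() + 1  # GFF coordinates are 1-based
        end = match.end()          # and the end position is inclusive
        fields = [
            seq_id,
            source,
            "assembly_gap",
            str(start),
            str(end),
            ".",  # score
            ".",  # strand (a gap has none)
            ".",  # phase
            "estimated_length=%d" % (end - start + 1),
        ]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Toy scaffold: one run of four Ns and one run of two.
    for line in n_runs_to_gff("scaffold1", "ACGTNNNNACGTnnAC"):
        print(line)
```

Raising min_len skips short runs, which may be useful if isolated
ambiguous bases should not be reported as gaps.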

regards,

Peter Rice
EMBOSS team


