[EMBOSS] look for 'Ns' in a sequence using fuzznuc?
ricepeterm at yahoo.co.uk
Fri Jun 14 19:01:02 UTC 2013
On 14/06/2013 18:09, Fernan Aguero wrote:
> I guess I came across a problem ... I'm trying to rapidly find runs of Ns
> in a nucleotide sequence, and produce the corresponding 'assembly_gap'
> annotations in GFF format. This is all derived from scaffolded contigs.
> I've tried fuzznuc first because it's easy to specify a pattern, and get a
> list of locations in GFF format. However, fuzznuc uses N to mean any base.
> Is there a way to subvert fuzznuc to use another character for this purpose?
Already subverted for the next release in July. EMBOSS 6.6 will let you
escape the N with a backslash in a pattern file (or two backslashes on
the command line) to cancel the conversion of N to any base.
> Or maybe there's another EMBOSS program to do this?
dreg uses regular expressions, so it will find the Ns. However, I see a
bug in some of the reported positions when the pattern contains
wildcards. That will be fixed in the next release.
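
Until a fixed release is available, a short script outside EMBOSS can do the same job. The following is only a minimal sketch in Python, not part of EMBOSS: the function names (find_n_runs, gff_lines), the source label "n_run_scan" and the demo sequence are all made up for illustration, and it assumes the sequence is already in memory as a plain string. It reports each run of Ns as a 1-based, inclusive 'assembly_gap' line in GFF.

```python
import re
from typing import Iterator, List, Tuple


def find_n_runs(seq: str, min_len: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) for each run of N or n in seq."""
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            # re uses 0-based, end-exclusive offsets; GFF uses 1-based, inclusive.
            yield match.start() + 1, match.end()


def gff_lines(seq_id: str, seq: str, source: str = "n_run_scan",
              min_len: int = 1) -> List[str]:
    """Format each N run as a tab-separated GFF 'assembly_gap' feature line."""
    lines = []
    for start, end in find_n_runs(seq, min_len):
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        # "n_run_scan" is a hypothetical source label, not an EMBOSS name.
        fields = [seq_id, source, "assembly_gap", str(start), str(end),
                  ".", ".", ".", "."]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    demo = "ACGTNNNNACGTNNACGT"  # made-up scaffold with two gaps
    print("##gff-version 3")
    for line in gff_lines("scaffold_1", demo):
        print(line)
```

The min_len argument is there because very short runs of N are often ambiguous base calls rather than scaffold gaps; what threshold makes sense depends on how the scaffolds were built.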