[EMBOSS] question about 'fuzznuc' and 'fuzzpro'

Peter Rice pmr at ebi.ac.uk
Mon Feb 12 18:54:27 UTC 2007


Hi Jean,

> I know I can give a pattern like 'ACCGGT' and search against a file which
> contains multiple sequences. Is there a way I can specify a 'pattern file'
> which contains multiple patterns that I want to search for instead of just
> one pattern each time? For example, I have a fileA which contains multiple
> DNA sequences. I want to create a fileB which contains 20 patterns that I
> want to search each of them against the sequences in the fileA. We are in the
> transition from GCG to EMBOSS. And the program 'findpatterns' in GCG can do
> this. But I couldn't find corresponding emboss program that does the same
> thing.

New in EMBOSS 4.0.0, contributed by Henrikki Almusa of Medicel in Helsinki.

fuzznuc (and fuzzpro and fuzztran) can now read in a file of patterns with the
command-line syntax:

fuzznuc @patternfile

You can also use @patternfile in response to the prompt for a pattern.

Here is an example pattern file with FASTA-style IDs and mismatch counts for 
each pattern:

>pat1
cggccctaaccctagcccta
>pat2 <mismatch=1>
cg(2)c(3)taac
cctagc(3)ta
>pat3
cggc{2,4}taac{2,5}
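
If the patterns already live in a spreadsheet or a list, a file in this layout can be generated rather than typed by hand. The following is only a minimal sketch: the function name format_pattern_file and the (name, pattern, mismatch) tuple layout are hypothetical helpers made up for illustration, not part of EMBOSS. It writes one ">name" header per pattern and adds the <mismatch=N> tag only when a mismatch count is given.

```python
def format_pattern_file(patterns):
    """Return pattern-file text for (name, pattern, mismatch) tuples.

    This helper is hypothetical, not part of EMBOSS. A mismatch of None
    means no <mismatch=N> tag is written on the header line.
    """
    lines = []
    for name, pattern, mismatch in patterns:
        header = ">" + name
        if mismatch is not None:
            header += " <mismatch={}>".format(mismatch)
        lines.append(header)
        # The pattern text is written as given; an embedded newline keeps
        # a pattern split over two lines, as in the pat2 example above.
        lines.append(pattern)
    return "\n".join(lines) + "\n"


patterns = [
    ("pat1", "cggccctaaccctagcccta", None),
    ("pat2", "cg(2)c(3)taac\ncctagc(3)ta", 1),
    ("pat3", "cggc{2,4}taac{2,5}", None),
]

# Print the file content; redirect to a file (for example fileB) to save it.
print(format_pattern_file(patterns), end="")
```

Run as shown, this prints the same three entries as the example file above.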

Here is a file with just the second pattern, and no name (it will default to
pattern1):

cg(2)c(3)taac
cctagc(3)ta

You can set a default name with -pname and a default mismatch with -pmismatch.
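
For scripted runs, the command line can be assembled programmatically. This is a rough sketch under stated assumptions: the @patternfile syntax and the -pname and -pmismatch qualifiers come from this message, while -sequence, -pattern and -outfile are assumed to be the usual fuzznuc qualifiers and should be checked against your installed version. The function name fuzznuc_command and the file names are hypothetical. The sketch only builds and prints the command; it does not run fuzznuc.

```python
import shlex


def fuzznuc_command(sequence_file, pattern_file, outfile,
                    pname=None, pmismatch=None):
    """Build an argument list for fuzznuc reading patterns from a file.

    Hypothetical helper. -sequence, -pattern and -outfile are assumed
    qualifier names; -pname and -pmismatch set the default pattern name
    and default mismatch count.
    """
    argv = [
        "fuzznuc",
        "-sequence", sequence_file,
        "-pattern", "@" + pattern_file,  # '@' marks a file of patterns
        "-outfile", outfile,
    ]
    if pname is not None:
        argv += ["-pname", pname]
    if pmismatch is not None:
        argv += ["-pmismatch", str(pmismatch)]
    return argv


argv = fuzznuc_command("fileA", "fileB", "fileA.fuzznuc", pmismatch=1)

# Show the command as a shell-quoted string (shlex.join needs Python 3.8+).
print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run(argv, check=True) on a machine where EMBOSS is installed.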

I note we could document this better in the fuzz* program manual entries. We
will do so for the 4.1 release.

Hope that helps,

Peter


