[EMBOSS] question about 'fuzznuc' and 'fuzzpro'

Peter Rice pmr at ebi.ac.uk
Tue Feb 13 10:03:21 UTC 2007


Hi Jean,

I have copied this reply to the list, as it covers some poorly documented
features and some suggestions for the future.

> It's great to know it can be done! I do have further questions. So in the 
> pattern file that has no name and contains two lines, you said it's going to 
> default to pattern 1. Does that mean that without the '>', everything will 
> be concatenated and treated as one pattern?

Yes. We did include a -pformat qualifier to set the format of the pattern file,
so we can extend in future to have one pattern per line.

> Actually I should ask what's the difference between
> 
> >pat2 <mismatch=1>
> cg(2)c(3)taac
> cctagc(3)ta
> 
> and 
> 
> >pat2 <mismatch=1>
> cg(2)c(3)taaccctagc(3)ta

They are the same - pattern lines are simply joined together until the next new
pattern header (>pat3) is found.
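
To make the joining rule concrete, here is a minimal Python sketch of how such a
file could be read. It is not the EMBOSS code: the function name, the
"pattern1"-style default name and the default_mismatch argument (standing in for
-pmismatch) are assumptions made for illustration only.

```python
import re

# Header line: '>' then an optional name, then an optional <mismatch=N> tag.
HEADER = re.compile(r"^>\s*([^\s<]+)?\s*(?:<mismatch=(\d+)>)?")


def parse_pattern_file(text, default_mismatch=0):
    """Return a list of (name, mismatch, pattern) tuples.

    Pattern lines are joined until the next '>' header. A file with no
    header at all yields a single pattern with a default name (assumed
    here to be 'pattern1').
    """
    entries = []      # each entry is [name, mismatch, list_of_pattern_lines]
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            m = HEADER.match(line)
            mismatch = int(m.group(2)) if m.group(2) else default_mismatch
            current = [m.group(1), mismatch, []]
            entries.append(current)
        else:
            if current is None:
                # Pattern text before any header: start an unnamed pattern.
                current = [None, default_mismatch, []]
                entries.append(current)
            current[2].append(line)
    return [
        (name or f"pattern{i}", mismatch, "".join(parts))
        for i, (name, mismatch, parts) in enumerate(entries, 1)
    ]
```

With this sketch, the two-line and one-line versions of pat2 parse to the same
entry, and a file with no header parses to one unnamed pattern that takes the
default mismatch count.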

> also what's the difference between a file containing
> >pat2 <mismatch=1>
> cg(2)c(3)taac
> cctagc(3)ta

> with a file containing
> cg(2)c(3)taac
> cctagc(3)ta

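The first allows one mismatch when matching the pattern. The second has no
header, so it uses the default, which is no mismatches unless -pmismatch is
given. These patterns work with the HHTETRA entry we use for the example in the
program manual (accession number L46634):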

>HHTETRA L46634.1 Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
aagcttaaactgaggtcacacacgactttaattacggcaacgcaacagctgtaagctgca
ggaaagatacgatcgtaagcaaatgtagtcctacaatcaagcgaggttgtagacgttacc
tacaatgaactacacctctaagcataacctgtcgggcacagtgagacacgcagccgtaaa
ttcaaaactcaacccaaaccgaagtctaagtctcaccctaatcgtaacagtaaccctaca
actctaatcctagtccgtaaccgtaaccccaatcctagcccttagccctaaccctagccc
taaccctagctctaaccttagctctaactctgaccctaggcctaaccctaagcctaaccc
taaccgtagctctaagtttaaccctaaccctaaccctaaccatgaccctgaccctaaccc
tagggctgcggccctaaccctagccctaaccctaaccctaatcctaatcctagccctaac
cctagggctgcggccctaaccctagccctaaccctaaccctaaccctagggctgcggccc
taaccctaaccctagggctgcggcccgaaccctaaccctaaccctaaccctaaccctagg
gctgcggccctaaccctaaccctagggctgcggccctaaccctaaccctagggctgcggc
ccgaaccctaaccctaaccctaaccctagggctgcggccctaaccctaaccctagggctg
cggccctaaccctaaccctaactctagggctgcggccctaaccctaaccctaaccctaac
cctagggctgcggcccgaaccctagccctaaccctaaccctgaccctgaccctaacccta
accctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
accctaaccctaaccctaaccctaaccccgcccccactggcagccaatgtcttgtaatgc
cttcaaggcactttttctgcgagccgcgcgcagcactcagtgaaaaacaagtttgtgcac
gagaaagacgctgccaaaccgcagctgcagcatgaaggctgagtgcacaattttggcttt
agtcccataaaggcgcggcttcccgtagagtagaaaaccgcagcgcggcgcacagagcga
aggcagcggctttcagactgtttgccaagcgcagtctgcatcttaccaatgatgatcgca
agcaagaaaaatgttctttcttagcatatgcgtggttaatcctgttgtggtcatcactaa
gttttcaagctt

> Also could you explain how to use -pname and -pmismatch?
>I don't understand this part at all :-P Thank you very much!

Ah ... they are associated qualifiers (like -sformat, -sbegin and -send for
sequences, -osformat for sequence output, -aformat for alignments and -rformat
for reports).

They only show up if you use -help -verbose to see the help.

This caused some problems for fuzznuc users with release 4.0.0, as these
qualifiers replace the -mismatch option of the previous version, which read only
one pattern.

-pmismatch sets a default number of mismatches for all patterns (that you can
override within the pattern file).

-pname sets a pattern name for the output (something that was missing before).
Oops, we have a bug ... the name is being ignored in fuzznuc. Will be fixed in
4.1.0.

-pformat sets the pattern file format. So far this is ignored, so we have not
documented any pattern file format names. I think a file with one line for each
pattern, with the numbers 1, 2, 3 added to the pattern name, would be useful. We
could call the formats "simple" (one line per pattern) and "fasta" (the current
format with names).
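
As a rough illustration of what the suggested "simple" format might mean, here
is a short Python sketch. The format is not implemented, so everything here is
hypothetical: the function name, the base_name argument (standing in for a name
given with -pname) and the exact way the number is appended are all assumptions.

```python
def parse_simple_patterns(text, base_name="pattern"):
    """Read one pattern per non-blank line (hypothetical 'simple' format).

    Each pattern is named base_name plus a running number: 1, 2, 3, ...
    """
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line:
            name = f"{base_name}{len(patterns) + 1}"
            patterns.append((name, line))
    return patterns
```

Read this way, a file holding the two lines cg(2)c(3)taac and cctagc(3)ta would
give two separate patterns, rather than the single joined pattern that the
current format produces.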

Oops, another bug. Using a bad pattern file name is not being caught. This is
fixed in 4.1.0.

We also added files of regular expressions used by dreg and preg so you can also
use them for pattern searches (it depends on whether you prefer prosite-style
patterns or regular expressions - I find the prosite style for fuzznuc is much
easier). We can use the same file formats for them.
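
For anyone comparing the two styles, here is a small Python sketch that turns a
simplified subset of the prosite style into a regular expression. It is only an
illustration and not how fuzznuc or dreg work internally: the function name is
made up, it handles only single letters, [..] sets, {..} exclusions, x for any
character, optional '-' separators and (n) or (n,m) repeats, and it does not
expand nucleotide ambiguity codes or support mismatches.

```python
import re

# One pattern position, optionally followed by a repeat count such as (3) or (2,4).
TOKEN = re.compile(
    r"(?P<atom>\[[a-z]+\]|\{[a-z]+\}|[a-z])"
    r"(?:\((?P<min>\d+)(?:,(?P<max>\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Convert a simplified prosite-style pattern to a regular expression."""
    pattern = pattern.lower().replace("-", "")
    out = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"cannot parse pattern at position {pos}")
        atom = m.group("atom")
        if atom == "x":
            atom = "."                        # x matches any character
        elif atom.startswith("{"):
            atom = "[^" + atom[1:-1] + "]"    # {ac} means anything except a or c
        if m.group("min"):
            lo, hi = m.group("min"), m.group("max")
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        out.append(atom)
        pos = m.end()
    return "".join(out)


# A fragment of the HHTETRA entry quoted earlier in this message.
fragment = "tagggctgcggccctaaccctagccctaaccctaaccctaatcctaatcctagccctaac"
regex = prosite_to_regex("cg(2)c(3)taaccctagc(3)ta")
hit = re.search(regex, fragment)
```

Here regex comes out as cg{2}c{3}taaccctagc{3}ta, and the search finds an exact
match in the fragment.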

I have to check the original pattern file code from Henrikki Almusa to see
whether we lost anything in the naming and formats.

Hope that helps,

Peter





