Pattern lists and fuzz(nuc|pro|tran) and [pd]reg

Gary Williams, Tel 01223 494522 gwilliam at hgmp.mrc.ac.uk
Mon Jun 14 09:26:09 UTC 2004


Should the file of patterns allow each pattern to have its own allowed
number of mismatches?

>pat1 <mismatch=1>
ggataata[ac]{2}gg
>pat2 <mismatch=2>
gcggcatgtagc[gc]{3}att
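The per-pattern mismatch idea can be illustrated with a small, hypothetical C sketch. It is not EMBOSS or fuzznuc code, and fuzznuc's real pattern syntax and mismatch rules may differ. It assumes that a "mismatch" is a position whose base is not allowed by that position's literal or [...] class. It handles only literals, [...] classes and {n} repeat counts, which covers both example patterns above. The names parse_mismatch_tag, expand_pattern and count_fuzzy_hits are invented for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXPOS 256   /* hypothetical limit on expanded pattern length */
#define MAXSET 32    /* hypothetical limit on characters in one [...] class */

/* Read the mismatch count from a header such as ">pat1 <mismatch=1>".
   Returns -1 if the line is not a '>' header, 0 if there is no tag. */
static int parse_mismatch_tag(const char *header)
{
    const char *tag;
    int n = 0;

    if (header[0] != '>')
        return -1;
    tag = strstr(header, "<mismatch=");
    if (tag != NULL && sscanf(tag + strlen("<mismatch="), "%d", &n) != 1)
        n = 0;
    return n;
}

/* Expand a pattern into one lower-case allowed-character set per position,
   e.g. "[ac]{2}gg" -> "ac", "ac", "g", "g".
   Returns the number of positions, or -1 if the pattern is malformed. */
static int expand_pattern(const char *pat, char sets[][MAXSET])
{
    int npos = 0;

    while (*pat != '\0') {
        char set[MAXSET];
        long reps = 1;
        size_t len;

        if (*pat == '[') {
            const char *end = strchr(pat, ']');
            if (end == NULL)
                return -1;
            len = (size_t)(end - pat - 1);
            if (len == 0 || len >= MAXSET)
                return -1;
            memcpy(set, pat + 1, len);
            pat = end + 1;
        } else {
            set[0] = *pat++;
            len = 1;
        }
        set[len] = '\0';
        for (size_t i = 0; i < len; i++)
            set[i] = (char)tolower((unsigned char)set[i]);

        if (*pat == '{') {              /* optional {n} repeat count */
            char *endnum;
            reps = strtol(pat + 1, &endnum, 10);
            if (*endnum != '}' || reps < 1)
                return -1;
            pat = endnum + 1;
        }
        while (reps-- > 0) {
            if (npos >= MAXPOS)
                return -1;
            strcpy(sets[npos++], set);
        }
    }
    return npos;
}

/* Count start positions in seq where the pattern matches with at most
   maxmis mismatching positions (case-insensitive).
   Returns -1 if the pattern is empty or malformed. */
static int count_fuzzy_hits(const char *pattern, const char *seq, int maxmis)
{
    char sets[MAXPOS][MAXSET];
    int npos = expand_pattern(pattern, sets);
    size_t seqlen = strlen(seq);
    int hits = 0;

    if (npos <= 0)
        return -1;
    for (size_t start = 0; start + (size_t)npos <= seqlen; start++) {
        int mis = 0;
        /* stop early once the mismatch budget is exceeded */
        for (int i = 0; i < npos && mis <= maxmis; i++) {
            int c = tolower((unsigned char)seq[start + i]);
            if (strchr(sets[i], c) == NULL)
                mis++;
        }
        if (mis <= maxmis)
            hits++;
    }
    return hits;
}
```

A caller would read each header with parse_mismatch_tag and pass the result as maxmis to count_fuzzy_hits for the pattern that follows it. A file without tags then behaves as if every pattern allowed zero mismatches.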

Gary

Henrikki Almusa wrote:
> 
> Hello,
> 
> There might be a need for searching sequences with a list of patterns. At the
> moment only tfscan and patmatmotifs use lists of patterns to search
> sequences. The problem is that tfscan uses only fixed sequences and
> patmatmotifs uses a directory of files.
> 
> I propose modifying the reg and fuzz pattern searches to read a FASTA-format
> pattern file. An example file for dreg might be:
> 
> >pat1
> [ac]{2}gg
> >pat2
> [gc]{3}att
> 
> Patterns could span multiple lines.
> 
> I would be willing to try out some changes in the fuzz group as a start.
> Should the code that reads the pattern file live somewhere else (e.g. not
> in fuzznuc.c)?
> 
> Any feedback or suggestions would be welcome.
> --
> Henrikki Almusa
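Henrikki's proposal (a FASTA-style pattern file, patterns that may span several lines, and reading code kept outside fuzznuc.c) could look roughly like the hypothetical C sketch below. It is not EMBOSS code. It assumes a POSIX system, uses the standard <regex.h> regcomp/regexec calls for a dreg-like exact regex search rather than whatever regex engine EMBOSS itself uses, and invents the names PatternEntry, read_pattern_list and count_regex_hits for illustration.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

#define MAXNAME 64    /* hypothetical limits for this sketch */
#define MAXPAT  512

typedef struct {
    char name[MAXNAME];     /* first word after '>' */
    char pattern[MAXPAT];   /* pattern lines joined, whitespace dropped */
} PatternEntry;

/* Parse FASTA-style pattern text held in memory. A line starting with '>'
   begins a new entry; every following line up to the next '>' is appended
   to that entry's pattern, so one pattern may span several lines.
   Returns the number of entries, or -1 on overflow or if pattern text
   appears before the first header. */
static int read_pattern_list(const char *text, PatternEntry *pats, int maxpats)
{
    int n = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *next = eol ? eol + 1 : line + len;

        if (line[0] == '>') {
            size_t k = 0;
            if (n >= maxpats)
                return -1;
            for (size_t i = 1; i < len && !isspace((unsigned char)line[i])
                               && k + 1 < MAXNAME; i++)
                pats[n].name[k++] = line[i];
            pats[n].name[k] = '\0';
            pats[n].pattern[0] = '\0';
            n++;
        } else {
            for (size_t i = 0; i < len; i++) {
                if (isspace((unsigned char)line[i]))
                    continue;           /* also drops '\r' from CRLF files */
                if (n == 0)
                    return -1;
                size_t cur = strlen(pats[n - 1].pattern);
                if (cur + 1 >= MAXPAT)
                    return -1;
                pats[n - 1].pattern[cur] = line[i];
                pats[n - 1].pattern[cur + 1] = '\0';
            }
        }
        line = next;
    }
    return n;
}

/* Count non-overlapping matches of a POSIX extended regex in seq,
   ignoring case. Returns -1 if the pattern fails to compile. */
static int count_regex_hits(const char *pattern, const char *seq)
{
    regex_t re;
    regmatch_t m;
    const char *p = seq;
    int hits = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    while (*p != '\0' &&
           regexec(&re, p, 1, &m, p == seq ? 0 : REG_NOTBOL) == 0) {
        hits++;
        if (m.rm_eo > m.rm_so)
            p += m.rm_eo;               /* skip past a non-empty match */
        else if (p[m.rm_so] != '\0')
            p += m.rm_so + 1;           /* step over an empty match */
        else
            break;                      /* empty match at end of sequence */
    }
    regfree(&re);
    return hits;
}
```

Because read_pattern_list works on a text buffer and knows nothing about any one application, code like it could be shared by the fuzz and reg programs instead of living in fuzznuc.c. That is one possible answer to the question above, not a description of how EMBOSS is actually organised.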

-- 
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522			Fax: +44 1223 494512
E-mail: gwilliam at rfcgr.mrc.ac.uk	Web: http://www.rfcgr.mrc.ac.uk
