Pattern lists and fuzz(nuc|pro|tran) and [pd]reg
Gary Williams, Tel 01223 494522
gwilliam at hgmp.mrc.ac.uk
Mon Jun 14 09:26:09 UTC 2004
Should the file of patterns allow each pattern to have its own allowed
number of mismatches?
>pat1 <mismatch=1>
ggataata[ac]{2}gg
>pat2 <mismatch=2>
gcggcatgtagc[gc]{3}att
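
To make the idea concrete, here is a minimal sketch in Python of how such a file might be read. It is only an illustration: the `<mismatch=N>` tag is the layout proposed above, not an existing EMBOSS format, and the names `parse_patterns`, `HEADER` and `default_mismatch` are hypothetical. It assumes the tag is optional (falling back to a default) and that a pattern may span several lines, which are simply concatenated.

```python
import re

# Assumed header layout: ">name <mismatch=N>"; the mismatch tag is optional.
HEADER = re.compile(r"^>\s*(\S+)(?:\s+<mismatch=(\d+)>)?\s*$")

def parse_patterns(text, default_mismatch=0):
    """Return a list of (name, pattern, mismatch) tuples from pattern-file text."""
    records = []
    name, mismatch, parts = None, default_mismatch, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(parts), mismatch))
            m = HEADER.match(line)
            if m is None:
                raise ValueError(f"bad header line: {line!r}")
            name = m.group(1)
            mismatch = int(m.group(2)) if m.group(2) else default_mismatch
            parts = []
        elif name is None:
            raise ValueError("pattern text found before any '>' header")
        else:
            # Continuation lines of a multi-line pattern are joined together.
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts), mismatch))
    return records

example = """\
>pat1 <mismatch=1>
ggataata[ac]{2}gg
>pat2 <mismatch=2>
gcggcatgtagc
[gc]{3}att
"""

for record in parse_patterns(example):
    print(record)
```

Each tuple could then be handed to whichever search routine is in use, with the per-pattern mismatch count overriding any global setting.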
Gary
Henrikki Almusa wrote:
>
> Hello,
>
> There might be a need for searching sequences with a list of patterns. At the
> moment only tfscan and patmatmotifs use a list of patterns to search in
> sequences. The problem is that tfscan uses only fixed sequences and
> patmatmotifs uses a directory of files.
>
> I propose modifying the reg and fuzz pattern searches to read a "fasta format
> of pattern" file. An example file for dreg might be:
>
> >pat1
> [ac]{2}gg
> >pat2
> [gc]{3}att
>
> A pattern could span multiple lines.
>
> I would be willing to try out some changes in the fuzz group as a start.
> Should the code that reads the pattern file live somewhere else (e.g. not in
> fuzznuc.c)?
>
> Any feedback or suggestions would be welcome.
> --
> Henrikki Almusa
--
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522 Fax: +44 1223 494512
E-mail: gwilliam at rfcgr.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk