Pattern lists and fuzz(nuc|pro|tran) and [pd]reg

Gary Williams, Tel 01223 494522 gwilliam at hgmp.mrc.ac.uk
Wed Jun 16 10:25:18 UTC 2004


Henrikki,

Would it be a good idea to overload the pattern qualifier of
fuzz(nuc|pro|tran) and [pd]reg so that they understand '@filename' to be
a pattern list filename, in the same way as restrict will either read a
comma-delimited list of enzyme names or a '@filename' of a list of
enzymes?

So:

fuzznuc could then be run as normal:
fuzznuc em:hsfau -patt 'acgacga[gc]' -out test.out

or with a pattern list:
fuzznuc em:hsfau -patt '@patfile' -out test.out
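
A minimal sketch of the '@' test, written in plain C rather than with
AJAX string types. The names pattern_source and classify_pattern_arg are
hypothetical, not existing EMBOSS functions. It only decides whether the
value given to -patt is a literal pattern or the name of a pattern list
file. Treating a lone '@' as a literal pattern is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for illustration; not part of AJAX. */
typedef struct {
    int is_file;        /* 1 if the argument names a pattern list file */
    const char *value;  /* the literal pattern, or the filename after '@' */
} pattern_source;

/* A leading '@' followed by at least one character means "read the
 * patterns from this file"; anything else is taken as the pattern itself. */
pattern_source classify_pattern_arg(const char *arg)
{
    pattern_source src;

    assert(arg != NULL);
    if (arg[0] == '@' && arg[1] != '\0') {
        src.is_file = 1;
        src.value = arg + 1;   /* skip the '@' */
    } else {
        src.is_file = 0;
        src.value = arg;
    }
    return src;
}
```

With this, 'acgacga[gc]' comes back as a literal pattern and '@patfile'
comes back as the file name 'patfile', so existing command lines keep
working unchanged.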

Gary

Henrikki Almusa wrote:
> 
> On Monday 14 June 2004 12:26, Gary Williams, Tel 01223 494522 wrote:
> > Should the file of patterns allow each pattern to have its own allowed
> > number of mismatches?
> >
> > >pat1 <mismatch=1>
> >
> > ggataata[ac]{2}gg
> >
> > >pat2 <mismatch=2>
> >
> > gcggcatgtagc[gc]{3}att
> 
> No reason why not.
> 
> Now the coding itself. Since reading that file is pretty low-level stuff,
> should it probably go in the "ajax/" dir? My obj c abilities are perhaps not
> that good. Is anyone willing to help on the .c side?
> 
> Here is what might be needed in the .h (names can be changed). This is mainly
> for using the pattern in a program. It is just what I could come up with so
> far, so I may be completely off here :).
> 
> typedef struct AjSPattern {
>         AjPStr name;
>         AjPStr opropat;
>         AjPStr propat;
>         AjPRegex regexpat;
>         ajint mismatch;
> } AjOPattern;
> #define AjPPattern AjOPattern*
> 
> typedef struct AjSPatlist {
>         AjPList patlist;
>         ajint type; /* 0 regex, 1 prosite */
> } AjOPatlist;
> #define AjPPatlist AjOPatlist*
> 
> AjBool ajPatlistGetNext (AjPPatlist patlist, AjPPattern *pattern);
> void ajPatlistRewind (AjPPatlist patlist);
> ajint ajPatlistGetType (AjPPatlist patlist);
> 
> AjPStr ajPatternGetName (AjPPattern pattern);
> ajint ajPatternGetType (AjPPattern pattern);
>         /* checking whether propat is non-NULL in the struct should work */
> ajint ajPatternGetMismatch (AjPPattern pattern);
> AjPStr ajPatternGetPro (AjPPattern pattern);
> AjPStr ajPatternGetOrigPro (AjPPattern pattern);
> AjPRegex ajPatternGetRegex (AjPPattern pattern);
> AjPStr ajPatternGetPattern (AjPPattern pattern);
>         /* should return the string representation of the pattern it holds */
> 
> Acd would probably need a new file type, patlist. Whether the pattern is
> regex or prosite should be defined in the program's acd file. This would
> allow the patterns to be read (and compiled) in the acd command.
> 
> acdGetPatlist();
> 
> Comments?
> --
> Henrikki Almusa
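
For the header lines in the pattern file quoted above, here is a rough
sketch of the parsing step in plain C. The pattern_header type and the
parse_pattern_header function are hypothetical names for illustration
only; a real version would presumably use AjPStr and sit in ajax/.
Defaulting to 0 mismatches when the '<mismatch=N>' tag is absent is an
assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAT_NAME_MAX 64

/* Hypothetical plain-C record; the proposal above uses AjPStr fields. */
typedef struct {
    char name[PAT_NAME_MAX];
    int mismatch;
} pattern_header;

/*
 * Parse a header line of the form ">name <mismatch=N>".
 * The "<mismatch=N>" part is optional and defaults to 0 (an assumption).
 * Returns 1 if the line is a header, 0 otherwise.
 */
int parse_pattern_header(const char *line, pattern_header *out)
{
    const char *tag;

    assert(line != NULL && out != NULL);
    if (line[0] != '>')
        return 0;

    /* %63s stops at whitespace and leaves room for the terminator. */
    if (sscanf(line + 1, "%63s", out->name) != 1)
        return 0;

    out->mismatch = 0;
    tag = strstr(line, "<mismatch=");
    if (tag != NULL) {
        int n;
        /* Ignore a tag whose value is missing or negative. */
        if (sscanf(tag, "<mismatch=%d>", &n) == 1 && n >= 0)
            out->mismatch = n;
    }
    return 1;
}
```

Any line that does not start with '>' is reported as "not a header", so
the caller can treat it as the pattern text belonging to the most recent
header.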

-- 
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522			Fax: +44 1223 494512
E-mail: gwilliam at rfcgr.mrc.ac.uk	Web: http://www.rfcgr.mrc.ac.uk


