Pattern lists and fuzz(nuc|pro|tran) and [pd]reg
Gary Williams, Tel 01223 494522
gwilliam at hgmp.mrc.ac.uk
Wed Jun 16 10:25:18 UTC 2004
Henrikki,
Would it be a good idea to overload the pattern qualifier of
fuzz(nuc|pro|tran) and [pd]reg so that they understand '@filename' to be
a pattern list filename, in the same way as restrict will either read a
comma-delimited list of enzyme names or a '@filename' of a list of
enzymes?
So:
fuzznuc could then be run as normal:
fuzznuc em:hsfau -patt 'acgacga[gc]' -out test.out
or with a pattern list:
fuzznuc em:hsfau -patt '@patfile' -out test.out
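A minimal sketch of how the proposed overload could decide between the two modes, assuming the -pattern value reaches the program as a plain C string. `pattern_list_filename` is a hypothetical helper, not an existing EMBOSS function.

```c
#include <string.h>

/* Hypothetical helper for the proposed '@filename' overload: returns the
 * file name when the -pattern value starts with '@', or NULL when the value
 * should be used as a single literal pattern (the current behaviour). */
static const char *pattern_list_filename(const char *value)
{
    if (value != NULL && value[0] == '@' && value[1] != '\0')
        return value + 1;       /* skip the leading '@' */
    return NULL;
}
```

With this split, `-patt '@patfile'` would hand `patfile` to a pattern-file reader, while `-patt 'acgacga[gc]'` keeps working as before. Since '@' is not a residue code, the prefix should not collide with real patterns.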
Gary
Henrikki Almusa wrote:
>
> On Monday 14 June 2004 12:26, Gary Williams, Tel 01223 494522 wrote:
> > Should the file of patterns allow each pattern to have its own allowed
> > number of mismatches?
> >
> > >pat1 <mismatch=1>
> > ggataata[ac]{2}gg
> > >pat2 <mismatch=2>
> > gcggcatgtagc[gc]{3}att
>
> No reason why not.
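A minimal sketch of reading that per-pattern format in plain C. It assumes one `>name <mismatch=N>` header per pattern, followed by a single pattern line, with the mismatch count defaulting to 0 when the tag is absent. `PatEntry` and `read_pattern_file` are hypothetical names, and fixed-size buffers stand in for AjPStr, so lines longer than 255 characters are not handled.

```c
#include <stdio.h>
#include <string.h>

#define PAT_MAXLEN 256

/* Hypothetical in-memory record for one entry of the proposed pattern file. */
typedef struct {
    char name[PAT_MAXLEN];
    char pattern[PAT_MAXLEN];
    int  mismatch;              /* 0 when no <mismatch=N> tag is given */
} PatEntry;

/* Remove trailing newline, carriage return and blanks in place. */
static void chomp(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && strchr("\r\n \t", s[n - 1]) != NULL)
        s[--n] = '\0';
}

/* Reads ">name <mismatch=N>" headers, each followed by exactly one pattern
 * line; blank lines are skipped. Returns the number of entries read, or -1
 * on a format error or if more than max entries are present. */
static int read_pattern_file(FILE *fp, PatEntry *out, int max)
{
    char line[PAT_MAXLEN];
    PatEntry *cur = NULL;
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        chomp(line);
        if (line[0] == '\0')
            continue;                       /* skip blank lines */
        if (line[0] == '>') {
            const char *tag;
            size_t n = strcspn(line + 1, " \t");  /* name ends at whitespace */

            if (cur != NULL && cur->pattern[0] == '\0')
                return -1;                  /* previous header had no pattern */
            if (count >= max || n == 0)
                return -1;
            cur = &out[count++];
            memcpy(cur->name, line + 1, n);
            cur->name[n] = '\0';
            cur->pattern[0] = '\0';
            cur->mismatch = 0;
            tag = strstr(line + 1 + n, "<mismatch=");
            if (tag != NULL && sscanf(tag, "<mismatch=%d>", &cur->mismatch) != 1)
                return -1;
        } else {
            if (cur == NULL || cur->pattern[0] != '\0')
                return -1;                  /* no header, or a second pattern line */
            strcpy(cur->pattern, line);     /* fits: same buffer size as line */
        }
    }
    if (cur != NULL && cur->pattern[0] == '\0')
        return -1;                          /* trailing header with no pattern */
    return count;
}
```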
>
> Now the coding itself. Since reading that file is pretty low-level stuff, it
> should probably go in the "ajax/" directory? My C abilities are perhaps not
> that good. Is anyone willing to help on the .c side?
>
> What might be needed in the .h file (the names can be changed). This is
> mainly for using the pattern in a program. It is just what I could come up
> with for now, so I may be completely off here :).
>
> typedef struct AjSPattern {
>     AjPStr   name;
>     AjPStr   opropat;
>     AjPStr   propat;
>     AjPRegex regexpat;
>     ajint    mismatch;
> } AjOPattern;
> #define AjPPattern AjOPattern*
>
> typedef struct AjSPatlist {
>     AjPList patlist;
>     ajint   type;        /* 0 regex, 1 prosite */
> } AjOPatlist;
> #define AjPPatlist AjOPatlist*
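Keeping a PROSITE string next to a compiled regex implies a PROSITE-to-regex conversion step somewhere. A minimal sketch of that step, assuming only the common PROSITE elements (`x`, `[..]`, `{..}`, `(n)` and `(n,m)` repeats, the `<`/`>` anchors and the trailing `.`); `prosite_to_regex` is a hypothetical name, and forms such as `>` inside brackets are out of scope.

```c
#include <string.h>

/* Converts a simple PROSITE pattern such as "C-x(2)-[DE]-{P}." into a POSIX
 * extended regex ("C.{2}[DE][^P]"). Returns 0 on success, -1 if out is too
 * small to hold the result. */
static int prosite_to_regex(const char *pro, char *out, size_t outlen)
{
    size_t o = 0;
    const char *p;

    if (outlen == 0)
        return -1;

/* Append one character, leaving room for the terminating NUL. */
#define EMIT(c) do { if (o + 1 >= outlen) return -1; out[o++] = (c); } while (0)

    for (p = pro; *p != '\0'; p++) {
        switch (*p) {
        case '-':            /* element separator: no regex equivalent */
        case '.':            /* end-of-pattern terminator */
            break;
        case 'x': case 'X':  /* any residue */
            EMIT('.');
            break;
        case '<':            /* N-terminal anchor */
            EMIT('^');
            break;
        case '>':            /* C-terminal anchor */
            EMIT('$');
            break;
        case '{':            /* excluded residues become a negated class */
            EMIT('[');
            EMIT('^');
            break;
        case '}':
            EMIT(']');
            break;
        case '(':            /* repeat count: (n) or (n,m) */
            EMIT('{');
            break;
        case ')':
            EMIT('}');
            break;
        default:             /* residues, '[', ']', digits and ',' copy through */
            EMIT(*p);
            break;
        }
    }
#undef EMIT
    out[o] = '\0';
    return 0;
}
```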
>
> AjBool   ajPatlistGetNext (AjPPatlist patlist, AjPPattern* pattern);
> void     ajPatlistRewind (AjPPatlist patlist);
> ajint    ajPatlistGetType (AjPPatlist patlist);
>
> AjPStr   ajPatternGetName (AjPPattern pattern);
> ajint    ajPatternGetType (AjPPattern pattern);
>          /* type can be inferred from whether propat is non-NULL */
> ajint    ajPatternGetMismatch (AjPPattern pattern);
> AjPStr   ajPatternGetPro (AjPPattern pattern);
> AjPStr   ajPatternGetOrigPro (AjPPattern pattern);
> AjPRegex ajPatternGetRegex (AjPPattern pattern);
> AjPStr   ajPatternGetPattern (AjPPattern pattern);
>          /* returns the string form of whichever pattern it holds */
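To illustrate the iterator semantics that a GetNext/Rewind pair implies, here is a self-contained sketch in plain C. An array replaces AjPList, and `Pattern`, `Patlist`, `patlist_get_next`, `patlist_rewind` and `patlist_get_type` are hypothetical stand-ins, not EMBOSS code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the proposed AjOPattern / AjOPatlist. */
typedef struct {
    const char *name;
    const char *pattern;   /* string form: regex or PROSITE */
    int mismatch;
} Pattern;

typedef struct {
    Pattern *items;
    size_t   size;
    size_t   pos;          /* iterator cursor */
    int      type;         /* 0 = regex, 1 = PROSITE, as in the proposal */
} Patlist;

/* Like the proposed ajPatlistGetNext: returns 1 and sets *pattern while
 * patterns remain, 0 once the list is exhausted. */
static int patlist_get_next(Patlist *list, Pattern **pattern)
{
    if (list->pos >= list->size)
        return 0;
    *pattern = &list->items[list->pos++];
    return 1;
}

/* Like the proposed ajPatlistRewind: restart from the first pattern. */
static void patlist_rewind(Patlist *list)
{
    list->pos = 0;
}

/* Like the proposed ajPatlistGetType: one type for the whole list. */
static int patlist_get_type(const Patlist *list)
{
    return list->type;
}
```

A caller would loop with `while (patlist_get_next(&list, &p)) { ... }` and call `patlist_rewind` before searching the next sequence. Storing one type per list, rather than per pattern, matches the proposal that the ACD file fixes the pattern kind.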
>
> ACD would probably need a new file type, patlist. The program's ACD file
> should define whether the pattern is regex or prosite. This would allow the
> patterns to be read (and compiled) while the ACD file is processed.
>
> acdGetPatlist();
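Compiling every pattern at load time means a malformed pattern can be reported before any sequence is searched. A sketch of that step for regex-type patterns, assuming POSIX `<regex.h>` (`regcomp`, `regerror`, `regfree`) as a stand-in for whatever AjPRegex wraps. `compile_patterns` is a hypothetical name, and PROSITE patterns would first need converting to regex.

```c
#include <regex.h>
#include <stdio.h>

/* Compiles all patterns up front (case-insensitive, extended syntax).
 * Returns -1 if every pattern compiled; the caller must then regfree()
 * each entry of compiled. On failure, prints the error, frees the entries
 * compiled so far and returns the index of the first bad pattern. */
static int compile_patterns(const char *const *patterns, size_t n, regex_t *compiled)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int rc = regcomp(&compiled[i], patterns[i],
                         REG_EXTENDED | REG_ICASE | REG_NOSUB);
        if (rc != 0) {
            char msg[128];
            size_t bad = i;

            regerror(rc, &compiled[i], msg, sizeof msg);
            fprintf(stderr, "bad pattern %zu '%s': %s\n", bad, patterns[bad], msg);
            while (i > 0)                 /* release what was already compiled */
                regfree(&compiled[--i]);
            return (int)bad;
        }
    }
    return -1;
}
```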
>
> Comments?
> --
> Henrikki Almusa
--
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522 Fax: +44 1223 494512
E-mail: gwilliam at rfcgr.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk