[Bioperl-l] Fuzzy Pattern Matching Algorithm

demerphq demerphq at gmail.com
Fri Dec 10 06:04:24 EST 2004


Recently some questions appeared on Perlmonks asking about algorithms
to use for fuzzy pattern matching of gene sequences (in the form of
strings of "ACGT").  After some debate the following thread was
published with a summary of our results:


To give an example without requiring you to read the linked document:
one algorithm is capable of searching 1 million chars for any fuzzy
matches of 500,000 25-char sequences in about 150 seconds, and 10 MB
in 1000 seconds. Is this performance good?
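
For anyone unfamiliar with the problem, here is a minimal brute-force
sketch of the kind of search described above. It is not the algorithm
from the thread. It assumes "fuzzy" means a fixed-width window that
differs from the pattern in at most a given number of positions
(Hamming distance, no insertions or deletions). The names
hamming_within, fuzzy_find and max_mismatches are hypothetical and
chosen only for illustration.

```python
def hamming_within(a, b, max_mismatches):
    """Return True if equal-length strings a and b differ in at most
    max_mismatches positions (assumed definition of a fuzzy match)."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False  # bail out early once the budget is spent
    return True


def fuzzy_find(text, patterns, max_mismatches=1):
    """Yield (offset, pattern) for every window of text that is within
    max_mismatches of one of the patterns. Naive: every pattern is
    compared against every window."""
    for pattern in patterns:
        width = len(pattern)
        # An empty range results if the pattern is longer than the text.
        for offset in range(len(text) - width + 1):
            window = text[offset:offset + width]
            if hamming_within(window, pattern, max_mismatches):
                yield offset, pattern


hits = list(fuzzy_find("ACGTACGA", ["ACGT"], max_mismatches=1))
```

This does work proportional to text length times number of patterns
times pattern length, so it only serves as a baseline against which
timings like the ones quoted above can be judged.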

I suspect that one of the few places this code may actually prove
useful is in the context of Bioperl. I'm curious as to what solutions
and scenarios such algorithms would be used in. I believe most of
those who participated were interested in it purely as a Comp Sci
problem and not for the utility of the solution itself, so we
really have no idea.

It would be very interesting to hear the thoughts of an experienced
BioPerl developer on our efforts. Especially if it might mean that
somebody would do something useful with them :-)

Apologies if this is a waste of your time and bandwidth.


First they ignore you, then they laugh at you, then they fight you,
then you win.
