[Biopython] I've written a library for executing fuzzy searches...

c0d3g33k c0d3g33k at gmail.com
Sun Nov 17 16:24:33 UTC 2013


On 11/17/2013 04:14 AM, Tal Einat wrote:
> There are already many libraries to compute various 
> distance metrics between two strings, but that is not the purpose of 
> the library I'm developing (fuzzysearch). My goal is to build a 
> library for searching in strings or other sequences (e.g. DNA), 
> allowing finding nearly matching parts instead of just full matches.
>
That's what made me think of it.  It covers your use case and seems to 
be well researched, so I thought it might be of interest as you 
implement your own library.  From the description (bold mine):
> SimMetrics provides a library of float based similarity measures 
> between String Data as well as the typical unnormalised metric output.
>
> It is intended for researchers in information integration, II, and 
> other related fields. It includes a range of similarity measures from 
> a variety of communities, including statistics, *DNA analysis*, 
> artificial intelligence, information retrieval, and databases.
>
Here's a list of the metrics that are implemented:

https://web.archive.org/web/20081224234350/http://www.dcs.shef.ac.uk/~sam/stringmetrics.html

The other nice thing from a usability perspective was that it offered 
the option of normalised output in addition to the raw output of the 
original algorithms, which made it easier to compare results when 
running a series of metrics on a given set of strings.
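
To illustrate the raw versus normalised distinction, here is a minimal 
sketch in Python. It is not SimMetrics code: the function names are 
made up for this example, and dividing by the length of the longer 
string is just one common normalisation convention that I am assuming 
here. SimMetrics may well scale its metrics differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Raw (unnormalised) edit distance between two strings."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalised_similarity(a: str, b: str) -> float:
    """Hypothetical helper: rescale the raw distance onto [0, 1].

    1.0 means identical strings. The divisor (length of the longer
    string) is an assumption, not necessarily what SimMetrics uses.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

The raw distance depends on string length, so a distance of 1 means 
something different for a 7-mer than for a 700-mer. The normalised 
score puts both on the same 0 to 1 scale, which is what makes it 
possible to compare results across metrics.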

> On Fri, Nov 15, 2013 at 10:12 PM, c0d3g33k <c0d3g33k at gmail.com> wrote:
>
>     Hi Tal,
>
>     This is only tangentially related to your original post, but I
>     thought I'd point out the existence of Simmetrics, a Java-based
>     similarity metrics library (GPL v2).  I thought that at some point
>     there was a Python port, but I could be confusing that with using
>     the library myself under Jython.  Though it is implemented in
>     Java, it might provide a solid foundation for a Python library/API
>     should you find it interesting.  It's fairly comprehensive, so it
>     might at least provide inspiration for extending your current
>     efforts.  It seems to be unmaintained at present, but source code
>     is available both at the original SourceForge page and at GitHub,
>     where someone cloned the project.
>
>     http://sourceforge.net/projects/simmetrics/
>     https://github.com/Simmetrics/simmetrics
>
>
> Hi,
>
> - Tal



