
Security News
Official Go SDK for MCP in Development, Stable Release Expected in August
The official Go SDK for the Model Context Protocol is in development, with a stable, production-ready release expected by August 2025.
Find the Jaro Winkler Distance indicating a distance or similarity score between two strings.
This module finds a non-euclidean distance or similarity between two strings.
Jaro and Jaro-Winkler equations provides a score between two short strings where errors are more prone at the end of the string. Jaro's equation measure is the weighted sum of the percentage of matching and transposed characters from each string. Winkler's factor adds weight in Jaro's formula to increase the calculated measure when there is a sequence of characters (a prefix) in both strings.
This version is based on the original C implementation of strcmp95 implementation but does not attempt to normalize characters that are similar to the eyes (e.g.: O
vs 0
).
2
) using Python's round
.The complexity of this algoritme resides in finding the matching
and transposed
characters. That is because of the interpretation of what are the matching
conditions and the definition of transposed
. Definitions of those two will make the score vary between implementations of this algorithme.
Here is how matching
and transposed
are defined in this module:
N
is matching
if found at position N
or within distance
on either side in the second string.distance
is calculated using the rounded down length of the longest string divided by two minus one.transposed
if they previously matched and aren't at the same position in the matching character subset.TODO: Implementation should be refactored to use Python's Decimal module from the standard library. This module was introduced in Python 3.9.
Calculate the Jaro Winkler similarity ($sim_{w}$) between PENNSYLVANIA
and PENNCISYLVNIA
:
$$ s_{1}=\text{PENNSYLVANIA} \qquad\text{and}\qquad s_{2}=\text{PENNCISYLVNIA} $$
P E N N C I S Y L V N I A
┌-─────────────────────────
P │ 1 ╎
E │ 1 ╎
N │ 1 ╎
N │ 1 ╎ Symbols '╎' represent the sliding windows
S │ 1 ╎ boundary in the second string where we look
Y │ ╎ 1 ╎ for the first string's character.
L │ ╎ 1 ╎
V │ ╎ 1 d = 5 in this example.
A │ ╎ 1
N │ ╎ 1
I │ ╎ 1
A │ ╎
$$ \begin{split} d &= \left\lfloor {\max(12, 13) \over 2} \right\rfloor - 1 \newline &= 5 \newline \end{split} \qquad \text{ and } \qquad \begin{split} |s_{1}| &= 12 \newline |s_{2}| &= 13 \newline \end{split} \qquad \text{ and } \qquad \begin{split} \ell &= 4 \newline m &= 11 \newline t &= 3 \newline p &= 0.1 \newline \end{split} $$
Considering the input parameters calculated above:
$$ \begin{split} sim_{j} &=\begin{cases} 0 & \text{if } m = 0 \newline {1 \over 3} \times \left({m \over |s_{1}|} + {m \over |s_{2}|} + {{m - t} \over m} \right) & \text{otherwise} \end{cases} \newline &={1 \over 3} \times \left({11 \over 12} + {11 \over 13} + {{11 - 3} \over 11}\right) \newline &= 0.83003108003 \newline \end{split} \qquad \text{then} \qquad \begin{split} sim_{w} &= sim_{j} + \ell \times p \times (1 - sim_{j}) \newline &= 0.83003108003 + 4 \times 0.1 \times (1 - 0.83003108003) \newline &= 0.89801864801 \newline \end{split} $$
We found that the $\lceil sim_{w} \rceil$ is $0.9$.
from pyjarowinkler import distance
distance.get_jaro_similarity("PENNSYLVANIA", "PENNCISYLVNIA", decimals=12)
# 0.830031080031
distance.get_jaro_winkler_similarity("PENNSYLVANIA", "PENNCISYLVNIA", decimals=12)
# 0.898018648019
distance.get_jaro_distance("hello", "haloa", decimals=4)
# 0.2667
distance.get_jaro_similarity("hello", "haloa", decimals=2)
# 0.73
distance.get_jaro_winkler_distance("hello", "Haloa", scaling=0.1, ignore_case=False)
# 0.4
distance.get_jaro_winkler_distance("hello", "HaLoA", scaling=0.1, ignore_case=True)
# 0.24
distance.get_jaro_winkler_similarity("hello", "haloa", decimals=2)
# 0.76
You need to have installed asdf
on your system. Then, running the commands below will setup your environment with the project's optional (dev) requirements and create the python virtual environment necessary to run test, lint, and build steps.
Typical order of execution is as follow:
$ cd ./jaro-winkler-distance
$ asdf install
$ pip install '.[dev]'
$ hatch python install 3.13 3.12 3.11 3.10 3.9
$ hatch env create
Other helpful commands:
hatch test
hatch fmt
hatch env show
hatch run test:unit
hatch run test:all
hatch run lint:all
$ ./release.sh help
Usage: release.sh [help|major|minor|patch]
FAQs
Find the Jaro Winkler Distance indicating a distance or similarity score between two strings.
We found that pyjarowinkler demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
The official Go SDK for the Model Context Protocol is in development, with a stable, production-ready release expected by August 2025.
Security News
New research reveals that LLMs often fake understanding, passing benchmarks but failing to apply concepts or stay internally consistent.
Security News
Django has updated its security policies to reject AI-generated vulnerability reports that include fabricated or unverifiable content.