
pyjarowinkler
Finds the Jaro Winkler Distance indicating a distance or similarity score between two strings.
Finds a non-euclidean distance or similarity between two strings.
The Jaro and Jaro-Winkler equations provide a score between two short strings where errors are more likely at the end of the string. Jaro's measure is the weighted sum of the percentage of matching and transposed characters from each string. Winkler's factor adds weight to Jaro's formula to increase the calculated measure when both strings start with the same sequence of characters (a prefix).
This version is based on the original C implementation of strcmp95 but does not attempt to normalize characters that look alike (e.g. O vs 0).
2) using Python's round.
The complexity of this algorithm resides in finding the matching and transposed characters, because the matching conditions and the definition of transposed are open to interpretation. How those two are defined makes the score vary between implementations of this algorithm.
Here is how matching and transposed are defined in this module:
- A character of the first string at position N is matching if found at position N or within distance on either side in the second string.
- The distance is calculated using the rounded down length of the longest string divided by two minus one.
- Characters of the first string are transposed if they previously matched and aren't at the same position in the matching character subset.

TODO: Implementation should be refactored to use Python's decimal module from the standard library. This module has been available since Python 2.4.
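The definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the module's actual source: the function name `match_and_transpose` is hypothetical, and halving the count of out-of-place characters is the usual Jaro convention, assumed here because it reproduces the $t = 3$ of the worked example that follows.

```python
def match_and_transpose(first: str, second: str) -> tuple[int, int]:
    """Return (m, t): the number of matching characters and of transpositions.

    Hypothetical helper written for illustration; not part of pyjarowinkler.
    """
    if not first or not second:
        return 0, 0

    # Search distance: rounded-down length of the longest string divided by
    # two, minus one (clamped at zero for very short strings).
    distance = max(max(len(first), len(second)) // 2 - 1, 0)

    second_used = [False] * len(second)
    first_matched = []  # matched characters, in first-string order

    for i, char in enumerate(first):
        lo = max(0, i - distance)
        hi = min(len(second), i + distance + 1)
        # Take the left-most unused occurrence inside the sliding window.
        for j in range(lo, hi):
            if not second_used[j] and second[j] == char:
                second_used[j] = True
                first_matched.append(char)
                break

    # The same matched characters, but in second-string order.
    second_matched = [c for c, used in zip(second, second_used) if used]

    # Assumption: each transposition involves two out-of-place characters.
    out_of_place = sum(a != b for a, b in zip(first_matched, second_matched))
    return len(first_matched), out_of_place // 2


print(match_and_transpose("PENNSYLVANIA", "PENNCISYLVNIA"))  # (11, 3)
```

Taking the left-most unused character in the window is one possible interpretation of the matching rule; other implementations may pick differently and produce other scores.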
Calculate the Jaro Winkler similarity ($sim_{w}$) between PENNSYLVANIA and PENNCISYLVNIA:
$$ s_{1}=\text{PENNSYLVANIA} \qquad\text{and}\qquad s_{2}=\text{PENNCISYLVNIA} $$
     P E N N C I S Y L V N I A
  ┌───────────────────────────
P │  1          │
E │    1          │
N │      1          │
N │        1          │
S │              1      │
Y │ │              1      │
L │   │              1      │
V │     │              1
A │       │                  1
N │         │            1
I │           │1
A │             │

Symbols '│' represent the sliding window's boundary in the second string where we look for the first string's character. d = 5 in this example.
$$ \begin{split} d &= \left\lfloor {\max(12, 13) \over 2} \right\rfloor - 1 \newline &= 5 \newline \end{split} \qquad \text{ and } \qquad \begin{split} |s_{1}| &= 12 \newline |s_{2}| &= 13 \newline \end{split} \qquad \text{ and } \qquad \begin{split} \ell &= 4 \newline m &= 11 \newline t &= 3 \newline p &= 0.1 \newline \end{split} $$
Considering the input parameters calculated above:
$$ \begin{split} sim_{j} &=\begin{cases} 0 & \text{if } m = 0 \newline {1 \over 3} \times \left({m \over |s_{1}|} + {m \over |s_{2}|} + {{m - t} \over m} \right) & \text{otherwise} \end{cases} \newline &={1 \over 3} \times \left({11 \over 12} + {11 \over 13} + {{11 - 3} \over 11}\right) \newline &= 0.83003108003 \newline \end{split} \qquad \text{then} \qquad \begin{split} sim_{w} &= sim_{j} + \ell \times p \times (1 - sim_{j}) \newline &= 0.83003108003 + 4 \times 0.1 \times (1 - 0.83003108003) \newline &= 0.89801864801 \newline \end{split} $$
We found that $sim_{w}$, rounded to two decimals, is $0.9$.
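The arithmetic above can be checked with plain Python. The sketch below only transcribes the two formulas; the function names `jaro_similarity` and `jaro_winkler_similarity` are illustrative and independent of the module, and the inputs ($m$, $t$, $\ell$, $p$ and the string lengths) are the values from this example.

```python
def jaro_similarity(m: int, t: int, len1: int, len2: int) -> float:
    """Jaro similarity from match count m and transposition count t."""
    if m == 0:
        return 0.0
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler_similarity(sim_j: float, prefix_len: int, scaling: float = 0.1) -> float:
    """Apply Winkler's prefix bonus to a Jaro similarity."""
    return sim_j + prefix_len * scaling * (1 - sim_j)


# Values from the PENNSYLVANIA / PENNCISYLVNIA example.
sim_j = jaro_similarity(m=11, t=3, len1=12, len2=13)
sim_w = jaro_winkler_similarity(sim_j, prefix_len=4, scaling=0.1)

print(round(sim_j, 12))  # 0.830031080031
print(round(sim_w, 12))  # 0.898018648019
print(round(sim_w, 2))   # 0.9
```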
from pyjarowinkler import distance
distance.get_jaro_similarity("PENNSYLVANIA", "PENNCISYLVNIA", decimals=12)
# 0.830031080031
distance.get_jaro_winkler_similarity("PENNSYLVANIA", "PENNCISYLVNIA", decimals=12)
# 0.898018648019
distance.get_jaro_distance("hello", "haloa", decimals=4)
# 0.2667
distance.get_jaro_similarity("hello", "haloa", decimals=2)
# 0.73
distance.get_jaro_winkler_distance("hello", "Haloa", scaling=0.1, ignore_case=False)
# 0.4
distance.get_jaro_winkler_distance("hello", "HaLoA", scaling=0.1, ignore_case=True)
# 0.24
distance.get_jaro_winkler_similarity("hello", "haloa", decimals=2)
# 0.76
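Since scores are rounded to the requested number of decimals, it can help to see how binary floating point rounding differs from decimal rounding. The sketch below is a standalone illustration using only the standard library; `round_half_up` is a hypothetical helper, not part of the module, and it is not a description of how the module rounds internally.

```python
from decimal import ROUND_HALF_UP, Decimal


def round_half_up(value: str, decimals: int = 2) -> Decimal:
    """Round a decimal string half-up to the given number of places.

    Hypothetical helper for illustration only.
    """
    # Decimal(1).scaleb(-2) is Decimal("0.01"), the quantum for two places.
    quantum = Decimal(1).scaleb(-decimals)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# 2.675 has no exact binary representation, so the float is slightly
# below the halfway point and round() goes down.
print(round(2.675, 2))          # 2.67
print(round_half_up("2.675"))   # 2.68

# The Jaro-Winkler score from the worked example rounds the same either way.
print(round(0.89801864801, 2))          # 0.9
print(round_half_up("0.89801864801"))   # 0.90
```

Passing the value as a string keeps the decimal digits exact; building a `Decimal` from a float would carry over the float's binary representation error.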
You need to have mise installed on your system. Then, running the commands below will install Python, uv, and github-cli.
Typical order of execution is as follows:
$ cd ./jaro-winkler-distance
$ mise install
$ uv venv
$ source .venv/bin/activate
$ uv pip install '.[dev]'
Other helpful commands:
uvx --python=3.12 python -m unittest discover -s tests/
uvx ruff check --diff
uvx ruff format --diff
uvx mypy
uvx coverage run -m unittest discover -s tests/
uvx coverage report
$ ./release.sh help
Usage: release.sh [help|major|minor|patch]
$ PYPI_REPO=main ./release.sh minor
We found that pyjarowinkler demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.