
Find the closest pairs of points (e.g. duplicates) in an array.
Closely uses principal component analysis (PCA) to reduce higher-dimensional data to two dimensions, then applies a divide-and-conquer algorithm to find the closest pair of points.
```bash
pip install closely
```
or install from source:
```bash
git clone https://github.com/justinshenk/closely
cd closely
pip install .
```
```python
import closely

# X is a NumPy array of shape (num_points, num_features)
pairs, distances = closely.solve(X, n=1)
```
Use n to set how many closest pairs to identify.
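On small datasets, one way to sanity-check the pairs returned for a given n is an exact brute-force search over every pair. The helper below, `n_closest_pairs_exact`, is hypothetical and not part of closely. It ranks all pairs by Euclidean distance without removing any points, so for n > 1 its answer can legitimately differ from closely's. It checks every pair, so it is only practical for a few thousand points.

```python
import heapq
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def n_closest_pairs_exact(X: Sequence[Sequence[float]], n: int = 1
                          ) -> Tuple[List[Tuple[int, int]], List[float]]:
    """Return the n closest index pairs (i < j) and their Euclidean distances.

    Hypothetical helper: examines all m*(m-1)/2 pairs, so use it on small data only.
    """
    candidates = (
        (math.dist(X[i], X[j]), i, j)
        for i, j in combinations(range(len(X)), 2)
    )
    best = heapq.nsmallest(n, candidates)  # smallest distances first
    pairs = [(i, j) for _, i, j in best]
    distances = [d for d, _, _ in best]
    return pairs, distances


X = [(0, 0, 0), (0, 0, 1), (10, 10, 10), (0, 0, 1.5), (5, 5, 5)]
pairs, distances = n_closest_pairs_exact(X, n=2)
print(pairs, distances)  # [(1, 3), (0, 1)] [0.5, 1.0]
```

It also accepts a NumPy array, because `math.dist` takes any iterable of coordinates.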
```python
import closely
import numpy as np
import matplotlib.pyplot as plt

# Create a dataset of 100 random 2-D points
X = np.random.random((100, 2))
pairs, distances = closely.solve(X, n=1)

# Plot the points and label each one with its index
x, y = X[:, 0], X[:, 1]
fig, ax = plt.subplots()
ax.scatter(x, y)
for i in range(len(X)):
    if i in pairs:
        # Indices that appear in a closest pair are labelled in red
        ax.annotate(str(i), (x[i], y[i]), color='red')
    else:
        ax.annotate(str(i), (x[i], y[i]))
plt.show()
```
Check pairs:
```
In [10]: pairs
Out[10]:
array([[ 7, 16],
       [96, 50]])
```
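Output: [figure: scatter plot of the points, each labelled with its index, with the closest-pair points in red]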

If your data has more than two features, closely reduces its dimensionality by projecting it onto the two directions that explain most of the variance (the first two principal components). This speeds up processing, but the result is approximate: two points that are close in the projection are not necessarily the closest pair in the original space. For example, if your data has four columns (e.g. x, y, z, a), closely applies divide-and-conquer to the new projection bases P1 and P2.
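The projection step can be sketched with NumPy alone. The sketch centers the data and projects it onto the first two right singular vectors, which are the first two principal components (P1 and P2 above). It is an illustrative PCA sketch, not closely's implementation, and the helper name `project_to_2d` is hypothetical. It also reports the fraction of variance kept, which gives a rough sense of how much information the projection discards.

```python
import numpy as np


def project_to_2d(X: np.ndarray):
    """Project an (n, m) array onto its first two principal components.

    Returns the (n, 2) projection and the fraction of total variance it keeps.
    Assumes the data is not constant (total variance > 0).
    """
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    projected = Xc @ Vt[:2].T  # coordinates along P1 and P2
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return projected, explained


rng = np.random.default_rng(0)
X = rng.random((50, 4))
P, explained = project_to_2d(X)
print(P.shape, round(float(explained), 3))
```

Projecting onto orthonormal directions never increases the distance between two points. A pair that is far apart in the original space can therefore look close in P1 and P2, but not the other way round. This is one reason the result is approximate.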
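When n > 1, closely also removes the first point of each pair it finds before searching for the next pair. In rare cases, when the data is highly overlapping, this leads to false negatives: a close pair that includes an already removed point is never reported.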
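The removal behaviour can be illustrated with a hypothetical greedy loop that finds the closest pair, drops its first point, and repeats. This is a simplified re-creation of the behaviour described above, not closely's source. The name `greedy_pairs` is made up, and it uses brute force instead of divide and conquer to stay short. With three nearly coincident points A, B and C, the second-closest pair overall (A, B) is never reported, because B is removed after the first round.

```python
import math
from itertools import combinations


def greedy_pairs(points, n):
    """Hypothetical sketch: return up to n pairs, removing the first point of each pair found."""
    remaining = list(range(len(points)))
    pairs = []
    for _ in range(n):
        if len(remaining) < 2:
            break
        # Closest pair among the points still in play (brute force for brevity)
        i, j = min(combinations(remaining, 2),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
        pairs.append((i, j))
        remaining.remove(i)  # drop the first point of the pair
    return pairs


# A, B and C are nearly coincident; D is far away
points = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.15), (5.0, 5.0)]
print(greedy_pairs(points, n=2))  # [(1, 2), (0, 2)]
```

The exact two closest pairs are (B, C) at about 0.05 and (A, B) at 0.1. The greedy loop reports (A, C) at 0.15 instead of (A, B).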
Python code adapted from Andriy Lazorenko; packaged and extended to data with more than two features by Justin Shenk.