ml-distance
The ml-distance npm package provides a collection of functions to calculate various types of distances between vectors. It is useful for tasks in machine learning, data analysis, and other fields where measuring similarity or dissimilarity between data points is important.
Euclidean Distance
Calculates the Euclidean distance between two vectors. This is the straight-line distance in Euclidean space.
const { distance } = require('ml-distance');
const euclidean = distance.euclidean([1, 2], [4, 6]);
console.log(euclidean); // Output: 5
Manhattan Distance
Calculates the Manhattan distance between two vectors. This is the sum of the absolute differences of their Cartesian coordinates.
const { distance } = require('ml-distance');
const manhattan = distance.manhattan([1, 2], [4, 6]);
console.log(manhattan); // Output: 7
Cosine Similarity
Calculates the cosine similarity between two vectors. This measures the cosine of the angle between them, which is useful for determining how similar two vectors are.
const { similarity } = require('ml-distance');
const cosine = similarity.cosine([1, 2], [4, 6]);
console.log(cosine); // ≈ 0.99228 (16 / Math.sqrt(260))
Jaccard Distance
Calculates the Jaccard distance between two numeric vectors: one minus the inner product divided by the sum of the squared norms minus the inner product. The inputs are treated as vectors, not as sets.
const { distance } = require('ml-distance');
const jaccard = distance.jaccard([1, 2, 3], [2, 3, 4]);
console.log(jaccard); // ≈ 0.1304 (1 - 20/23)
The compute-distance package provides a variety of distance metrics similar to ml-distance, including Euclidean, Manhattan, and Chebyshev distances. It is a good alternative for users looking for a different implementation or additional distance metrics.
The ml-matrix package offers a comprehensive set of matrix operations, including distance calculations. It is more feature-rich than ml-distance, providing additional functionality for matrix manipulation and linear algebra.
The distance package focuses on string distance metrics like Levenshtein, Jaro-Winkler, and Hamming distances. While it overlaps with ml-distance in some areas, it is more specialized in string comparison.
Distance functions to compare vectors.
$ npm i ml-distance
euclidean(p, q)
Returns the Euclidean distance between vectors p and q
$d(p,q)=\sqrt{\sum\limits_{i=1}^{n}(p_i-q_i)^2}$
manhattan(p, q)
Returns the city block distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|}$
minkowski(p, q, d)
Returns the Minkowski distance between vectors p and q for order d
$d(p,q)=\left(\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|^d}\right)^{1/d}$
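The order d interpolates between several metrics in this list: d = 1 gives the city block distance, d = 2 the Euclidean distance, and large d approaches the Chebyshev distance. A minimal sketch, written as a hypothetical standalone helper named `minkowski` rather than ml-distance's own source:

```javascript
// Hypothetical standalone helper, not ml-distance's own source.
// Minkowski distance of order d: (sum |p_i - q_i|^d)^(1/d), for d >= 1.
function minkowski(p, q, d) {
  if (p.length !== q.length) {
    throw new RangeError('p and q must have the same length');
  }
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    sum += Math.abs(p[i] - q[i]) ** d;
  }
  return sum ** (1 / d);
}

const p = [1, 2];
const q = [4, 6];
console.log(minkowski(p, q, 1)); // 7, the city block distance 3 + 4
console.log(minkowski(p, q, 2)); // ≈ 5, the Euclidean distance sqrt(9 + 16)
console.log(minkowski(p, q, 50)); // ≈ 4, close to the Chebyshev distance max(3, 4)
```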
chebyshev(p, q)
Returns the Chebyshev distance between vectors p and q
$d(p,q)=\max\limits_i(|p_i-q_i|)$
sorensen(p, q)
Returns the Sørensen distance between vectors p and q
$d(p,q)=\frac{\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|}}{\sum\limits_{i=1}^{n}{(p_i+q_i)}}$
gower(p, q)
Returns the Gower distance between vectors p and q
$d(p,q)=\frac{\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|}}{n}$
soergel(p, q)
Returns the Soergel distance between vectors p and q
$d(p,q)=\frac{\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|}}{\sum\limits_{i=1}^{n}{\max(p_i,q_i)}}$
kulczynski(p, q)
Returns the Kulczynski distance between vectors p and q
$d(p,q)=\frac{\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|}}{\sum\limits_{i=1}^{n}{\min(p_i,q_i)}}$
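The Sørensen, Gower, Soergel and Kulczynski distances share the numerator Σ|p_i − q_i| and differ only in how they normalize it: by Σ(p_i + q_i), by n, by Σmax(p_i, q_i) and by Σmin(p_i, q_i) respectively. A minimal sketch computing all four in one pass, assuming non-negative inputs; `l1Normalized` is a hypothetical helper, not part of ml-distance:

```javascript
// Hypothetical helper (not ml-distance's source): computes the four
// L1-normalized distances in a single pass over non-negative vectors.
function l1Normalized(p, q) {
  let absDiff = 0; // sum |p_i - q_i|, the shared numerator
  let sumPQ = 0; // Sørensen denominator: sum (p_i + q_i)
  let sumMax = 0; // Soergel denominator: sum max(p_i, q_i)
  let sumMin = 0; // Kulczynski denominator: sum min(p_i, q_i)
  for (let i = 0; i < p.length; i++) {
    absDiff += Math.abs(p[i] - q[i]);
    sumPQ += p[i] + q[i];
    sumMax += Math.max(p[i], q[i]);
    sumMin += Math.min(p[i], q[i]);
  }
  return {
    sorensen: absDiff / sumPQ,
    gower: absDiff / p.length, // Gower divides by the dimension n
    soergel: absDiff / sumMax,
    kulczynski: absDiff / sumMin,
  };
}

const result = l1Normalized([1, 2, 3], [2, 3, 4]);
// sorensen 3/15 = 0.2, gower 3/3 = 1, soergel 3/9 ≈ 0.333, kulczynski 3/6 = 0.5
console.log(result);
```

Note that the Kulczynski ratio divides by zero (giving Infinity) when every min(p_i, q_i) is 0.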
canberra(p, q)
Returns the Canberra distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}\frac{\left|{p_i-q_i}\right|}{p_i+q_i}$
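For non-negative inputs each Canberra term lies in [0, 1], so small coordinates weigh as much as large ones. A coordinate where both p_i and q_i are 0 produces 0/0; the README does not say how ml-distance handles that, so this sketch (a hypothetical `canberra` helper) simply assumes such terms are skipped:

```javascript
// Hypothetical sketch of the Canberra distance sum |p_i - q_i| / (p_i + q_i).
// ASSUMPTION: a term with p_i + q_i === 0 (a 0/0 term) is skipped; this is
// a choice made here, not documented ml-distance behavior.
function canberra(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    const denom = p[i] + q[i];
    if (denom !== 0) {
      sum += Math.abs(p[i] - q[i]) / denom;
    }
  }
  return sum;
}

console.log(canberra([1, 2, 0], [4, 6, 0])); // ≈ 1.1 (3/5 + 4/8, zero term skipped)
console.log(canberra([1, 0], [0, 1])); // 2: each disjoint coordinate contributes 1
```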
lorentzian(p, q)
Returns the Lorentzian distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}\ln(\left|{p_i-q_i}\right|+1)$
intersection(p, q)
Returns the Intersection distance between vectors p and q
$d(p,q)=1-\sum\limits_{i=1}^{n}min(p_i,q_i)$
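The intersection formula only yields a distance in [0, 1] when p and q are normalized. Because min(a, b) = (a + b − |a − b|)/2, for vectors that each sum to 1 it equals half the city block distance. A small numeric check, using hypothetical helpers rather than ml-distance itself:

```javascript
// Hypothetical helpers following the intersection formula above.
// Inputs are ASSUMED to be probability vectors (non-negative, summing to 1).
function intersectionDistance(p, q) {
  let overlap = 0;
  for (let i = 0; i < p.length; i++) {
    overlap += Math.min(p[i], q[i]);
  }
  return 1 - overlap;
}

// Half of the city block distance, for comparison.
function halfCityBlock(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    sum += Math.abs(p[i] - q[i]);
  }
  return sum / 2;
}

const p = [0.5, 0.5];
const q = [0.25, 0.75];
console.log(intersectionDistance(p, q)); // 0.25
console.log(halfCityBlock(p, q)); // 0.25
```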
waveHedges(p, q)
Returns the Wave Hedges distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}\left(1-\frac{min(p_i,q_i)}{max(p_i,q_i)}\right)$
czekanowski(p, q)
Returns the Czekanowski distance between vectors p and q
$d(p,q)=1-\frac{2\sum\limits_{i=1}^{n}{\min(p_i,q_i)}}{\sum\limits_{i=1}^{n}{(p_i+q_i)}}$
motyka(p, q)
Returns the Motyka distance between vectors p and q
$d(p,q)=1-\frac{\sum\limits_{i=1}^{n}{\min(p_i,q_i)}}{\sum\limits_{i=1}^{n}{(p_i+q_i)}}$
Note: the distance between two identical vectors is 0.5, not 0.
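Why identical vectors give 0.5: when p = q, Σmin(p_i, q_i) = Σp_i while Σ(p_i + q_i) = 2Σp_i, so the ratio is 1/2. Czekanowski doubles that ratio before subtracting it from 1 and reaches 0; Motyka does not. A sketch with hypothetical helpers (`minOverSum`, `czekanowski`, `motyka`) that follow the two formulas, not ml-distance's code:

```javascript
// Hypothetical helpers following the Czekanowski and Motyka formulas.
// Returns sum min(p_i, q_i) / sum (p_i + q_i).
function minOverSum(p, q) {
  let sumMin = 0;
  let sumPQ = 0;
  for (let i = 0; i < p.length; i++) {
    sumMin += Math.min(p[i], q[i]);
    sumPQ += p[i] + q[i];
  }
  return sumMin / sumPQ;
}

const czekanowski = (p, q) => 1 - 2 * minOverSum(p, q);
const motyka = (p, q) => 1 - minOverSum(p, q);

const v = [1, 2, 3];
console.log(czekanowski(v, v)); // 0
console.log(motyka(v, v)); // 0.5
```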
ruzicka(p, q)
Returns the Ruzicka similarity between vectors p and q
$s(p,q)=\frac{\sum\limits_{i=1}^{n}{\min(p_i,q_i)}}{\sum\limits_{i=1}^{n}{\max(p_i,q_i)}}$
tanimoto(p, q, [bitVector])
Returns the Tanimoto distance between vectors p and q. The optional bitVector flag enables bit-vector mode; see the test case for an example.
innerProduct(p, q)
Returns the Inner Product similarity between vectors p and q
$s(p,q)=\sum\limits_{i=1}^{n}{p_i\cdot{q_i}}$
harmonicMean(p, q)
Returns the Harmonic mean similarity between vectors p and q
$s(p,q)=2\sum\limits_{i=1}^{n}\frac{p_i\cdot{q_i}}{p_i+q_i}$
cosine(p, q)
Returns the Cosine similarity between vectors p and q
$s(p,q)=\frac{\sum\limits_{i=1}^{n}{p_i\cdot{q_i}}}{\sqrt{\sum\limits_{i=1}^{n}{p_i^2}}\cdot\sqrt{\sum\limits_{i=1}^{n}{q_i^2}}}$
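Cosine similarity divides the inner product by the product of the two Euclidean norms, so it depends only on direction: parallel vectors score 1, orthogonal ones 0. A minimal sketch using a hypothetical `cosineSimilarity` helper (not ml-distance's source):

```javascript
// Hypothetical sketch: inner product divided by the product of the norms.
function cosineSimilarity(p, q) {
  let dot = 0;
  let normP2 = 0; // sum p_i^2
  let normQ2 = 0; // sum q_i^2
  for (let i = 0; i < p.length; i++) {
    dot += p[i] * q[i];
    normP2 += p[i] * p[i];
    normQ2 += q[i] * q[i];
  }
  // A zero vector makes this 0 / 0, i.e. NaN.
  return dot / (Math.sqrt(normP2) * Math.sqrt(normQ2));
}

console.log(cosineSimilarity([1, 2], [4, 6])); // ≈ 0.99228, i.e. 16 / sqrt(5 * 52)
console.log(cosineSimilarity([1, 2], [2, 4])); // ≈ 1 for parallel vectors
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 for orthogonal vectors
```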
kumarHassebrook(p, q)
Returns the Kumar-Hassebrook similarity between vectors p and q
$s(p,q)=\frac{\sum\limits_{i=1}^{n}{p_i\cdot{q_i}}}{\sum\limits_{i=1}^{n}{p_i^2}+\sum\limits_{i=1}^{n}{q_i^2}-\sum\limits_{i=1}^{n}{p_i\cdot{q_i}}}$
jaccard(p, q)
Returns the Jaccard distance between vectors p and q
$d(p,q)=1-\frac{\sum\limits_{i=1}^{n}{p_i\cdot{q_i}}}{\sum\limits_{i=1}^{n}{p_i^2}+\sum\limits_{i=1}^{n}{q_i^2}-\sum\limits_{i=1}^{n}{p_i\cdot{q_i}}}$
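The Jaccard distance is one minus the Kumar-Hassebrook similarity. For 0/1 indicator vectors, Σp_i q_i counts shared elements and Σp_i² + Σq_i² − Σp_i q_i counts the union, so the ratio reduces to the set Jaccard index |A ∩ B| / |A ∪ B|. A sketch with hypothetical helpers (`kumarHassebrook`, `jaccardDistance`), not ml-distance's code:

```javascript
// Hypothetical helpers following the Kumar-Hassebrook and Jaccard formulas.
function kumarHassebrook(p, q) {
  let dot = 0;
  let p2 = 0;
  let q2 = 0;
  for (let i = 0; i < p.length; i++) {
    dot += p[i] * q[i];
    p2 += p[i] * p[i];
    q2 += q[i] * q[i];
  }
  return dot / (p2 + q2 - dot);
}

const jaccardDistance = (p, q) => 1 - kumarHassebrook(p, q);

console.log(jaccardDistance([1, 2, 3], [2, 3, 4])); // ≈ 0.1304, i.e. 1 - 20/23

// Indicator vectors for {1, 2, 3} and {2, 3, 4} over the universe {1, 2, 3, 4}:
// |A ∩ B| = 2 and |A ∪ B| = 4, so the similarity is the set Jaccard index.
console.log(kumarHassebrook([1, 1, 1, 0], [0, 1, 1, 1])); // 0.5
```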
dice(p, q)
Returns the Dice distance between vectors p and q
$d(p,q)=\frac{\sum\limits_{i=1}^{n}{(p_i-q_i)^2}}{\sum\limits_{i=1}^{n}{p_i^2}+\sum\limits_{i=1}^{n}{q_i^2}}$
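Expanding the squared difference relates the squared-difference ratio to the classic Dice coefficient $2\sum p_i q_i / (\sum p_i^2 + \sum q_i^2)$: the ratio is 0 for identical vectors and equals one minus that coefficient.

```latex
\frac{\sum_{i=1}^{n}(p_i-q_i)^2}{\sum_{i=1}^{n}p_i^2+\sum_{i=1}^{n}q_i^2}
=\frac{\sum_{i=1}^{n}p_i^2+\sum_{i=1}^{n}q_i^2-2\sum_{i=1}^{n}p_i q_i}{\sum_{i=1}^{n}p_i^2+\sum_{i=1}^{n}q_i^2}
=1-\frac{2\sum_{i=1}^{n}p_i q_i}{\sum_{i=1}^{n}p_i^2+\sum_{i=1}^{n}q_i^2}
```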
fidelity(p, q)
Returns the Fidelity similarity between vectors p and q
$s(p,q)=\sum\limits_{i=1}^{n}{\sqrt{p_i\cdot{q_i}}}$
bhattacharyya(p, q)
Returns the Bhattacharyya distance between vectors p and q
$d(p,q)=-\ln\left(\sum\limits_{i=1}^{n}{\sqrt{p_i\cdot{q_i}}}\right)$
hellinger(p, q)
Returns the Hellinger distance between vectors p and q
$d(p,q)=2\cdot\sqrt{1-\sum\limits_{i=1}^{n}{\sqrt{p_i\cdot{q_i}}}}$
matusita(p, q)
Returns the Matusita distance between vectors p and q
$d(p,q)=\sqrt{2-2\cdot\sum\limits_{i=1}^{n}{\sqrt{p_i\cdot{q_i}}}}$
squaredChord(p, q)
Returns the Squared-chord distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{(\sqrt{p_i}-\sqrt{q_i})^2}$
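Fidelity, Bhattacharyya, Hellinger, Matusita and squared-chord are all functions of the Bhattacharyya coefficient B = Σ√(p_i q_i). From the formulas above, Hellinger equals √2 times Matusita, and for normalized inputs the squared-chord distance equals 2 − 2B, i.e. Matusita squared. A sketch assuming probability vectors; `bhattacharyyaFamily` is a hypothetical helper, not ml-distance's API:

```javascript
// Hypothetical helper built on the Bhattacharyya coefficient B.
// ASSUMPTION: p and q are probability vectors (non-negative, each summing to 1).
function bhattacharyyaFamily(p, q) {
  let b = 0;
  let chord = 0;
  for (let i = 0; i < p.length; i++) {
    b += Math.sqrt(p[i] * q[i]);
    chord += (Math.sqrt(p[i]) - Math.sqrt(q[i])) ** 2;
  }
  // Clamp at 0: rounding can push 1 - B slightly below 0 for identical inputs.
  const oneMinusB = Math.max(0, 1 - b);
  return {
    fidelity: b,
    bhattacharyya: -Math.log(b),
    hellinger: 2 * Math.sqrt(oneMinusB),
    matusita: Math.sqrt(2 * oneMinusB),
    squaredChord: chord,
  };
}

const r = bhattacharyyaFamily([0.25, 0.75], [0.75, 0.25]);
console.log(r);
console.log(r.hellinger / r.matusita); // ≈ Math.SQRT2
console.log(r.squaredChord - r.matusita ** 2); // ≈ 0 for normalized inputs
```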
squaredEuclidean(p, q)
Returns the squared Euclidean distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{(p_i-q_i)^2}$
pearson(p, q)
Returns the Pearson distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\frac{(p_i-q_i)^2}{q_i}}$
neyman(p, q)
Returns the Neyman distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\frac{(p_i-q_i)^2}{p_i}}$
squared(p, q)
Returns the Squared distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\frac{(p_i-q_i)^2}{p_i+q_i}}$
probabilisticSymmetric(p, q)
Returns the Probabilistic Symmetric distance between vectors p and q
$d(p,q)=2\cdot\sum\limits_{i=1}^{n}{\frac{(p_i-q_i)^2}{p_i+q_i}}$
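Pearson and Neyman differ only in which argument normalizes the squared difference, so each equals the other with the arguments swapped. The squared distance uses the symmetric denominator p_i + q_i, and the probabilistic symmetric distance is exactly twice it. A sketch assuming strictly positive entries; `chiSquareFamily` is a hypothetical helper:

```javascript
// Hypothetical helper following the Pearson, Neyman, squared and
// probabilistic symmetric formulas. ASSUMPTION: all entries are > 0.
function chiSquareFamily(p, q) {
  let pearson = 0;
  let neyman = 0;
  let squared = 0;
  for (let i = 0; i < p.length; i++) {
    const diff2 = (p[i] - q[i]) ** 2;
    pearson += diff2 / q[i]; // normalized by the second argument
    neyman += diff2 / p[i]; // normalized by the first argument
    squared += diff2 / (p[i] + q[i]); // symmetric denominator
  }
  return { pearson, neyman, squared, probabilisticSymmetric: 2 * squared };
}

const p = [1, 2];
const q = [2, 4];
console.log(chiSquareFamily(p, q)); // pearson 1.5, neyman 3
console.log(chiSquareFamily(q, p)); // arguments swapped: pearson 3, neyman 1.5
```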
divergence(p, q)
Returns the Divergence distance between vectors p and q
$d(p,q)=2\cdot\sum\limits_{i=1}^{n}{\frac{(p_i-q_i)^2}{(p_i+q_i)^2}}$
clark(p, q)
Returns the Clark distance between vectors p and q
$d(p,q)=\sqrt{\sum\limits_{i=1}^{n}{\left(\frac{\left|p_i-q_i\right|}{(p_i+q_i)}\right)^2}}$
additiveSymmetric(p, q)
Returns the Additive Symmetric distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\frac{(p_i-q_i)^2\cdot(p_i+q_i)}{p_i\cdot{q_i}}}$
kullbackLeibler(p, q)
Returns the Kullback-Leibler distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{p_i\cdot\ln\frac{p_i}{q_i}}$
jeffreys(p, q)
Returns the Jeffreys distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\left((p_i-q_i)\ln\frac{p_i}{q_i}\right)}$
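Kullback-Leibler is asymmetric, and since (p_i − q_i) ln(p_i/q_i) = p_i ln(p_i/q_i) + q_i ln(q_i/p_i), the Jeffreys distance is its symmetrized sum KL(p, q) + KL(q, p). A sketch assuming strictly positive probability vectors (zeros would need a convention such as 0 · ln 0 = 0, which the README does not specify); the helpers are hypothetical:

```javascript
// Hypothetical helpers following the Kullback-Leibler and Jeffreys formulas.
// ASSUMPTION: strictly positive entries, so every logarithm is finite.
function kullbackLeibler(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    sum += p[i] * Math.log(p[i] / q[i]);
  }
  return sum;
}

function jeffreys(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    sum += (p[i] - q[i]) * Math.log(p[i] / q[i]);
  }
  return sum;
}

const p = [0.5, 0.5];
const q = [0.25, 0.75];
console.log(kullbackLeibler(p, q)); // ≈ 0.1438
console.log(kullbackLeibler(q, p)); // ≈ 0.1308, a different value: KL is asymmetric
console.log(jeffreys(p, q)); // ≈ 0.2747, the sum of the two KL values
```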
kdivergence(p, q)
Returns the K divergence distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\left(p_i\cdot\ln\frac{2p_i}{p_i+q_i}\right)}$
topsoe(p, q)
Returns the Topsøe distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\left(p_i\cdot\ln\frac{2p_i}{p_i+q_i}+q_i\cdot\ln\frac{2q_i}{p_i+q_i}\right)}$
jensenShannon(p, q)
Returns the Jensen-Shannon distance between vectors p and q
$d(p,q)=\frac{1}{2}\left[\sum\limits_{i=1}^{n}{p_i\cdot\ln\frac{2p_i}{p_i+q_i}}+\sum\limits_{i=1}^{n}{q_i\cdot\ln\frac{2q_i}{p_i+q_i}}\right]$
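Comparing the three formulas above: Topsøe is the K divergence taken in both directions, which makes it symmetric, and the Jensen-Shannon formula is exactly half of Topsøe. A sketch assuming strictly positive entries; `kdivergence`, `topsoe` and `jensenShannon` here are hypothetical helpers, not ml-distance's exports:

```javascript
// Hypothetical helpers following the K divergence, Topsøe and
// Jensen-Shannon formulas. ASSUMPTION: all entries > 0 (ln 0 is undefined).
function kdivergence(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    sum += p[i] * Math.log((2 * p[i]) / (p[i] + q[i]));
  }
  return sum;
}

// Topsøe adds the K divergence in both directions.
const topsoe = (p, q) => kdivergence(p, q) + kdivergence(q, p);
// The Jensen-Shannon formula is half of Topsøe.
const jensenShannon = (p, q) => topsoe(p, q) / 2;

const p = [0.5, 0.5];
const q = [0.25, 0.75];
console.log(topsoe(p, q), topsoe(q, p)); // equal values: the metric is symmetric
console.log(jensenShannon(p, q)); // half of topsoe(p, q)
console.log(jensenShannon(p, p)); // 0 for identical inputs
```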
jensenDifference(p, q)
Returns the Jensen difference distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\left[\frac{p_i\ln{p_i}+q_i\ln{q_i}}{2}-\left(\frac{p_i+q_i}{2}\right)\ln\left(\frac{p_i+q_i}{2}\right)\right]}$
taneja(p, q)
Returns the Taneja distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\left[\frac{p_i+q_i}{2}\ln\left(\frac{p_i+q_i}{2\sqrt{p_i\cdot{q_i}}}\right)\right]}$
kumarJohnson(p, q)
Returns the Kumar-Johnson distance between vectors p and q
$d(p,q)=\sum\limits_{i=1}^{n}{\frac{\left(p_i^2-q_i^2\right)^2}{2(p_i\cdot{q_i})^{3/2}}}$
avg(p, q)
Returns the average of city block and Chebyshev distances between vectors p and q
$d(p,q)=\frac{\sum\limits_{i=1}^{n}{\left|p_i-q_i\right|}+\max\limits_i(|p_i-q_i|)}{2}$
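The avg metric is the mean of the city block (L1) and Chebyshev (L∞) distances. A minimal sketch with hypothetical helpers that follow the formulas in this list, not ml-distance's own source:

```javascript
// Hypothetical helpers: city block (L1), Chebyshev (L-infinity) and their mean.
function cityBlock(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    sum += Math.abs(p[i] - q[i]);
  }
  return sum;
}

function chebyshev(p, q) {
  let max = 0;
  for (let i = 0; i < p.length; i++) {
    max = Math.max(max, Math.abs(p[i] - q[i]));
  }
  return max;
}

function avg(p, q) {
  return (cityBlock(p, q) + chebyshev(p, q)) / 2;
}

console.log(avg([1, 2], [4, 6])); // 5.5, i.e. (7 + 4) / 2
```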
Similarities
intersection(p, q)
Returns the Intersection similarity between vectors p and q
czekanowski(p, q)
Returns the Czekanowski similarity between vectors p and q
motyka(p, q)
Returns the Motyka similarity between vectors p and q
kulczynski(p, q)
Returns the Kulczynski similarity between vectors p and q
squaredChord(p, q)
Returns the Squared-chord similarity between vectors p and q
jaccard(p, q)
Returns the Jaccard similarity between vectors p and q
dice(p, q)
Returns the Dice similarity between vectors p and q
tanimoto(p, q, [bitVector])
Returns the Tanimoto similarity between vectors p and q. The optional bitVector flag enables bit-vector mode; see the test case for an example.
tree(a, b, from, to, [options])
Refer to the ml-tree-similarity package.
A new metric should normally be in its own file in the src/dist directory. There should be a corresponding test file in test/dist.
The metric should then be added to the exports of src/index.js under a short but descriptive camelCase name.
It should also be added to this README with either a link to the formula or an inline description.
FAQs
Distance and similarity functions to compare vectors
We found that ml-distance demonstrated an unhealthy version release cadence and project activity because the last version was released a year ago. It has 8 open source maintainers collaborating on the project.