
Security News
Attackers Are Hunting High-Impact Node.js Maintainers in a Coordinated Social Engineering Campaign
Multiple high-impact npm maintainers confirm they have been targeted in the same social engineering campaign that compromised Axios.
_ _ _ _
_ __ ___ (_) _ __ | |__ __ _ ___ | |__ (_) ___
| '_ ` _ \ | | | '_ \ | '_ \ / _` | / __| | '_ \ | | / __|
| | | | | | | | | | | | | | | | | (_| | \__ \ | | | | _ | | \__ \
|_| |_| |_| |_| |_| |_| |_| |_| \__,_| |___/ |_| |_| (_) _/ | |___/
|__/
Minhashing is an efficient similarity estimation technique that is often used to identify near-duplicate documents in large text collections. This package offers a JavaScript implementation of the minhash algorithm and an efficient Locality Sensitive Hashing Index for finding similar minhashes in Node.js or web applications.
To get started with Minhash.js, you can install the package with npm:
npm install minhash --save
If you prefer, you can instead load the package directly in a browser:
<script src='https://rawgit.com/duhaime/minhash/master/minhash.min.js' />
Minhashes are hash representations of the contents within a set. The following example minhashes and then estimates the Jaccard similarity between two sets:
import { Minhash } from 'minhash'; // If using Node.js
var s1 = ['minhash', 'is', 'a', 'probabilistic', 'data', 'structure', 'for',
'estimating', 'the', 'similarity', 'between', 'datasets'];
var s2 = ['minhash', 'is', 'a', 'probability', 'data', 'structure', 'for',
'estimating', 'the', 'similarity', 'between', 'documents'];
// create a hash for each set of words to compare
var m1 = new Minhash();
var m2 = new Minhash();
// update each hash
s1.map(function(w) { m1.update(w) });
s2.map(function(w) { m2.update(w) });
// estimate the jaccard similarity between two minhashes
m1.jaccard(m2);
While one can compare the Jaccard similarity between a minhash and all others in a collection, the complexity of doing so is O(n), as one needs to compare the query set to every other set.
To estimate the results of the same comparison in sub-linear time, one can instead build a Locality Sensitive Hash Index, which maps hash sequences from a minhash signature to the list of document identifiers that contain the given hash sequence. Using this indexing technique, one can effectively find sets similar to a query set:
import { Minhash, LshIndex } from 'minhash'; // If using Node.js
var s1 = ['minhash', 'is', 'a', 'probabilistic', 'data', 'structure', 'for',
'estimating', 'the', 'similarity', 'between', 'datasets'];
var s2 = ['minhash', 'is', 'a', 'probability', 'data', 'structure', 'for',
'estimating', 'the', 'similarity', 'between', 'documents'];
var s3 = ['cats', 'are', 'tall', 'and', 'have', 'been',
'known', 'to', 'sing', 'quite', 'loudly'];
// generate a hash for each list of words
var m1 = new Minhash();
var m2 = new Minhash();
var m3 = new Minhash();
// update each hash
s1.map(function(w) { m1.update(w) });
s2.map(function(w) { m2.update(w) });
s3.map(function(w) { m3.update(w) });
// add each document to a Locality Sensitive Hashing index
var index = new LshIndex();
index.insert('m1', m1);
index.insert('m2', m2);
index.insert('m3', m3);
// query for documents that appear similar to a query document
var matches = index.query(m1);
console.log('Jaccard similarity >= 0.5 to m1:', matches);
The sample application uses minhash.js to compute the similarity between several sample documents:

There is also a sample Node.js script that can be run with node examples/index.js.
To run the test suite — npm run test.
To compile and minify minhash.min.js — npm run build.
FAQs
Find similar documents quickly
The npm package minhash receives a total of 2,543 weekly downloads. As such, minhash popularity was classified as popular.
We found that minhash demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Security News
Multiple high-impact npm maintainers confirm they have been targeted in the same social engineering campaign that compromised Axios.

Security News
Axios compromise traced to social engineering, showing how attacks on maintainers can bypass controls and expose the broader software supply chain.

Security News
Node.js has paused its bug bounty program after funding ended, removing payouts for vulnerability reports but keeping its security process unchanged.