simhash-vocabulary

Vocabulary-based SimHash implementation for similarity detection

latest
Source
npmnpm
Version
1.0.2
Version published
Maintainers
2
Created
Source

Installation

npm install simhash-vocabulary

Usage

const { SimHash } = require('simhash-vocabulary')

// Define your vocabulary
const vocabulary = ['cat', 'dog', 'bird', 'fish', 'tree', 'house']

const simhash = new SimHash(vocabulary)

// Hash token arrays to 256-bit (32-byte) buffers
const hash1 = simhash.hash(['cat', 'dog', 'bird'])
const hash2 = simhash.hash(['cat', 'dog', 'fish'])
const hash3 = simhash.hash(['tree', 'house'])

// Compare similarity via Hamming distance
console.log(SimHash.hammingDistance(hash1, hash2)) // small distance (similar)
console.log(SimHash.hammingDistance(hash1, hash3)) // larger distance (different)

API

new SimHash(vocabulary)

Create a SimHash instance with a fixed vocabulary. Each token gets a deterministic 256-bit vector derived from its SHA-256 hash.
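As a rough illustration of how a token can be mapped to a deterministic 256-bit vector, the sketch below hashes each vocabulary entry with SHA-256 using Node's built-in crypto module. The helper names tokenVector and buildVocabularyTable are hypothetical and are not part of this package's API; the package's internal representation may differ.

```javascript
const crypto = require('node:crypto')

// Hypothetical helper (not the package's API): map a token to a
// 32-byte (256-bit) vector by taking its SHA-256 digest.
function tokenVector (token) {
  return crypto.createHash('sha256').update(token, 'utf8').digest()
}

// Hypothetical helper: precompute one vector per vocabulary entry,
// so that later lookups are a simple Map access.
function buildVocabularyTable (vocabulary) {
  const table = new Map()
  for (const token of vocabulary) {
    table.set(token, tokenVector(token))
  }
  return table
}

const table = buildVocabularyTable(['cat', 'dog', 'bird'])
console.log(table.get('cat').length) // 32
console.log(table.get('cat').equals(tokenVector('cat'))) // true
```

Because the digest depends only on the token's bytes, the same token always yields the same vector, which is what makes the resulting hashes comparable across runs.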

simhash.hash(tokens)

Compute a 32-byte SimHash buffer from an array of tokens. Tokens not in the vocabulary are ignored with a warning.

SimHash.hammingDistance(buf1, buf2)

Calculate the Hamming distance between two buffers (number of differing bits). Lower values indicate higher similarity.
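For readers unfamiliar with Hamming distance on byte buffers, here is a minimal sketch of the idea: XOR the buffers byte by byte and count the set bits. The function name hammingDistanceSketch is hypothetical, and throwing on unequal lengths is an assumption; the package's own behavior for mismatched buffers is not documented here.

```javascript
// Hypothetical sketch, not the package's implementation.
// Counts the bit positions at which two equal-length buffers differ.
function hammingDistanceSketch (a, b) {
  // Assumption: mismatched lengths are treated as an error.
  if (a.length !== b.length) {
    throw new RangeError('buffers must have equal length')
  }
  let distance = 0
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i] // bits set where the two bytes differ
    while (x !== 0) {
      x &= x - 1 // clear the lowest set bit
      distance++
    }
  }
  return distance
}

const a = Buffer.from([0b11110000, 0b00000001])
const b = Buffer.from([0b11111111, 0b00000000])
console.log(hammingDistanceSketch(a, b)) // 5
```

For 32-byte hashes the result ranges from 0 (identical) to 256 (every bit differs).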

How it works

SimHash converts a set of tokens into a fixed-size fingerprint such that inputs sharing many tokens produce fingerprints that differ in only a few bits. The algorithm accumulates the weighted bit vectors of the tokens position by position, then thresholds each position to produce the corresponding bit of the final hash.
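The accumulate-then-threshold step can be sketched as follows. This is a textbook SimHash under stated assumptions, not necessarily what the package does internally: every token has weight 1, a set bit contributes +1 and a clear bit contributes -1, ties resolve to 0, and there is no vocabulary filtering. The name simhashSketch is hypothetical.

```javascript
const { createHash } = require('node:crypto')

const BITS = 256

// Hypothetical sketch of classic SimHash (assumes unit token weights).
function simhashSketch (tokens) {
  // One signed counter per bit position.
  const counts = new Int32Array(BITS)
  for (const token of tokens) {
    const digest = createHash('sha256').update(token, 'utf8').digest()
    for (let bit = 0; bit < BITS; bit++) {
      // Read bit `bit` of the digest, most significant bit first.
      const isSet = (digest[bit >> 3] >> (7 - (bit & 7))) & 1
      counts[bit] += isSet ? 1 : -1
    }
  }
  // Threshold: an output bit is 1 only if its counter is positive.
  const out = Buffer.alloc(BITS / 8)
  for (let bit = 0; bit < BITS; bit++) {
    if (counts[bit] > 0) out[bit >> 3] |= 1 << (7 - (bit & 7))
  }
  return out
}

const fingerprint = simhashSketch(['cat', 'dog', 'bird'])
console.log(fingerprint.length) // 32
```

Since the counters are plain sums, the result does not depend on token order, and with a single token the fingerprint equals that token's SHA-256 digest.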

License

Apache-2.0

Package last updated on 05 Jan 2026
