
text-similarity-node
High-performance and memory efficient native C++ text similarity algorithms for Node.js
High-performance, memory-efficient native C++ text similarity algorithms for Node.js with full Unicode support. text-similarity-node provides a suite of production-ready algorithms that outperform pure JavaScript alternatives in the project's benchmarks, most clearly in memory usage and on long inputs. It is intended for comparing large documents, where pure JavaScript libraries tend to slow down.
Before installing, ensure you have the necessary build tools installed on your system:
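- The build toolchain that `node-gyp` requires
- On macOS, the Xcode Command Line Tools (`xcode-select --install`)
Then install the package:
npm install text-similarity-node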
After installing globally, you can use the text-similarity command directly from your terminal:
npm install -g text-similarity-node
Calculate a similarity score (0–1) between two strings:
# Default (Levenshtein)
text-similarity similarity "hello" "hallo"
# 0.8
# Choose a different algorithm
text-similarity similarity "hello" "hallo" -a jaro-winkler
# 0.88
# Case-insensitive comparison
text-similarity similarity "Hello" "hello" -i
# 1
# JSON output
text-similarity similarity "hello" "hallo" -a cosine -f json
# { "success": true, "value": 0.5 }
Calculate the distance between two strings:
text-similarity distance "kitten" "sitting"
# 3
text-similarity distance "hello" "hallo" -a hamming
# 1
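The outputs above are consistent with the usual way of turning an edit distance into a similarity score, 1 - distance / max(length): "hello" and "hallo" differ by one substitution over five characters, which gives 0.8. The following is a minimal pure-JavaScript sketch of that calculation. It is not the library's native implementation; the function names are illustrative, and the normalization formula is an assumption inferred from the example outputs.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
// Strings are split with Array.from so that each Unicode code point
// (including emoji) counts as a single character.
function levenshteinDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Assumed normalization: 1 - distance / length of the longer string.
function levenshteinSimilarity(a, b) {
  const maxLen = Math.max(Array.from(a).length, Array.from(b).length);
  return maxLen === 0 ? 1 : 1 - levenshteinDistance(a, b) / maxLen;
}

console.log(levenshteinDistance("kitten", "sitting")); // 3
console.log(levenshteinSimilarity("hello", "hallo")); // 0.8
console.log(levenshteinSimilarity("test", "best")); // 0.75
```

Keeping two rows instead of the full matrix holds memory at O(min-row length), which matters when the inputs are long documents.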
Process multiple string pairs from a JSON file:
# pairs.json: [["hello","hallo"],["world","word"],["test","best"]]
text-similarity batch pairs.json -a levenshtein
# "hello" <-> "hallo" => 0.8
# "world" <-> "word" => 0.8
# "test" <-> "best" => 0.75
text-similarity batch pairs.json -a jaccard -f json
text-similarity algorithms
| Option | Description |
|---|---|
| -a, --algorithm <name> | Algorithm to use (default: levenshtein) |
| -p, --preprocessing <mode> | Preprocessing: none, character, word, ngram |
| -i, --ignore-case | Case-insensitive comparison |
| -n, --ngram-size <size> | N-gram size (default: 2) |
| --threshold <value> | Early termination threshold |
| --alpha <value> | Alpha weight for Tversky index |
| --beta <value> | Beta weight for Tversky index |
| --prefix-weight <value> | Prefix weight for Jaro-Winkler (0.0–0.25) |
| -f, --format <type> | Output format: plain (default), json |
| -v, --version | Show version |
| -h, --help | Show help |
const textSimilarity = require("text-similarity-node");
// Levenshtein Similarity (edit distance)
textSimilarity.similarity.levenshtein("hello", "hallo"); // 0.8
// Jaccard Similarity (set intersection)
textSimilarity.similarity.jaccard("hello world", "hello universe", true); // 0.33
// Cosine Similarity with different options
textSimilarity.similarity.cosine("hello", "hallo"); // 0.5 (character n-grams)
textSimilarity.similarity.cosine("hello world", "hello universe", true); // 0.49 (word-based)
// Additional algorithms
textSimilarity.similarity.jaro("hello", "hallo"); // 0.86
textSimilarity.similarity.jaroWinkler("hello", "hallo"); // 0.88
textSimilarity.similarity.dice("hello", "hallo"); // 0.5
// Distance measurements
textSimilarity.distance.levenshtein("hello", "hallo"); // 1
textSimilarity.distance.hamming("hello", "hallo"); // 1
// Unicode Support
textSimilarity.similarity.levenshtein("café", "cafe"); // 0.75
textSimilarity.similarity.jaccard("Hello 👋 World 🌍", "Hello 👋 World 🌎"); // 0.86 (different globe emoji)
// Case-insensitive comparison
textSimilarity.similarity.levenshtein("Hello", "hello", false); // 1.0
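The token-based scores above can be reproduced by hand. The sketch below shows one plausible reading of them: Jaccard over word sets, and cosine over character-bigram frequency vectors. It is an illustration in plain JavaScript rather than the library's code; the helper names `tokenize` and `counts` are hypothetical, and the library's own tokenization may differ (its word-based cosine result, for instance, does not match a naive word split).

```javascript
// Split a string into word tokens or overlapping character n-grams.
function tokenize(text, useWords = false, ngramSize = 2) {
  if (useWords) return text.split(/\s+/).filter(Boolean);
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length < ngramSize) return chars.length ? [chars.join("")] : [];
  const grams = [];
  for (let i = 0; i + ngramSize <= chars.length; i++) {
    grams.push(chars.slice(i, i + ngramSize).join(""));
  }
  return grams;
}

// Jaccard: |A ∩ B| / |A ∪ B| over token sets.
function jaccard(a, b, useWords = false, ngramSize = 2) {
  const setA = new Set(tokenize(a, useWords, ngramSize));
  const setB = new Set(tokenize(b, useWords, ngramSize));
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared++;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 1 : shared / union;
}

// Count how often each token occurs.
function counts(tokens) {
  const map = new Map();
  for (const token of tokens) map.set(token, (map.get(token) || 0) + 1);
  return map;
}

// Cosine: dot(A, B) / (|A| * |B|) over token-frequency vectors.
function cosine(a, b, useWords = false, ngramSize = 2) {
  const ca = counts(tokenize(a, useWords, ngramSize));
  const cb = counts(tokenize(b, useWords, ngramSize));
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [token, n] of ca) {
    dot += n * (cb.get(token) || 0);
    normA += n * n;
  }
  for (const n of cb.values()) normB += n * n;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// {hello, world} vs {hello, universe}: 1 shared word out of 3 distinct words.
console.log(jaccard("hello world", "hello universe", true).toFixed(2)); // "0.33"
// Bigrams he,el,ll,lo vs ha,al,ll,lo: 2 shared out of 4 on each side.
console.log(cosine("hello", "hallo")); // 0.5
```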
text-similarity-node is based on the algorithm implementations in the TextDistance Python library, and its results match the reference Python version in 95% of cases. The remaining differences come from the different tokenization method used for cosine similarity.
The Modern API provides comprehensive configuration options and consistent return formats:
const textSimilarity = require("text-similarity-node");
// Basic similarity calculation
const result = textSimilarity.calculateSimilarity("hello", "hallo");
console.log(result); // { success: true, value: 0.8 }
// Specify algorithm type
const result2 = textSimilarity.calculateSimilarity(
"hello world",
"hello universe",
textSimilarity.AlgorithmType.JACCARD,
);
console.log(result2); // { success: true, value: 0.39 }
// Full configuration example
const result3 = textSimilarity.calculateSimilarity(
"hello world",
"world hello",
textSimilarity.AlgorithmType.COSINE,
{
preprocessing: textSimilarity.PreprocessingMode.WORD,
caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE,
ngramSize: 2,
},
);
console.log(result3); // { success: true, value: 1.0 }
// Advanced algorithm-specific configuration
const jaroWinklerResult = textSimilarity.calculateSimilarity(
"martha",
"marhta",
textSimilarity.AlgorithmType.JARO_WINKLER,
{
prefixWeight: 0.1,
prefixLength: 4,
caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE,
},
);
// Tversky similarity with custom weights
const tverskyResult = textSimilarity.calculateSimilarity(
"information retrieval",
"information extraction",
textSimilarity.AlgorithmType.TVERSKY,
{
preprocessing: textSimilarity.PreprocessingMode.WORD,
alpha: 0.8, // Weight for first string
beta: 0.2, // Weight for second string
caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE,
},
);
// Distance calculations
const distance = textSimilarity.calculateDistance(
"kitten",
"sitting",
textSimilarity.AlgorithmType.LEVENSHTEIN,
);
console.log(distance); // { success: true, value: 3 }
// Batch processing
const pairs = [
["hello", "hallo"],
["world", "word"],
["test", "best"],
];
const batchResults = textSimilarity.calculateSimilarityBatch(
pairs,
textSimilarity.AlgorithmType.LEVENSHTEIN,
{ caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE },
);
console.log(batchResults);
// [{ success: true, value: 0.8 }, { success: true, value: 0.8 }, { success: true, value: 0.75 }]
// Asynchronous API
async function example() {
const similarity = await textSimilarity.calculateSimilarityAsync(
"hello world",
"hello universe",
textSimilarity.AlgorithmType.COSINE,
{
preprocessing: textSimilarity.PreprocessingMode.WORD,
caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE,
},
);
console.log(similarity); // 0.5
const batchAsync = await textSimilarity.calculateSimilarityBatchAsync(
pairs,
textSimilarity.AlgorithmType.JACCARD,
);
console.log(batchAsync); // [0.67, 0.8, 0.6]
}
// Global configuration
textSimilarity.setGlobalConfiguration({
preprocessing: textSimilarity.PreprocessingMode.WORD,
caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE,
ngramSize: 3,
});
// All subsequent calls will use global config unless overridden
const withGlobalConfig = textSimilarity.calculateSimilarity(
"Hello World",
"hello world",
);
console.log(withGlobalConfig); // { success: true, value: 1.0 }
// Override global config for specific call
const overrideGlobal = textSimilarity.calculateSimilarity(
"Hello World",
"hello world",
textSimilarity.AlgorithmType.LEVENSHTEIN,
{ caseSensitivity: textSimilarity.CaseSensitivity.SENSITIVE },
);
console.log(overrideGlobal); // { success: true, value: 0.82 }
// Increase max string length for large document comparison (default: 100KB)
textSimilarity.setGlobalConfiguration({
maxStringLength: 5 * 1024 * 1024, // Allow up to 5MB strings
});
const docResult = textSimilarity.calculateSimilarity(
largeDocument1,
largeDocument2,
textSimilarity.AlgorithmType.EUCLIDEAN,
{ preprocessing: textSimilarity.PreprocessingMode.WORD },
);
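The `alpha` and `beta` options in the Tversky example above weight the tokens that appear only in the first or only in the second string. A minimal sketch of the standard Tversky index, |X ∩ Y| / (|X ∩ Y| + alpha·|X − Y| + beta·|Y − X|), over lower-cased word sets follows. The function `tverskyWords` is a hypothetical helper written for this illustration, and the library's native implementation may tokenize or normalize differently.

```javascript
// Tversky index over case-insensitive word sets.
// alpha weights words found only in `a`; beta weights words found only in `b`.
function tverskyWords(a, b, alpha, beta) {
  const setA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const setB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  let shared = 0;
  for (const word of setA) if (setB.has(word)) shared++;
  const onlyA = setA.size - shared;
  const onlyB = setB.size - shared;
  const denominator = shared + alpha * onlyA + beta * onlyB;
  return denominator === 0 ? 1 : shared / denominator;
}

const x = "information retrieval";
const y = "information extraction";

// One shared word, one word unique to each side.
console.log(tverskyWords(x, y, 0.8, 0.2));
// alpha = beta = 1 reduces to the Jaccard index.
console.log(tverskyWords(x, y, 1, 1));
// alpha = beta = 0.5 reduces to the Sorensen-Dice coefficient.
console.log(tverskyWords(x, y, 0.5, 0.5));
```

With unequal weights the measure is asymmetric: swapping the two strings changes the score whenever they have different numbers of unique words.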
// Available algorithm types
textSimilarity.AlgorithmType = {
LEVENSHTEIN: 0, // Edit distance
DAMERAU_LEVENSHTEIN: 1, // Edit distance with transpositions
HAMMING: 2, // Equal-length string distance
JARO: 3, // Fuzzy string matching
JARO_WINKLER: 4, // Jaro with prefix weighting
JACCARD: 5, // Set similarity coefficient
SORENSEN_DICE: 6, // Dice coefficient
OVERLAP: 7, // Overlap coefficient
TVERSKY: 8, // Asymmetric similarity with weights
COSINE: 9, // Vector space cosine similarity
EUCLIDEAN: 10, // Euclidean distance
MANHATTAN: 11, // Manhattan distance
CHEBYSHEV: 12, // Chebyshev distance
};
// Preprocessing modes
textSimilarity.PreprocessingMode = {
NONE: 0, // No preprocessing
CHARACTER: 1, // Character-level comparison
WORD: 2, // Word-level tokenization
NGRAM: 3, // N-gram based tokenization
};
// Case sensitivity options
textSimilarity.CaseSensitivity = {
SENSITIVE: 0, // Case-sensitive comparison
INSENSITIVE: 1, // Case-insensitive with Unicode support
};
// Full configuration object structure
const fullConfig = {
algorithm: textSimilarity.AlgorithmType.COSINE, // Algorithm to use
preprocessing: textSimilarity.PreprocessingMode.WORD, // Text processing mode
caseSensitivity: textSimilarity.CaseSensitivity.INSENSITIVE, // Case handling
ngramSize: 2, // N-gram size (default: 2)
threshold: 0.5, // Early termination threshold
alpha: 0.5, // Tversky alpha parameter
beta: 0.5, // Tversky beta parameter
prefixWeight: 0.1, // Jaro-Winkler prefix weight (0.0-0.25)
prefixLength: 4, // Jaro-Winkler max prefix length
maxStringLength: 100000, // Max input string length in bytes (default: 100000 ≈ 100KB)
};
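The `prefixWeight` and `prefixLength` fields control how much Jaro-Winkler rewards a shared prefix. As a rough illustration, the textbook definition is sketched below in plain JavaScript: Jaro-Winkler = Jaro + l · p · (1 − Jaro), where l is the length of the common prefix (capped at `prefixLength`) and p is `prefixWeight`. This follows the standard formulas and is not taken from the library's C++ source; the function names are illustrative.

```javascript
// Jaro similarity: counts characters that match within a sliding window
// and penalizes matched characters that appear in a different order.
function jaro(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  if (s.length === 0 && t.length === 0) return 1;
  if (s.length === 0 || t.length === 0) return 0;
  const window = Math.max(0, Math.floor(Math.max(s.length, t.length) / 2) - 1);
  const sMatched = new Array(s.length).fill(false);
  const tMatched = new Array(t.length).fill(false);
  let matches = 0;
  for (let i = 0; i < s.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(t.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!tMatched[j] && s[i] === t[j]) {
        sMatched[i] = true;
        tMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk both strings' matched characters in order; mismatches are half-transpositions.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < s.length; i++) {
    if (!sMatched[i]) continue;
    while (!tMatched[k]) k++;
    if (s[i] !== t[k]) halfTranspositions++;
    k++;
  }
  const transpositions = halfTranspositions / 2;
  return (matches / s.length + matches / t.length + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: boost the Jaro score for strings that share a prefix.
function jaroWinkler(a, b, prefixWeight = 0.1, prefixLength = 4) {
  const base = jaro(a, b);
  const s = Array.from(a);
  const t = Array.from(b);
  const limit = Math.min(prefixLength, s.length, t.length);
  let prefix = 0;
  while (prefix < limit && s[prefix] === t[prefix]) prefix++;
  return base + prefix * prefixWeight * (1 - base);
}

console.log(jaro("hello", "hallo").toFixed(2)); // "0.87"
console.log(jaroWinkler("hello", "hallo").toFixed(2)); // "0.88"
console.log(jaroWinkler("martha", "marhta").toFixed(2)); // "0.96"
```

Because the boost is `prefix * prefixWeight`, keeping `prefixWeight` at or below 0.25 with a prefix cap of 4 guarantees the result never exceeds 1, which matches the documented 0.0–0.25 range.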
// Get supported algorithms
const algorithms = textSimilarity.getSupportedAlgorithms();
console.log(algorithms);
// [{ type: 0, name: 'LEVENSHTEIN' }, { type: 5, name: 'JACCARD' }, ...]
// Memory management
const memoryUsage = textSimilarity.getMemoryUsage();
console.log(`Memory usage: ${memoryUsage} bytes`);
textSimilarity.clearCaches(); // Clear internal caches
// Get current global configuration
const currentConfig = textSimilarity.getGlobalConfiguration();
console.log(currentConfig);
// Edit-based algorithms
textSimilarity.similarity.levenshtein(s1, s2, (caseSensitive = true));
textSimilarity.similarity.damerauLevenshtein(s1, s2, (caseSensitive = true));
textSimilarity.similarity.hamming(s1, s2, (caseSensitive = true));
// Jaro family (edit-based fuzzy matching)
textSimilarity.similarity.jaro(s1, s2, (caseSensitive = true));
textSimilarity.similarity.jaroWinkler(
s1,
s2,
(caseSensitive = true),
(prefixWeight = 0.1),
);
// Token-based algorithms
textSimilarity.similarity.jaccard(
s1,
s2,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
textSimilarity.similarity.dice(
s1,
s2,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
textSimilarity.similarity.cosine(
s1,
s2,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
textSimilarity.similarity.tversky(
s1,
s2,
alpha,
beta,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
textSimilarity.distance.levenshtein(s1, s2, (caseSensitive = true));
textSimilarity.distance.damerauLevenshtein(s1, s2, (caseSensitive = true));
textSimilarity.distance.hamming(s1, s2, (caseSensitive = true));
textSimilarity.distance.euclidean(
s1,
s2,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
textSimilarity.distance.manhattan(
s1,
s2,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
textSimilarity.distance.chebyshev(
s1,
s2,
(useWords = false),
(caseSensitive = true),
(ngramSize = 2),
);
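The three vector distances differ only in how they aggregate per-token count differences. A hedged sketch follows, assuming both strings are turned into character n-gram frequency vectors before comparison (an assumption based on the `useWords` and `ngramSize` parameters; the library's actual vectorization is not documented here). The helper names are hypothetical.

```javascript
// Build a frequency map of overlapping character n-grams.
function ngramCounts(text, ngramSize = 2) {
  const chars = Array.from(text);
  const map = new Map();
  for (let i = 0; i + ngramSize <= chars.length; i++) {
    const gram = chars.slice(i, i + ngramSize).join("");
    map.set(gram, (map.get(gram) || 0) + 1);
  }
  return map;
}

// Compare two frequency vectors dimension by dimension.
function vectorDistances(a, b, ngramSize = 2) {
  const ca = ngramCounts(a, ngramSize);
  const cb = ngramCounts(b, ngramSize);
  const keys = new Set([...ca.keys(), ...cb.keys()]);
  let sumSquares = 0;
  let sumAbs = 0;
  let maxAbs = 0;
  for (const key of keys) {
    const diff = Math.abs((ca.get(key) || 0) - (cb.get(key) || 0));
    sumSquares += diff * diff;
    sumAbs += diff;
    maxAbs = Math.max(maxAbs, diff);
  }
  return {
    euclidean: Math.sqrt(sumSquares), // L2 norm of the difference
    manhattan: sumAbs, // L1 norm of the difference
    chebyshev: maxAbs, // largest single-dimension difference
  };
}

// "hello" and "hallo" differ in four bigrams (he, el, ha, al), each by one count.
console.log(vectorDistances("hello", "hallo"));
// { euclidean: 2, manhattan: 4, chebyshev: 1 }
```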
All algorithms support async execution with worker threads:
// All similarity algorithms available in async form
await textSimilarity.async.levenshtein(s1, s2, caseSensitive);
await textSimilarity.async.jaccard(s1, s2, useWords, caseSensitive, ngramSize);
await textSimilarity.async.cosine(s1, s2, useWords, caseSensitive, ngramSize);
await textSimilarity.async.jaro(s1, s2, caseSensitive);
await textSimilarity.async.jaroWinkler(s1, s2, caseSensitive, prefixWeight);
// ... and more
| Algorithm Category | text-similarity-node | string-comparison | similarity |
|---|---|---|---|
| Edit-Based Algorithms | | | |
| Levenshtein Distance | ✅ | ✅ | ❌ |
| Levenshtein Similarity | ✅ | ✅ | ✅ |
| Damerau-Levenshtein | ✅ | ❌ | ❌ |
| Hamming Distance | ✅ | ❌ | ❌ |
| Jaro Similarity | ✅ | ✅ | ❌ |
| Jaro-Winkler | ✅ | ✅ | ❌ |
| Token-Based Algorithms | | | |
| Jaccard Similarity | ✅ | ✅ | ❌ |
| Sorensen-Dice | ✅ | ❌ | ❌ |
| Tversky Index | ✅ | ❌ | ❌ |
| Overlap Coefficient | ✅ | ❌ | ❌ |
| Cosine Similarity | ✅ | ✅ | ❌ |
| Vector-Based Algorithms | | | |
| Euclidean Distance | ✅ | ❌ | ❌ |
| Manhattan Distance | ✅ | ❌ | ❌ |
| Chebyshev Distance | ✅ | ❌ | ❌ |
| Sequence-Based Algorithms | | | |
| LCS (Longest Common Subsequence) | ❌ | ✅ | ❌ |
| Ratcliff-Obershelp | ❌ | ❌ | ❌ |
| Configuration & Features | | | |
| Case-insensitive comparison | ✅ | ✅ | ✅ |
| Configurable n-gram sizes | ✅ | ❌ | ❌ |
| Word vs character tokenization | ✅ | ❌ | ❌ |
| Unicode normalization | ✅ | Partial | ❌ |
| Emoji support | ✅ | ✅ | ✅ |
| Performance & API | | | |
| Native implementation (C++) | ✅ | ❌ | ❌ |
| Asynchronous API | ✅ | ❌ | ❌ |
| Worker thread support | ✅ | ❌ | ❌ |
| TypeScript definitions | ✅ | ✅ | ✅ |
| Memory optimization | ✅ | ❌ | ❌ |
Based on extensive benchmarks, text-similarity-node stands out by delivering exceptional performance and scalability where it matters most.
Built with a native C++ core, text-similarity-node delivers a minimal memory footprint, ideal for memory-sensitive applications and large-scale data processing.
- […] string-comparison (nearly 90× more).
text-similarity-node is optimized for long strings, outperforming JavaScript-based libraries:
- […] similarity library.
The library leads in performance for modern similarity use cases:
- […] string-comparison, ideal for tag or keyword analysis.
Comprehensive Unicode support with proper handling of:
// International text examples
textSimilarity.similarity.levenshtein("Москва", "москва", false); // 1.0
textSimilarity.similarity.jaccard("你好世界", "你好世间"); // 0.5
// Emoji support
textSimilarity.similarity.cosine("Hello 👋🌍", "Hello 👋🌎"); // 0.86
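Two JavaScript behaviors explain why Unicode handling matters for these comparisons: visually identical strings can use different code point sequences, and characters outside the Basic Multilingual Plane (such as emoji) occupy two UTF-16 code units. The sketch below uses only built-in string methods to show both effects. `toComparableChars` is a hypothetical helper, and whether the library normalizes input in exactly this way is an assumption.

```javascript
// Normalize to NFC, optionally lower-case, and split by code point.
function toComparableChars(text, ignoreCase = false) {
  const normalized = text.normalize("NFC");
  return Array.from(ignoreCase ? normalized.toLowerCase() : normalized);
}

const composed = "caf\u00e9"; // "é" as a single code point
const decomposed = "cafe\u0301"; // "e" followed by a combining acute accent

console.log(composed === decomposed); // false
console.log(toComparableChars(composed).join("") === toComparableChars(decomposed).join("")); // true

// An emoji is two UTF-16 code units but one code point.
console.log("👋".length); // 2
console.log(toComparableChars("👋").length); // 1

// Case folding works beyond ASCII.
console.log(toComparableChars("Москва", true).join("") === "москва"); // true
```

Comparing the arrays returned by `toComparableChars` rather than raw strings keeps edit distances from counting a single emoji or accented letter as two characters.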
# Install dependencies
npm install
# Build native addon
npm run build
# Run tests
npm test
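- macOS: Xcode Command Line Tools (`xcode-select --install`)
- Debian/Ubuntu: build tools (`sudo apt-get install build-essential`)
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Run the tests: `npm test`
Don't forget to exclude the prebuilds directory from your pull request!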
MIT License - see LICENSE file for details.
This library was created using the TextDistance Python library as a reference implementation, which provided a solid foundation for the algorithms and features included here.
FAQs
The npm package text-similarity-node receives a total of 71 weekly downloads. As such, text-similarity-node popularity was classified as not popular.
We found that text-similarity-node demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.