What is string-similarity?
The string-similarity npm package provides functions to measure the similarity between two strings and to find the best match for a target string among a set of candidate strings. It computes a similarity score based on Dice's coefficient and can be used in applications such as fuzzy matching, search optimization, and data deduplication.
What are string-similarity's main functionalities?
Comparing two strings for similarity
This feature allows you to compare two strings and get a similarity score between 0 and 1, where 1 means the strings are identical.
const stringSimilarity = require('string-similarity');
const similarity = stringSimilarity.compareTwoStrings('string1', 'string2');
Finding the best match in an array of strings
This feature compares a target string against an array of strings and finds the one most similar to the target. It returns an object containing a rating for every string, the best match, and the index of the best match.
const stringSimilarity = require('string-similarity');
const matches = stringSimilarity.findBestMatch('string', ['string1', 'string2', 'string3']);
Other packages similar to string-similarity
levenshtein
The levenshtein package calculates the Levenshtein distance between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. It focuses on edit distance rather than on a normalized similarity score.
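To make the contrast with a 0-to-1 similarity score concrete, here is a minimal sketch of Levenshtein distance using the classic dynamic-programming formulation. It is written for illustration and is not taken from the levenshtein package; the function name levenshteinDistance is hypothetical.

```javascript
// Hypothetical helper for illustration; not the levenshtein package's API.
// Returns the minimum number of single-character insertions, deletions,
// and substitutions needed to turn `a` into `b`.
function levenshteinDistance(a, b) {
  // At the start of iteration i, prev[j] is the distance between the
  // first i - 1 characters of a and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshteinDistance('healed', 'sealed'));  // 1
console.log(levenshteinDistance('kitten', 'sitting')); // 3
```

The result is an unbounded count of edits, so comparing strings of very different lengths usually requires normalizing it yourself.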
fuzzyset.js
Fuzzyset.js is a fuzzy string set for JavaScript. It uses Levenshtein distance to compute the difference between strings and is useful for making fuzzy string matching more efficient by using a set data structure.
natural
Natural is a general natural language facility for Node.js. It includes a variety of string comparison algorithms, including Jaro-Winkler and Levenshtein distance, and provides more comprehensive natural language processing features beyond string comparison.
string-similarity
Finds the degree of similarity between two strings, based on Dice's coefficient, which often gives better results than Levenshtein distance.
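Dice's coefficient compares two sets X and Y by the size of their overlap relative to their combined size. In the worked example below, X and Y are assumed to be the multisets of adjacent character pairs (bigrams) of each string, with a shared bigram counted as many times as it occurs in both; for 'healed' and 'sealed' this gives 0.8.

```latex
\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}

X = \{\text{he}, \text{ea}, \text{al}, \text{le}, \text{ed}\}, \quad
Y = \{\text{se}, \text{ea}, \text{al}, \text{le}, \text{ed}\}

|X \cap Y| = |\{\text{ea}, \text{al}, \text{le}, \text{ed}\}| = 4

\mathrm{Dice}(X, Y) = \frac{2 \cdot 4}{5 + 5} = 0.8
```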
Usage
For Node.js
Install using:
npm install string-similarity --save
In your code:
var stringSimilarity = require('string-similarity');
var similarity = stringSimilarity.compareTwoStrings('healed', 'sealed');
var matches = stringSimilarity.findBestMatch('healed', ['edward', 'sealed', 'theatre']);
For browser apps
Include <script src="//unpkg.com/string-similarity/umd/string-similarity.min.js"></script> to get the latest version.
Or include <script src="//unpkg.com/string-similarity@4.0.1/umd/string-similarity.min.js"></script> to pin a specific version (4.0.1 in this case).
This exposes a global variable called stringSimilarity, which you can start using right away:
<script>
stringSimilarity.compareTwoStrings('what!', 'who?');
</script>
(The browser build is packaged as UMD, so it can also be consumed by module loaders that understand the UMD format.)
API
The package contains two methods:
compareTwoStrings(string1, string2)
Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive.
Arguments
- string1 (string): The first string
- string2 (string): The second string
Order does not make a difference.
Returns
(number): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.
Examples
stringSimilarity.compareTwoStrings('healed', 'sealed');
// → 0.8
stringSimilarity.compareTwoStrings('Olive-green table for sale, in extremely good condition.',
'For sale: table in very good condition, olive green in colour.');
// → 0.6060606060606061
stringSimilarity.compareTwoStrings('Olive-green table for sale, in extremely good condition.',
'For sale: green Subaru Impreza, 210,000 miles');
// → 0.2558139534883721
stringSimilarity.compareTwoStrings('Olive-green table for sale, in extremely good condition.',
'Wanted: mountain bike with at least 21 gears.');
// → 0.1411764705882353
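The behaviour described for compareTwoStrings (a symmetric, case-sensitive score from 0 to 1) can be reproduced with a short implementation of Dice's coefficient over character bigrams. This is a minimal sketch for illustration, not the package's source: the name diceCoefficient is hypothetical, and stripping whitespace before comparing is an assumption based on the 3.0.0 release note.

```javascript
// Hypothetical sketch of a Dice's-coefficient string comparison; not the
// package's actual source. Whitespace is removed first, mirroring the
// 3.0.0 release note that spaces are disregarded.
function diceCoefficient(first, second) {
  first = first.replace(/\s+/g, '');
  second = second.replace(/\s+/g, '');

  if (first === second) return 1; // identical after whitespace removal
  if (first.length < 2 || second.length < 2) return 0; // too short to form a bigram

  // Count each bigram (pair of adjacent characters) of the first string.
  const firstBigrams = new Map();
  for (let i = 0; i < first.length - 1; i++) {
    const bigram = first.substring(i, i + 2);
    firstBigrams.set(bigram, (firstBigrams.get(bigram) || 0) + 1);
  }

  // Walk the second string's bigrams, consuming matches so that a repeated
  // bigram is only counted as often as it occurs in both strings.
  let intersectionSize = 0;
  for (let i = 0; i < second.length - 1; i++) {
    const bigram = second.substring(i, i + 2);
    const count = firstBigrams.get(bigram) || 0;
    if (count > 0) {
      firstBigrams.set(bigram, count - 1);
      intersectionSize++;
    }
  }

  // A string of length L has L - 1 bigrams.
  return (2 * intersectionSize) / (first.length + second.length - 2);
}

console.log(diceCoefficient('healed', 'sealed')); // 0.8
console.log(diceCoefficient('abc', 'ABC'));       // 0 (case-sensitive)
```

Keeping the bigram counts in a Map makes each lookup constant time on average, so the comparison stays roughly linear in the combined length of the two strings.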
findBestMatch(mainString, targetStrings)
Compares mainString against each string in targetStrings.
Arguments
- mainString (string): The string to match each target string against.
- targetStrings (Array): Each string in this array will be matched against the main string.
Returns
(Object): An object with the following properties:
- ratings: a similarity rating for each target string
- bestMatch: the target string most similar to the main string, together with its rating
- bestMatchIndex: the index of bestMatch in the targetStrings array
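As a rough sketch of how a result with this shape could be assembled, the function below rates every target with a caller-supplied comparison function and tracks the highest rating. This is an illustration, not the package's implementation: bestMatchSketch and its compare parameter are hypothetical, and the tie-breaking rule (the earliest target with the highest rating wins) is an assumption.

```javascript
// Hypothetical sketch of assembling a findBestMatch-style result.
// `compare` is any function returning a similarity score in [0, 1],
// such as a Dice's-coefficient implementation.
function bestMatchSketch(mainString, targetStrings, compare) {
  if (!Array.isArray(targetStrings) || targetStrings.length === 0) {
    throw new TypeError('targetStrings must be a non-empty array of strings');
  }

  // Rate every target against the main string.
  const ratings = targetStrings.map((target) => ({
    target,
    rating: compare(mainString, target),
  }));

  // Track the index of the highest rating; on ties the earliest target wins.
  let bestMatchIndex = 0;
  for (let i = 1; i < ratings.length; i++) {
    if (ratings[i].rating > ratings[bestMatchIndex].rating) {
      bestMatchIndex = i;
    }
  }

  return { ratings, bestMatch: ratings[bestMatchIndex], bestMatchIndex };
}

// Toy comparison for demonstration only: 1 for identical strings, else 0.
const exactMatch = (a, b) => (a === b ? 1 : 0);
console.log(bestMatchSketch('b', ['a', 'b', 'c'], exactMatch).bestMatchIndex); // 1
```

Passing the comparison in as a parameter keeps the selection logic independent of the scoring algorithm.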
Examples
stringSimilarity.findBestMatch('Olive-green table for sale, in extremely good condition.', [
'For sale: green Subaru Impreza, 210,000 miles',
'For sale: table in very good condition, olive green in colour.',
'Wanted: mountain bike with at least 21 gears.'
]);
{ ratings:
[ { target: 'For sale: green Subaru Impreza, 210,000 miles',
rating: 0.2558139534883721 },
{ target: 'For sale: table in very good condition, olive green in colour.',
rating: 0.6060606060606061 },
{ target: 'Wanted: mountain bike with at least 21 gears.',
rating: 0.1411764705882353 } ],
bestMatch:
{ target: 'For sale: table in very good condition, olive green in colour.',
rating: 0.6060606060606061 },
bestMatchIndex: 1
}
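Because the result is plain data, it can be post-processed without calling the library again. The sketch below uses the result object from the example above as a literal and keeps only targets whose rating meets a threshold, most similar first; the helper name matchesAbove and the 0.25 threshold are hypothetical choices for illustration.

```javascript
// Result object copied from the findBestMatch example above.
const result = {
  ratings: [
    { target: 'For sale: green Subaru Impreza, 210,000 miles', rating: 0.2558139534883721 },
    { target: 'For sale: table in very good condition, olive green in colour.', rating: 0.6060606060606061 },
    { target: 'Wanted: mountain bike with at least 21 gears.', rating: 0.1411764705882353 },
  ],
  bestMatch: { target: 'For sale: table in very good condition, olive green in colour.', rating: 0.6060606060606061 },
  bestMatchIndex: 1,
};

// Hypothetical helper: returns the targets whose rating is at least
// `threshold`, sorted from most to least similar. filter() creates a new
// array, so sorting does not reorder result.ratings.
function matchesAbove(result, threshold) {
  return result.ratings
    .filter(({ rating }) => rating >= threshold)
    .sort((a, b) => b.rating - a.rating)
    .map(({ target }) => target);
}

// Logs the olive-green table listing first, then the Subaru listing.
console.log(matchesAbove(result, 0.25));
```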
Release Notes
2.0.0
- Removed production dependencies
- Updated to ES6 (this breaks backward-compatibility for pre-ES6 apps)
3.0.0
- Performance improvement for compareTwoStrings(..): now O(n) instead of O(n^2)
- The algorithm has been tweaked slightly to disregard spaces and word boundaries. This will change the rating values slightly, but not enough to make a significant difference
- Adding a bestMatchIndex to the results for findBestMatch(..) to point to the best match in the supplied targetStrings array
3.0.1
- Refactoring: removed unused functions; used substring instead of substr
- Updated dependencies
4.0.1
- Distributed as a UMD build for use in browsers.