
v2.0
CmpStr is a lightweight and powerful npm package for calculating string similarity, finding the closest matches in arrays, performing phonetic searches, and more. It supports a variety of built-in algorithms (e.g., Levenshtein, Dice-Sørensen, Damerau-Levenshtein, Soundex) and allows users to add custom algorithms and normalization filters.
Key Features
Install the package via npm:
npm install cmpstr
Importing the Package:
const { CmpStr } = require( 'cmpstr' );
Example 1: Basic String Similarity
const cmp = new CmpStr( 'levenshtein', 'hello' );
console.log( cmp.test( 'Hallo', { flags: 'i' } ) );
// Output: 0.8
Example 2: Phonetic Search
const cmp = new CmpStr( 'soundex', 'Robert' );
console.log( cmp.test( 'Rubin', { options: { raw: true } } ) );
// Output: { a: 'R163', b: 'R150' }
Creating a new instance of CmpStr or CmpStrAsync allows passing the algorithm to be used and the base string as optional arguments. Alternatively, or later in the process, the setAlgo and setStr methods can be used for this purpose.
isReady()
Checks whether the base string and the algorithm are set correctly. Returns true if the class is ready to perform similarity checks, false otherwise.
setStr( str )
Sets the base string for comparison.
Parameters:
<String> str – string to set as the base
getStr()
Gets the base string for comparison.
setFlags( [ flags = '' ] )
Sets the default normalization flags. They are overridden by flags passed through the configuration object. See the description of the available flags / normalization options below.
getFlags()
Gets the default normalization flags.
clearCache()
Clears the normalization cache.
listAlgo( [ loadedOnly = false ] )
Lists all registered similarity algorithms.
Parameters:
<Boolean> loadedOnly – if true, only the names of loaded algorithms are returned
isAlgo( algo )
Checks if an algorithm is registered. Returns true if so, false otherwise.
Parameters:
<String> algo – name of the algorithm
setAlgo( algo )
Sets the current algorithm to use for similarity calculations. Allowed options for built-in algorithms are cosine, damerau, dice, hamming, jaccard, jaro, lcs, levenshtein, needlemanWunsch, qGram, smithWaterman and soundex.
Parameters:
<String> algo – name of the algorithm
getAlgo()
Gets the current algorithm to use for similarity calculations.
addAlgo( algo, callback [, useIt = true ] )
Adds a new similarity algorithm. Pass the name and a callback function that accepts at least two strings and returns a number. If useIt is true, the new algorithm is automatically set as the current one.
Parameters:
<String> algo – name of the algorithm
<Function> callback – callback function implementing the algorithm
<Boolean> useIt – whether to set this algorithm as the current one
Example:
const cmp = new CmpStr();
cmp.addAlgo( 'customAlgo', ( a, b ) => {
return a === b ? 1 : 0;
} );
console.log( cmp.compare( 'customAlgo', 'hello', 'hello' ) );
// Output: 1
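To illustrate the callback contract described above (two strings in, a number out), here is a hypothetical custom algorithm. The name prefixSimilarity and its scoring rule (length of the shared prefix divided by the length of the longer string) are illustrative assumptions, not part of CmpStr; the registration call is left as a comment so the function runs without the package.

```javascript
// Hypothetical custom algorithm: length of the common prefix divided by
// the length of the longer string (1 = identical, 0 = different first char).
function prefixSimilarity(a, b) {
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return 1; // two empty strings are treated as identical
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i / maxLen;
}

// Registration via the documented API (requires the cmpstr package):
// const cmp = new CmpStr();
// cmp.addAlgo( 'prefix', prefixSimilarity );

console.log(prefixSimilarity('hello', 'help')); // 0.6
```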
rmvAlgo( algo )
Removes a registered similarity algorithm.
Parameters:
<String> algo – name of the algorithm
listFilter()
List all added filters.
addFilter( name, callback [, priority = 10 ] )
Adds a custom normalization filter. It requires a unique name and a callback function that accepts a string and returns the normalized string. Filters can be prioritized by setting a higher priority (default is 10).
Parameters:
<String> name – filter name
<Function> callback – callback function implementing the filter
<Int> priority – priority of the filter
Example:
const cmp = new CmpStr();
cmp.addFilter( 'prefix', ( str ) => `prefix_${str}` );
rmvFilter( name )
Removes a custom normalization filter.
Parameters:
<String> name – filter name
pauseFilter( name )
Pauses a custom normalization filter.
Parameters:
<String> name – filter name
resumeFilter( name )
Resumes a custom normalization filter.
Parameters:
<String> name – filter name
clearFilter( name )
Clears normalization filters (removing all of them).
compare( algo, a, b [, config = {} ] )
Compares two strings using the specified algorithm. The method returns either the similarity score as a floating point number between 0 and 1, or the raw output if the algorithm supports it and raw=true is passed through the config options.
Parameters:
<String> algo – name of the algorithm
<String> a – first string
<String> b – second string
<Object> config – configuration object
Example:
const cmp = new CmpStr();
console.log( cmp.compare( 'levenshtein', 'hello', 'hallo' ) );
// Output: 0.8
test( str [, config = {} ] )
Tests the similarity between the base string and a given target string. Returns the same as compare.
Parameters:
<String> str – target string
<Object> config – configuration object
Example:
const cmp = new CmpStr( 'levenshtein', 'hello' );
console.log( cmp.test( 'hallo' ) );
// Output: 0.8
batchTest( arr [, config = {} ] )
Tests the similarity of multiple strings against the base string. Returns an array of objects with the target string and either the similarity score as a floating point number between 0 and 1, or the raw output if the algorithm supports it and raw=true is passed through the config options.
Parameters:
<String[]> arr – array of strings
<Object> config – configuration object
Example:
const cmp = new CmpStr( 'levenshtein', 'hello' );
console.log( cmp.batchTest( [ 'hallo', 'hola', 'hey' ] ) );
// Output: [ { target: 'hallo', match: 0.8 }, { target: 'hola', match: 0.4 }, { target: 'hey', match: 0.4 } ]
match( arr [, config = {} ] )
Finds strings in an array that exceed a similarity threshold and sorts them by highest similarity. Returns an array of objects containing the target string and the similarity score as a floating point number between 0 and 1.
Parameters:
<String[]> arr – array of strings
<Object> config – configuration object
Example:
const cmp = new CmpStr( 'levenshtein', 'hello' );
console.log( cmp.match( [ 'hallo', 'hola', 'hey' ], {
threshold: 0.5
} ) );
// Output: [ { target: 'hallo', match: 0.8 } ]
closest( arr [, config = {} ] )
Finds the closest matching string in an array and returns it.
Parameters:
<String[]> arr – array of strings
<Object> config – configuration object
Example:
const cmp = new CmpStr( 'levenshtein', 'hello' );
console.log( cmp.closest( [ 'hallo', 'hola', 'hey' ] ) );
// Output: 'hallo'
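The ranking behind match and closest can be sketched without the package. This sketch assumes the Levenshtein score is 1 minus the distance divided by the length of the longer string, which reproduces the 0.8 and 0.4 values in the examples above; matchSketch, closestSketch and the inclusive threshold check are hypothetical choices, not CmpStr's internals.

```javascript
// Minimal Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 - distance / length of the longer string.
function similarity(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

// Score every candidate, keep those at or above the threshold, best first.
function matchSketch(base, arr, threshold = 0) {
  return arr
    .map((target) => ({ target, match: similarity(base, target) }))
    .filter((r) => r.match >= threshold)
    .sort((x, y) => y.match - x.match);
}

// The closest string is the top entry of the unfiltered ranking.
function closestSketch(base, arr) {
  const [best] = matchSketch(base, arr);
  return best ? best.target : undefined;
}

console.log(matchSketch('hello', ['hallo', 'hola', 'hey'], 0.5)); // [ { target: 'hallo', match: 0.8 } ]
console.log(closestSketch('hello', ['hallo', 'hola', 'hey'])); // hallo
```

Sorting by descending score gives the "highest similarity first" order; ties keep their input order because Array.prototype.sort is stable in current JavaScript engines.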
similarityMatrix( algo, arr [, config = {} ] )
Generates a similarity matrix for an array of strings. Returns a 2D array of floating point numbers between 0 and 1 representing the pairwise similarities.
Parameters:
<String> algo – name of the algorithm
<String[]> arr – array of strings
<Object> config – configuration object
Example:
const cmp = new CmpStr();
console.log( cmp.similarityMatrix( 'levenshtein', [
'hello', 'hallo', 'hola'
] ) );
// Output: [ [ 1, 0.8, 0.4 ], [ 0.8, 1, 0.4 ], [ 0.4, 0.4, 1 ] ]
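A similarity matrix for n strings needs only n(n-1)/2 comparisons when the metric is symmetric. The sketch below assumes symmetry and a self-similarity of 1, which matches the diagonal in the example above; similarityMatrixSketch and the toy lengthRatio metric are hypothetical.

```javascript
// Builds an n x n matrix of pairwise scores. Assumes sim(a, b) === sim(b, a)
// and sim(x, x) === 1, so only the upper triangle is computed and mirrored.
function similarityMatrixSketch(arr, sim) {
  const n = arr.length;
  const matrix = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++) {
    matrix[i][i] = 1;
    for (let j = i + 1; j < n; j++) {
      matrix[i][j] = matrix[j][i] = sim(arr[i], arr[j]);
    }
  }
  return matrix;
}

// Hypothetical toy metric for demonstration: shorter length / longer length.
const lengthRatio = (a, b) =>
  Math.max(a.length, b.length) === 0
    ? 1
    : Math.min(a.length, b.length) / Math.max(a.length, b.length);

console.log(similarityMatrixSketch(['hello', 'hola'], lengthRatio)); // [ [ 1, 0.8 ], [ 0.8, 1 ] ]
```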
The CmpStr package allows strings to be normalized before the similarity comparison. The options listed below are available for this and can either be set globally via setFlags or passed using the config object, which overrides the global flags. Flags are combined into a single string in any order. For improved performance, normalized strings are stored in a cache, which can be freed using the clearCache method. Modifying custom filters automatically clears the cache.
s – remove special chars
w – collapse whitespace
r – remove repeated chars
k – keep only letters
n – ignore numbers
t – trim whitespace
i – case insensitivity
d – decompose unicode
u – normalize unicode
normalize( input [, flags = '' ] )
The method for normalizing strings can also be called on its own, without comparing the similarity of two strings. It also applies all filters and reads from or writes to the cache. This can be helpful for normalizing strings in advance or for testing different normalization options.
Parameters:
<String|String[]> input – single string or array of strings to normalize
<String> flags – normalization flags
Example:
const cmp = new CmpStr();
console.log( cmp.normalize( ' he123LLo ', 'nti' ) );
// Output: hello
console.log( cmp.normalize( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
// Output: [ 'hello world!', 'cmpstr' ]
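The flag descriptions above are brief, so the following is one possible reading of them, not CmpStr's implementation. In particular, the order in which flags are applied, dropping combining marks for d, and removing everything except letters (including spaces) for k are assumptions. The sketch reproduces both documented normalize outputs; normalizeSketch is a hypothetical name.

```javascript
// Hypothetical re-implementation of the documented flags. The exact
// semantics and the order in which CmpStr applies them are assumptions.
function normalizeSketch(input, flags = '') {
  if (Array.isArray(input)) return input.map((s) => normalizeSketch(s, flags));
  let s = String(input);
  const has = (f) => flags.includes(f);
  if (has('u')) s = s.normalize('NFC'); // u: canonical composition
  if (has('d')) s = s.normalize('NFD').replace(/[\u0300-\u036f]/g, ''); // d: decompose, drop accents (assumed)
  if (has('i')) s = s.toLowerCase(); // i: case insensitivity
  if (has('s')) s = s.replace(/[^\p{L}\p{N}\s]/gu, ''); // s: drop special chars
  if (has('k')) s = s.replace(/[^\p{L}]/gu, ''); // k: letters only (spaces removed too, assumed)
  if (has('n')) s = s.replace(/[0-9]/g, ''); // n: ignore numbers
  if (has('r')) s = s.replace(/(.)\1+/g, '$1'); // r: collapse runs of the same char
  if (has('w')) s = s.replace(/\s+/g, ' '); // w: collapse whitespace
  if (has('t')) s = s.trim(); // t: trim
  return s;
}

console.log(normalizeSketch(' he123LLo ', 'nti')); // hello
console.log(normalizeSketch(['Hello World!', 'CmpStr 123'], 'nwti')); // [ 'hello world!', 'cmpstr' ]
```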
An additional object with optional parameters can be passed to all comparison methods (e.g. test, match, closest, etc.) and their asynchronous counterparts. This object can pass flags for normalization to all methods, as well as the threshold parameter for match and matchAsync.
It also contains options, an object of key-value pairs that are passed to the comparison algorithm. Which additional arguments an algorithm accepts depends on the function exported by its module. The parameters for each algorithm are listed further down in this documentation.
Global config options:
<String> flags – normalization flags
<Number> threshold – similarity threshold between 0 and 1
<Object> options – options passed to the algorithm
Example:
const cmp = new CmpStr( 'smithWaterman', 'alignment' );
console.log( cmp.match( [
' align ment', 'ali gnm ent ', ' alIGNMent'
], {
flags: 'it',
threshold: 0.8,
options: {
mismatch: -4,
gap: -2
}
} ) );
// Output: [ { target: ' alIGNMent', match: 1 }, { target: ' align ment', match: 0.8... } ]
The CmpStrAsync class provides an asynchronous wrapper for all comparison methods as well as the string normalization function. It is ideal for large datasets or non-blocking workflows.
The asynchronous class supports the methods normalizeAsync, compareAsync, testAsync, batchTestAsync, matchAsync, closestAsync and similarityMatrixAsync. Each of these methods returns a Promise.
For options, arguments and return values, see the documentation above.
Example:
const { CmpStrAsync } = require( 'cmpstr' );
const cmp = new CmpStrAsync( 'dice', 'best' );
cmp.batchTestAsync( [
'better', 'bestest', 'the best', 'good' /* , ... */
] ).then( console.log );
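How CmpStrAsync schedules its work is not described here. As a rough sketch of the Promise-returning pattern in Node.js, a synchronous comparison can be deferred with setImmediate; runDeferred and exactMatch are hypothetical names.

```javascript
// Hypothetical helper: runs a synchronous function on a later turn of the
// event loop and exposes its result (or thrown error) as a Promise.
function runDeferred(fn, ...args) {
  return new Promise((resolve, reject) => {
    setImmediate(() => {
      try {
        resolve(fn(...args));
      } catch (err) {
        reject(err);
      }
    });
  });
}

// Example: wrap a trivial synchronous scorer (illustrative only).
const exactMatch = (a, b) => (a === b ? 1 : 0);
runDeferred(exactMatch, 'best', 'best').then(console.log); // 1
```

This only defers the work: the call returns immediately, but a CPU-heavy comparison still blocks the event loop while it runs. Truly parallel processing would need something like worker threads.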
The following algorithms for similarity analysis are natively supported by the CmpStr package. Lazy-loading keeps memory consumption and loading time low, as only the algorithm intended to be used will be loaded as a module.
levenshtein
The Levenshtein distance between two strings is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
Options:
<Boolean> raw – if true, the raw distance is returned
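A compact dynamic-programming sketch of the distance, with a raw switch mirroring the documented option. Converting the distance to a 0..1 score via 1 - distance / max(length) is an assumption, chosen because it reproduces the example scores (hello/hallo gives 0.8); levenshteinSketch is a hypothetical name.

```javascript
// Levenshtein distance with a `raw` switch; the 0..1 normalization is assumed.
function levenshteinSketch(a, b, { raw = false } = {}) {
  // prev[j] = distance between the first i-1 chars of a and the first j of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  const distance = prev[b.length];
  if (raw) return distance;
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - distance / maxLen;
}

console.log(levenshteinSketch('kitten', 'sitting', { raw: true })); // 3
console.log(levenshteinSketch('hello', 'hallo')); // 0.8
```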
damerau
The Damerau-Levenshtein distance differs from the classical Levenshtein distance by including transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions). Useful for correcting typos.
Options:
<Boolean> raw – if true, the raw distance is returned
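Damerau-Levenshtein comes in two flavours: the restricted variant (optimal string alignment, OSA) and the unrestricted one. The sketch below implements OSA; which variant CmpStr uses is not stated here, so treat this as an illustration of the transposition step only. damerauSketch is a hypothetical name.

```javascript
// Optimal string alignment (restricted Damerau-Levenshtein) distance:
// Levenshtein plus a transposition of two adjacent characters at cost 1.
function damerauSketch(a, b) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      // adjacent transposition, e.g. "le" <-> "el"
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

console.log(damerauSketch('hlelo', 'hello')); // 1 (plain Levenshtein gives 2)
```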
jaro
Jaro-Winkler is a string similarity metric that gives more weight to matching characters at the start of the strings.
Options:
<Boolean> raw – if true, the raw distance is returned
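The textbook Jaro-Winkler definition counts matching characters within a window of floor(max(|a|, |b|) / 2) - 1, halves the number of out-of-order matches to get the transpositions, and boosts the Jaro score by up to four common prefix characters with a scaling factor p = 0.1. The sketch follows that definition; whether CmpStr uses the same window, prefix cap and p is an assumption, and jaroSketch / jaroWinklerSketch are hypothetical names.

```javascript
function jaroSketch(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // characters match if equal and no further apart than `range`
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // walk both match sequences in order; each mismatched pair is half a transposition
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

function jaroWinklerSketch(a, b, p = 0.1) {
  const jaro = jaroSketch(a, b);
  // length of the common prefix, capped at 4 characters
  let prefix = 0;
  while (prefix < 4 && prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) prefix++;
  return jaro + prefix * p * (1 - jaro);
}

console.log(jaroWinklerSketch('MARTHA', 'MARHTA').toFixed(4)); // 0.9611
```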
cosine
Cosine similarity measures how similar two vectors are. It is often used in text analysis to compare texts based on the words they contain.
Options:
<String> delimiter – term delimiter
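A term-frequency sketch: split each string on the delimiter, count the terms, and divide the dot product of the two count vectors by the product of their norms. Using raw term counts (no weighting) and a default delimiter of a single space are assumptions; cosineSketch and termFrequencies are hypothetical names.

```javascript
// Count how often each term occurs, skipping empty terms.
function termFrequencies(s, delimiter) {
  const freq = new Map();
  for (const term of s.split(delimiter)) {
    if (term !== '') freq.set(term, (freq.get(term) || 0) + 1);
  }
  return freq;
}

function cosineSketch(a, b, delimiter = ' ') {
  const fa = termFrequencies(a, delimiter);
  const fb = termFrequencies(b, delimiter);
  let dot = 0;
  for (const [term, count] of fa) dot += count * (fb.get(term) || 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((sum, c) => sum + c * c, 0));
  const denominator = norm(fa) * norm(fb);
  return denominator === 0 ? 0 : dot / denominator;
}

console.log(cosineSketch('the best', 'the worst')); // approximately 0.5
console.log(cosineSketch('red,green', 'green,blue', ',')); // approximately 0.5
```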
dice
The Dice-Sørensen index equals twice the number of elements common to both sets divided by the sum of the number of elements in each set. Equivalently, the index is the size of the intersection as a fraction of the average size of the two sets.
jaccard
The Jaccard Index measures the similarity between two sets by dividing the size of their intersection by the size of their union.
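Both set-based measures need a definition of the set elements, which the text does not give. The sketch assumes sets of character bigrams, a common choice; with that, Jaccard equals D / (2 - D) for a Dice score D. The names bigrams, diceSketch and jaccardSketch are hypothetical.

```javascript
// Set of overlapping two-character substrings (bigrams are an assumption).
function bigrams(s) {
  const set = new Set();
  for (let i = 0; i < s.length - 1; i++) set.add(s.slice(i, i + 2));
  return set;
}

function overlap(x, y) {
  let n = 0;
  for (const g of x) if (y.has(g)) n++;
  return n;
}

// Dice-Sørensen: 2 |X ∩ Y| / (|X| + |Y|)
function diceSketch(a, b) {
  const x = bigrams(a);
  const y = bigrams(b);
  if (x.size + y.size === 0) return a === b ? 1 : 0; // strings too short for bigrams
  return (2 * overlap(x, y)) / (x.size + y.size);
}

// Jaccard: |X ∩ Y| / |X ∪ Y|
function jaccardSketch(a, b) {
  const x = bigrams(a);
  const y = bigrams(b);
  const inter = overlap(x, y);
  const union = x.size + y.size - inter;
  if (union === 0) return a === b ? 1 : 0;
  return inter / union;
}

console.log(diceSketch('night', 'nacht')); // 0.25
console.log(jaccardSketch('night', 'nacht')); // 1/7, about 0.1429
```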
hamming
The Hamming distance between two equal-length strings of symbols is the number of positions at which the corresponding symbols are different.
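Counting the differing positions takes a single loop. The sketch throws on unequal lengths, since the distance is only defined for equal-length strings; how CmpStr handles that case is not documented here. hammingSketch is a hypothetical name.

```javascript
// Hamming distance: number of positions with different characters.
// Throwing on unequal lengths is an assumption about error handling.
function hammingSketch(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('Hamming distance requires strings of equal length');
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

console.log(hammingSketch('karolin', 'kathrin')); // 3
```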
lcs
LCS measures the length of the longest subsequence common to both strings.
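The length of the longest common subsequence follows from the classic dynamic-programming recurrence. The sketch returns only the length; how CmpStr turns it into a 0..1 score is not stated here. lcsLength is a hypothetical name.

```javascript
// Length of the longest common subsequence (two-row dynamic programming).
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1 // extend the subsequence with a shared character
        : Math.max(prev[j], curr[j - 1]); // skip a character from one string
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(lcsLength('ABCBDAB', 'BDCABA')); // 4 (e.g. "BCBA")
```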
needlemanWunsch
The Needleman-Wunsch algorithm performs global alignment, aligning two strings entirely, including gaps. It is commonly used in bioinformatics.
Options:
<Number> match – score for a match
<Number> mismatch – penalty for a mismatch
<Number> gap – penalty for a gap
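The global-alignment score can be computed with the documented options match, mismatch and gap. The default values (1, -1, -1) and returning the raw, unnormalized score are assumptions; needlemanWunschSketch is a hypothetical name.

```javascript
// Global alignment score; default scoring values are assumptions.
function needlemanWunschSketch(a, b, { match = 1, mismatch = -1, gap = -1 } = {}) {
  // first row: aligning a prefix of b against nothing costs one gap per char
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j * gap);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i * gap]; // first column: prefix of a against nothing
    for (let j = 1; j <= b.length; j++) {
      const diag = prev[j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch);
      curr[j] = Math.max(diag, prev[j] + gap, curr[j - 1] + gap);
    }
    prev = curr;
  }
  return prev[b.length]; // score of aligning both strings end to end
}

console.log(needlemanWunschSketch('AB', 'A')); // 0 (one match, one gap)
```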
smithWaterman
The Smith-Waterman algorithm performs local alignment, finding the best matching subsequence between two strings. It is commonly used in bioinformatics.
Options:
<Number> match – score for a match
<Number> mismatch – penalty for a mismatch
<Number> gap – penalty for a gap
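Smith-Waterman differs from Needleman-Wunsch in two places: cell scores are floored at 0, and the result is the best cell anywhere in the matrix rather than the bottom-right one. The default scoring values, the raw return value and the name smithWatermanSketch are assumptions.

```javascript
// Local alignment score; default scoring values are assumptions.
function smithWatermanSketch(a, b, { match = 1, mismatch = -1, gap = -1 } = {}) {
  let best = 0;
  let prev = new Array(b.length + 1).fill(0); // first row is all zeros
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0); // first column is zero too
    for (let j = 1; j <= b.length; j++) {
      const diag = prev[j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch);
      // floor at 0: a local alignment may start fresh at any cell
      curr[j] = Math.max(0, diag, prev[j] + gap, curr[j - 1] + gap);
      best = Math.max(best, curr[j]);
    }
    prev = curr;
  }
  return best;
}

console.log(smithWatermanSketch('xxabcxx', 'abc')); // 3
console.log(smithWatermanSketch('alignment', 'align ment', { mismatch: -4, gap: -2 })); // 7
```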
qGram
Q-gram similarity is a string-matching algorithm that compares two strings by breaking them into substrings of length q and comparing the resulting substrings to determine how similar the two strings are.
Options:
<Int> q – length of substrings
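Q-gram similarity has several published variants. The sketch counts q-grams as a multiset, sums the shared counts, and divides by the larger q-gram total; this particular formula, the default q = 2, and the names qGrams and qGramSketch are assumptions.

```javascript
// Multiset (gram -> count) of overlapping substrings of length q.
function qGrams(s, q) {
  const counts = new Map();
  for (let i = 0; i + q <= s.length; i++) {
    const gram = s.slice(i, i + q);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

function qGramSketch(a, b, q = 2) {
  const ga = qGrams(a, q);
  const gb = qGrams(b, q);
  const total = (m) => [...m.values()].reduce((sum, c) => sum + c, 0);
  const maxTotal = Math.max(total(ga), total(gb));
  if (maxTotal === 0) return a === b ? 1 : 0; // both strings shorter than q
  let shared = 0;
  for (const [gram, count] of ga) shared += Math.min(count, gb.get(gram) || 0);
  return shared / maxTotal;
}

console.log(qGramSketch('night', 'nacht')); // 0.25
console.log(qGramSketch('abcd', 'abce', 3)); // 0.5
```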
soundex
The Soundex algorithm generates a phonetic representation of a string based on how it sounds. It supports predefined setups for English and German and allows users to provide custom options.
Options:
<String> lang – language code for predefined setups (e.g., en, de)
<Boolean> raw – if true, returns the raw sound index codes
<Object> mapping – custom phonetic mapping (overrides predefined)
<String> exclude – characters to exclude from the input (overrides predefined)
<Number> maxLength – maximum length of the phonetic code
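For an English setup, the classic American Soundex rules can be sketched as follows: keep the first letter, map consonants to digits, skip vowels, collapse adjacent equal codes (letters separated only by h or w count as adjacent), and pad to maxLength. The mapping table and rules are the textbook ones; CmpStr's predefined setups, including German, may differ, and soundexSketch is a hypothetical name. It reproduces the R163 / R150 codes from the phonetic search example at the top.

```javascript
// Textbook American Soundex digit mapping; vowels, y, h and w have no code.
const SOUNDEX_CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

function soundexSketch(word, maxLength = 4) {
  const s = word.toLowerCase().replace(/[^a-z]/g, '');
  if (s === '') return '';
  let out = s[0].toUpperCase();
  let last = SOUNDEX_CODES[s[0]] ?? 0; // first letter's code blocks an equal next code
  for (let i = 1; i < s.length && out.length < maxLength; i++) {
    const ch = s[i];
    if (ch === 'h' || ch === 'w') continue; // h/w do not separate equal codes
    const code = SOUNDEX_CODES[ch] ?? 0; // vowels and y reset the run
    if (code !== 0 && code !== last) out += code;
    last = code;
  }
  return out.padEnd(maxLength, '0');
}

console.log(soundexSketch('Robert'), soundexSketch('Rubin')); // R163 R150
```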