Research
Security News
Malicious npm Packages Inject SSH Backdoors via Typosquatted Libraries
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
approx-string-match
Advanced tools
A library for approximate string matching.
This can be used to find occurrences of a pattern P (of length m) in a text T (of length n) allowing for up to a given number of errors (k), where errors may include insertions, substitutions or deletions of characters from the pattern.
For example the pattern "annd" occurs in the string "four score and seven" with one error.
The implementation uses a bit-parallel algorithm by G. Myers [1] which, to the best of my knowledge, is the state of the art algorithm for the online version of the problem (where the text and pattern cannot be preprocessed in advance). It runs in O((k/w) * n) expected-time where k <= m and w is the word size (32 in JavaScript). It also includes some additional optimizations suggested in [3]. See comments in the code for more details.
npm install --save approx-string-match
import search from "approx-string-match";
const text = "Four score and seven";
const pattern = "annd";
const matches = search(text, pattern, 2 /* max errors */);
console.log(matches);
// Outputs `[{ start: 11, end: 14, errors: 1 }]`
The library exports a single function search(text, pattern, maxErrors)
which
returns an array of the closest matches for pattern in text allowing up to
maxErrors errors.
interface Match {
start: number;
end: number;
errors: number;
}
search(text: string, pattern: string, maxErrors: number): Match[]
The algorithm uses bitwise operations on integers. Since JavaScript only supports bitwise operations on 32-bit integers, that is the word size, regardless of the platform.
If JS gains support for bitwise operations on larger integers in future, that support could be used to speed up this library.
The library currently works on code units rather than code points, where the
code unit is a UTF-16 value. What this means is that a change to a unicode
character which requires multiple characters to represent in a JavaScript
string, such as emoji, would actually count as two changes rather than one. This
is because such chars require two elements to represent in a string (eg.
"😊".length
is 2).
For an overview of the different approaches to approximate string matching and the history of the development of solutions, there is a good survey paper [2].
[1] G. Myers, “A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming,” vol. 46, no. 3, pp. 395–415, 1999.
[2] G. Navarro, “A guided tour to approximate string matching,” ACM Comput. Surv., vol. 33, no. 1, pp. 31–88, 2001.
[3] Šošić, M. (2014). "An SIMD dynamic programming c/c++ library" (Doctoral dissertation, Fakultet Elektrotehnike i računarstva, Sveučilište u Zagrebu).
[2.0.0] - 2021-11-21
This package has been converted to ES module format. To use it you will need a version of Node and other tooling which supports ES modules. The package does not include a CommonJS build, please file an issue if you need one.
This package is now built as modern JavaScript (ES 2017) and will no longer work in IE 11.
maxErrors
is smaller than the pattern length by at least 32. #13FAQs
A library for approximate string matching.
We found that approx-string-match demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
Security News
MITRE's 2024 CWE Top 25 highlights critical software vulnerabilities like XSS, SQL Injection, and CSRF, reflecting shifts due to a refined ranking methodology.
Security News
In this segment of the Risky Business podcast, Feross Aboukhadijeh and Patrick Gray discuss the challenges of tracking malware discovered in open source softare.