
@stll/fuzzy-search
Approximate substring matching for Node.js and Bun via a Rust Myers engine exposed through NAPI-RS.
NAPI-RS approximate substring matching for Node.js and Bun. Finds near-matches within edit distance k with stable UTF-16 offsets, replace-safe match ranges, and optional diacritics normalization.
Built on Myers' bit-parallel algorithm (1999), implemented in Rust and exposed to JavaScript via NAPI-RS.
npm install @stll/fuzzy-search
# or
bun add @stll/fuzzy-search
The companion @stll/fuzzy-search-wasm package is
available for browser builds.
If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:
import { defineConfig } from "vite";
import stllFuzzySearchWasm from "@stll/fuzzy-search-wasm/vite";
export default defineConfig({
plugins: [stllFuzzySearchWasm()],
});
Prebuilts are available for:
| Platform | Architecture |
|---|---|
| macOS | x64, arm64 |
| Linux (glibc) | x64, arm64 |
| WASM | browser |
import { FuzzySearch } from "@stll/fuzzy-search";
const fs = new FuzzySearch(
[
{ pattern: "Gaislerová", distance: 1 },
{ pattern: "Novák", distance: 1 },
{ pattern: "Příbram", distance: 2 },
],
{
normalizeDiacritics: true,
wholeWords: true,
},
);
fs.findIter("Smlouva s Gais1erová v Pribram");
// [
// { pattern: 0, start: 10, end: 20,
// text: "Gais1erová", distance: 1 },
// { pattern: 2, start: 23, end: 30,
// text: "Pribram", distance: 0 },
// ]
Patterns can be strings (default distance 1) or objects with an optional distance (default 1) and an optional name:
const fs = new FuzzySearch([
"simple", // distance 1
{ pattern: "named", name: "entity" }, // distance 1
{ pattern: "precise", distance: 2 }, // distance 2
]);
Distance must be less than pattern length.
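A minimal sketch of how the defaults and the distance rule above could be checked before building a matcher. `resolvePattern` is a hypothetical helper, not part of the package. It assumes pattern length is counted in code points and that a violation is reported as a `RangeError`; neither detail is stated here.

```typescript
type PatternEntry =
  | string
  | { pattern: string; distance?: number; name?: string };

type ResolvedPattern = { pattern: string; distance: number; name?: string };

// Hypothetical helper: applies the documented default (distance 1) and
// rejects distances that are not smaller than the pattern length.
function resolvePattern(entry: PatternEntry): ResolvedPattern {
  const pattern = typeof entry === "string" ? entry : entry.pattern;
  const distance = typeof entry === "string" ? 1 : entry.distance ?? 1;

  // Assumption: length is measured in code points, not UTF-16 code units.
  const length = Array.from(pattern).length;
  if (!Number.isInteger(distance) || distance < 0 || distance >= length) {
    throw new RangeError(
      `distance ${distance} must be an integer in [0, ${length}) for "${pattern}"`,
    );
  }

  const resolved: ResolvedPattern = { pattern, distance };
  if (typeof entry !== "string" && entry.name !== undefined) {
    resolved.name = entry.name;
  }
  return resolved;
}
```

With this sketch, `resolvePattern("simple")` yields distance 1, and `resolvePattern({ pattern: "ab", distance: 2 })` throws.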
const fs = new FuzzySearch(patterns, {
// Strip diacritics before matching (NFD + remove
// combining marks). "Příbram" matches "Pribram"
// at distance 0.
normalizeDiacritics: true, // default: false
// Only match whole words. Uses Unicode
// is_alphanumeric() for boundary detection.
// CJK characters always pass (no inter-word
// spaces in CJK).
wholeWords: true, // default: true
// Case-insensitive matching (Unicode-aware).
caseInsensitive: true, // default: false
// Unicode word boundaries (reserved for future
// UAX#29 segmentation support).
unicodeBoundaries: true, // default: true
// Drop matches whose score is below threshold.
// Score = 1 - distance / pattern.length.
// Inclusive (score >= minScore keeps the match).
minScore: 0.7,
// Return only the top k matches by score, across
// all patterns. Tie-broken by start, then pattern.
kBest: 5,
});
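The `wholeWords` check described in the options above can be approximated in plain TypeScript. This is a sketch, not the package's Rust code: `isWholeWordMatch` is a hypothetical name, the regular expression stands in for Rust's `is_alphanumeric()` (Alphabetic or Numeric characters), and the CJK exception is left out.

```typescript
// Characters that count as "inside a word" for this sketch.
const WORD_CHAR = /[\p{Alphabetic}\p{N}]/u;

// Returns true when the range [start, end) (UTF-16 offsets) is not glued to
// a word character on either side.
function isWholeWordMatch(text: string, start: number, end: number): boolean {
  // Take whole code points so surrogate pairs are not split.
  const before = Array.from(text.slice(0, start)).pop();
  const after = Array.from(text.slice(end))[0];
  const boundaryBefore = before === undefined || !WORD_CHAR.test(before);
  const boundaryAfter = after === undefined || !WORD_CHAR.test(after);
  return boundaryBefore && boundaryAfter;
}
```

For example, the range 10..20 in "Smlouva s Gais1erová v Pribram" passes because it is surrounded by spaces, while the range 1..6 in "xNovák" fails because of the leading "x".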
Every match carries a normalized score in [0, 1],
computed as 1 - distance / pattern.length and
clamped at 0. Pair it with minScore and kBest for
top-N ranking without a follow-up sort:
const fs = new FuzzySearch(
[
{ pattern: "Novák", distance: 2 },
{ pattern: "Gaislerová", distance: 2 },
],
{ wholeWords: true, minScore: 0.7, kBest: 3 },
);
fs.findIter("Nowák a Gais1erova");
// [
// { pattern: 0, text: "Nowák", distance: 1, score: 0.8, ... },
// { pattern: 1, text: "Gais1erova", distance: 2, score: 0.8, ... },
// ]
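The score formula and the `minScore` / `kBest` filters can be reproduced by hand, which helps when picking thresholds. The sketch below uses hypothetical helpers (`scoreOf`, `rankMatches`) and assumes the result is ordered by score; the order the library itself returns is not stated here.

```typescript
type ScoredMatch = { pattern: number; start: number; distance: number; score: number };

// Score = 1 - distance / pattern.length, clamped at 0.
function scoreOf(distance: number, patternLength: number): number {
  return Math.max(0, 1 - distance / patternLength);
}

// Keep matches with score >= minScore, then take the kBest highest scores.
// Ties are broken by start offset, then by pattern index.
function rankMatches(matches: ScoredMatch[], minScore: number, kBest: number): ScoredMatch[] {
  return matches
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score || a.start - b.start || a.pattern - b.pattern)
    .slice(0, kBest);
}
```

For the example above, "Nowák" (distance 1, length 5) and "Gais1erova" (distance 2, length 10) both score 0.8, so both survive `minScore: 0.7`.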
replaceAll always replaces every distance-qualified
match and ignores minScore / kBest, so the
replacements-by-pattern contract stays
deterministic.
fs.replaceAll("Smlouva s Gais1erová", [
"[REDACTED]",
"[REDACTED]",
"[REDACTED]",
]);
// "Smlouva s [REDACTED]"
replacements[i] replaces pattern i.
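To illustrate the replacements-by-pattern contract, here is a sketch that applies replacements to a list of non-overlapping matches using their UTF-16 offsets. `applyReplacements` and `ReplaceHit` are hypothetical names; this is not how the package implements `replaceAll` internally.

```typescript
type ReplaceHit = { pattern: number; start: number; end: number };

// Rebuilds the haystack left to right, swapping each matched range for the
// replacement that belongs to its pattern index.
function applyReplacements(
  haystack: string,
  hits: ReplaceHit[],
  replacements: string[],
  patternCount: number,
): string {
  if (replacements.length !== patternCount) {
    throw new Error("replacements.length must equal patternCount");
  }
  let out = "";
  let cursor = 0;
  // Assumes hits do not overlap; sort a copy so the caller's array is untouched.
  for (const hit of hits.slice().sort((a, b) => a.start - b.start)) {
    out += haystack.slice(cursor, hit.start) + replacements[hit.pattern];
    cursor = hit.end;
  }
  return out + haystack.slice(cursor);
}
```

Feeding it the single match from the example (pattern 0, offsets 10 to 20) with three "[REDACTED]" entries reproduces "Smlouva s [REDACTED]".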
import { distance } from "@stll/fuzzy-search";
distance("kitten", "sitting"); // 3
distance("abcd", "abdc", "damerau-levenshtein"); // 1
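For cross-checking results, a plain dynamic-programming reference is easy to write. The sketch below is an assumption-laden stand-in, not the package's implementation: `editDistanceRef` is a hypothetical name, it compares code points, and its transposition mode is the optimal string alignment variant, which may differ from the library's "damerau-levenshtein" on some inputs.

```typescript
// Reference edit distance. With transpositions = true, swapping two adjacent
// characters costs 1 (optimal string alignment variant).
function editDistanceRef(a: string, b: string, transpositions = false): number {
  const s = Array.from(a);
  const t = Array.from(b);
  // d[i][j] = distance between the first i chars of s and first j chars of t.
  const d: number[][] = Array.from({ length: s.length + 1 }, (_, i) =>
    Array.from({ length: t.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= s.length; i++) {
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (transpositions && i > 1 && j > 1 && s[i - 1] === t[j - 2] && s[i - 2] === t[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[s.length][t.length];
}
```

This reference agrees with both calls above: 3 for "kitten" / "sitting" and 1 for "abcd" / "abdc" with transpositions enabled.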
The repository includes a checked-in benchmark harness for synthetic and corpus-based searches. The inputs are public and the scripts are reproducible from the repo. Run them locally:
bun run bench:install
bun run bench:download
bun run bench:speed
bun run bench:correctness
The speed harness compares practical JS ecosystem
alternatives, but not every comparator implements the
same exact semantics. @stll/fuzzy-search is solving
approximate substring search with offsets and
replacement-friendly match ranges; tools like
fuse.js and fuzzball are included as reference
points, not as exact drop-in equivalents. The
headline comparisons in this repo are the
substring-mode rows against sliding-window
Levenshtein baselines.
Representative baseline from the checked-in public harness on this machine:
1.3.1226.4.1 (Darwin arm64)

| Scenario | @stll/fuzzy-search | Sliding-window JS baseline | Relative |
|---|---|---|---|
| Czech legal, 64 KB, 5 names | 2.41 ms | 80.78 ms | 33.5x |
| Bible, 4.0 MB, 5 names | 239.91 ms | 3903.26 ms | 16.3x |
| Czech news, 4.8 MB, 5 names | 262.39 ms | 4350.52 ms | 16.6x |
| German news, 5.5 MB, 5 names | 405.72 ms | 6816.03 ms | 16.8x |
These rows are substring mode (wholeWords: false)
with edit distance 1-2, which is the core workload
this package is designed for.
Correctness is covered by example-based tests and property tests. The property suite verifies distance bounds, oracle agreement, whole-word boundaries, UTF-16 offset stability, normalization behavior, and mixed option combinations over randomized inputs.
| Method | Returns | Description |
|---|---|---|
| new FuzzySearch(patterns, options?) | instance | Build matcher |
| .findIter(haystack) | FuzzyMatch[] | Non-overlapping matches |
| .isMatch(haystack) | boolean | Any pattern matches? |
| .replaceAll(haystack, replacements) | string | Replace matched patterns |
| .patternCount | number | Number of patterns |
type PatternEntry =
| string
| { pattern: string; distance?: number; name?: string };
type Options = {
normalizeDiacritics?: boolean; // default: false
wholeWords?: boolean; // default: true
caseInsensitive?: boolean; // default: false
unicodeBoundaries?: boolean; // default: true
minScore?: number; // drop matches below threshold
kBest?: number; // top-k by score, ties by start
};
type FuzzyMatch = {
pattern: number; // index into patterns array
start: number; // UTF-16 code unit offset
end: number; // exclusive
text: string; // matched substring
distance: number; // actual Levenshtein distance
score: number; // 1 - distance/pattern.length
name?: string; // pattern name (if provided)
};
Match offsets are UTF-16 code unit indices,
compatible with String.prototype.slice().
replaceAll throws if replacements.length
does not equal patternCount.
Myers' bit-parallel algorithm scans the text in O(n) per pattern for patterns up to 64 characters. No DFA construction, no state explosion at higher distances.
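To make the bit-parallel scan concrete, here is a small TypeScript sketch of Myers' algorithm in Hyyrö's formulation. It is not the package's Rust engine: `myersEndPositions` is a hypothetical name, it uses `BigInt` in place of a 64-bit machine word, works on code points, and reports every end position within distance k rather than non-overlapping matches.

```typescript
type EndHit = { end: number; distance: number };

// Reports each text position (exclusive, in code points) where some substring
// ending there is within edit distance k of the pattern.
function myersEndPositions(pattern: string, text: string, k: number): EndHit[] {
  const p = Array.from(pattern);
  const m = p.length;
  if (m === 0 || m > 64) throw new RangeError("pattern must have 1..64 characters");

  const mask = (1n << BigInt(m)) - 1n; // keeps only the m low bits
  const top = 1n << BigInt(m - 1); // bit of the last pattern row

  // peq[c] has bit i set when pattern[i] === c.
  const peq = new Map<string, bigint>();
  p.forEach((ch, i) => peq.set(ch, (peq.get(ch) ?? 0n) | (1n << BigInt(i))));

  let pv = mask; // rows whose vertical delta is +1
  let mv = 0n; // rows whose vertical delta is -1
  let score = m; // best distance for a match ending at the current column
  const hits: EndHit[] = [];

  const t = Array.from(text);
  for (let j = 0; j < t.length; j++) {
    const eq = peq.get(t[j]) ?? 0n;
    const xv = eq | mv;
    const xh = ((((eq & pv) + pv) ^ pv) | eq) & mask;
    let ph = (mv | ~(xh | pv)) & mask; // horizontal +1 deltas
    let mh = pv & xh; // horizontal -1 deltas

    if ((ph & top) !== 0n) score++;
    else if ((mh & top) !== 0n) score--;

    // Search variant: row 0 stays at distance 0, so no bit is shifted in.
    ph = (ph << 1n) & mask;
    mh = (mh << 1n) & mask;
    pv = (mh | ~(xv | ph)) & mask;
    mv = ph & xv;

    if (score <= k) hits.push({ end: j + 1, distance: score });
  }
  return hits;
}
```

Each text character costs a fixed number of word operations, which is where the O(n) per pattern bound comes from. The scan yields end positions only; start offsets have to be recovered separately.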
Start position recovery via small-window Levenshtein: for each match end position from Myers, a window of [m-k, m+k] characters is evaluated to find the exact start and distance.
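The start recovery step can be sketched as follows, under stated assumptions: `recoverStart` and `windowDistance` are hypothetical names, inputs are arrays of code points, and ties are resolved in favour of the shortest window, which may not be the package's tie-break.

```typescript
// Two-row Levenshtein distance between two code point arrays.
function windowDistance(a: string[], b: string[]): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = cur;
  }
  return prev[b.length];
}

// For a known match end, try every window length in [m - k, m + k] and keep
// the start with the smallest distance (null if none is within k).
function recoverStart(
  pattern: string[],
  text: string[],
  end: number,
  k: number,
): { start: number; distance: number } | null {
  const m = pattern.length;
  let best: { start: number; distance: number } | null = null;
  for (let len = Math.max(0, m - k); len <= Math.min(end, m + k); len++) {
    const start = end - len;
    const d = windowDistance(pattern, text.slice(start, end));
    if (d <= k && (best === null || d < best.distance)) {
      best = { start, distance: d };
    }
  }
  return best;
}
```

Because only 2k + 1 window lengths are tried per end position, the extra cost stays small compared with running a full Levenshtein pass over the text.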
Diacritics normalization: NFD decomposition +
combining mark stripping (Unicode General
Category M via unicode-normalization crate).
Covers all scripts.
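The same normalization can be reproduced in JavaScript for preprocessing or for checking expectations in tests. This sketch uses the standard `String.prototype.normalize` and a Unicode property escape; `stripDiacritics` is a hypothetical helper and is not exported by the package.

```typescript
// NFD-decompose, then drop every combining mark (General Category M).
function stripDiacritics(input: string): string {
  return input.normalize("NFD").replace(/\p{M}/gu, "");
}
```

With it, `stripDiacritics("Příbram")` returns "Pribram", which is why the two spellings match at distance 0 when `normalizeDiacritics` is enabled.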
UTF-16 offset translation: character-level matching with incremental char→UTF-16 mapping for JS string compatibility.
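A sketch of the character-to-UTF-16 mapping: the package is described as doing this incrementally during matching, while the hypothetical `utf16Offsets` helper below builds the whole table up front for clarity.

```typescript
// offsets[i] is the UTF-16 code unit offset of the i-th code point;
// the final entry equals text.length.
function utf16Offsets(text: string): number[] {
  const offsets: number[] = [];
  let unit = 0;
  for (const ch of Array.from(text)) {
    offsets.push(unit);
    unit += ch.length; // 1 for BMP characters, 2 for surrogate pairs
  }
  offsets.push(unit);
  return offsets;
}
```

A match that covers code points i to j (exclusive) then maps to `text.slice(offsets[i], offsets[j])`, which stays correct when the text contains characters outside the Basic Multilingual Plane.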
@stll/aho-corasick's StreamMatcher for exact prefiltering and fuzzy-search on flagged regions.
SharedArrayBuffer. Browser builds need Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers.
bun install
bun run build # native module (requires Rust)
bun test # 36 unit tests
bun run test:props # 36 property tests × 1000 runs
bun run bench:install # benchmark dependencies
bun run bench:download # download corpora
bun run bench:speed # speed comparison
bun run bench:correctness # oracle verification
bun run lint # oxlint
bun run format # oxfmt + rustfmt