🚀 Socket Launch Week Day 5:Introducing Repository Access Permissions and Custom Roles.Learn more
Sign In

@stll/fuzzy-search

Package Overview
Dependencies
Maintainers
1
Versions
15
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@stll/fuzzy-search

Approximate substring matching for Node.js and Bun via a Rust Myers engine exposed through NAPI-RS.

latest
Source
npmnpm
Version
1.1.2
Version published
Maintainers
1
Created
Source

stella

NAPI-RS approximate substring matching for Node.js and Bun. Finds near-matches within edit distance k with stable UTF-16 offsets, replace-safe match ranges, and optional diacritics normalization.

Built on Myers' bit-parallel algorithm (1999), implemented in Rust and exposed to JavaScript via NAPI-RS.

Install

npm install @stll/fuzzy-search
# or
bun add @stll/fuzzy-search

The companion @stll/fuzzy-search-wasm package is available for browser builds.

If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:

import { defineConfig } from "vite";
import stllFuzzySearchWasm from "@stll/fuzzy-search-wasm/vite";

export default defineConfig({
  plugins: [stllFuzzySearchWasm()],
});

Prebuilts are available for:

PlatformArchitecture
macOSx64, arm64
Linux (glibc)x64, arm64
WASMbrowser

Usage

import { FuzzySearch } from "@stll/fuzzy-search";

const fs = new FuzzySearch(
  [
    { pattern: "Gaislerová", distance: 1 },
    { pattern: "Novák", distance: 1 },
    { pattern: "Příbram", distance: 2 },
  ],
  {
    normalizeDiacritics: true,
    wholeWords: true,
  },
);

fs.findIter("Smlouva s Gais1erová v Pribram");
// [
//   { pattern: 0, start: 10, end: 20,
//     text: "Gais1erová", distance: 1 },
//   { pattern: 2, start: 23, end: 30,
//     text: "Pribram", distance: 0 },
// ]

Patterns

Patterns can be strings (default distance 1) or objects with explicit distance and optional name:

const fs = new FuzzySearch([
  "simple", // distance 1
  { pattern: "named", name: "entity" }, // distance 1
  { pattern: "precise", distance: 2 }, // distance 2
]);

Distance must be less than pattern length.

Options

const fs = new FuzzySearch(patterns, {
  // Strip diacritics before matching (NFD + remove
  // combining marks). "Příbram" matches "Pribram"
  // at distance 0.
  normalizeDiacritics: true, // default: false

  // Only match whole words. Uses Unicode
  // is_alphanumeric() for boundary detection.
  // CJK characters always pass (no inter-word
  // spaces in CJK).
  wholeWords: true, // default: true

  // Case-insensitive matching (Unicode-aware).
  caseInsensitive: true, // default: false

  // Unicode word boundaries (reserved for future
  // UAX#29 segmentation support).
  unicodeBoundaries: true, // default: true

  // Drop matches whose score is below threshold.
  // Score = 1 - distance / pattern.length.
  // Inclusive (score >= minScore keeps the match).
  minScore: 0.7,

  // Return only the top k matches by score, across
  // all patterns. Tie-broken by start, then pattern.
  kBest: 5,
});

Scored output

Every match carries a normalized score in [0, 1], computed as 1 - distance / pattern.length and clamped at 0. Pair it with minScore and kBest for top-N ranking without a follow-up sort:

const fs = new FuzzySearch(
  [
    { pattern: "Novák", distance: 2 },
    { pattern: "Gaislerová", distance: 2 },
  ],
  { wholeWords: true, minScore: 0.7, kBest: 3 },
);

fs.findIter("Nowák a Gais1erova");
// [
//   { pattern: 0, text: "Nowák", distance: 1, score: 0.8, ... },
//   { pattern: 1, text: "Gais1erova", distance: 2, score: 0.8, ... },
// ]

replaceAll always replaces every distance-qualified match and ignores minScore / kBest, so the replacements-by-pattern contract stays deterministic.

Replace

fs.replaceAll("Smlouva s Gais1erová", [
  "[REDACTED]",
  "[REDACTED]",
  "[REDACTED]",
]);
// "Smlouva s [REDACTED]"

replacements[i] replaces pattern i.

Distance helper

import { distance } from "@stll/fuzzy-search";

distance("kitten", "sitting"); // 3
distance("abcd", "abdc", "damerau-levenshtein"); // 1

Benchmarks

The repository includes a checked-in benchmark harness for synthetic and corpus-based searches. The inputs are public and the scripts are reproducible from the repo. Run them locally:

bun run bench:install
bun run bench:download
bun run bench:speed
bun run bench:correctness

The speed harness compares practical JS ecosystem alternatives, but not every comparator implements the same exact semantics. @stll/fuzzy-search is solving approximate substring search with offsets and replacement-friendly match ranges; tools like fuse.js and fuzzball are included as reference points, not as exact drop-in equivalents. The headline comparisons in this repo are the substring-mode rows against sliding-window Levenshtein baselines.

Representative baseline from the checked-in public harness on this machine:

  • runtime: Bun 1.3.12
  • platform: macOS 26.4.1 (Darwin arm64)
Scenario@stll/fuzzy-searchSliding-window JS baselineRelative
Czech legal, 64 KB, 5 names2.41 ms80.78 ms33.5x
Bible, 4.0 MB, 5 names239.91 ms3903.26 ms16.3x
Czech news, 4.8 MB, 5 names262.39 ms4350.52 ms16.6x
German news, 5.5 MB, 5 names405.72 ms6816.03 ms16.8x

These rows are substring mode (wholeWords: false) with edit distance 1-2, which is the core workload this package is designed for.

Alternatives tested
  • fastest-levenshtein + sliding window — fastest JS Levenshtein distance
  • fuse.js — fuzzy search (scoring, not substring matching)
  • fuzzball — Python rapidfuzz port
  • naive JS — O(nm) Levenshtein per window position

Correctness

Correctness is covered by example-based tests and property tests. The property suite verifies distance bounds, oracle agreement, whole-word boundaries, UTF-16 offset stability, normalization behavior, and mixed option combinations over randomized inputs.

API

MethodReturnsDescription
new FuzzySearch(patterns, options?)instanceBuild matcher
.findIter(haystack)FuzzyMatch[]Non-overlapping matches
.isMatch(haystack)booleanAny pattern matches?
.replaceAll(haystack, replacements)stringReplace matched patterns
.patternCountnumberNumber of patterns

Types

type PatternEntry =
  | string
  | { pattern: string; distance?: number; name?: string };

type Options = {
  normalizeDiacritics?: boolean; // default: false
  wholeWords?: boolean; // default: true
  caseInsensitive?: boolean; // default: false
  unicodeBoundaries?: boolean; // default: true
  minScore?: number; // drop matches below threshold
  kBest?: number; // top-k by score, ties by start
};

type FuzzyMatch = {
  pattern: number; // index into patterns array
  start: number; // UTF-16 code unit offset
  end: number; // exclusive
  text: string; // matched substring
  distance: number; // actual Levenshtein distance
  score: number; // 1 - distance/pattern.length
  name?: string; // pattern name (if provided)
};

Match offsets are UTF-16 code unit indices, compatible with String.prototype.slice().

Error handling

  • Constructor throws if a pattern is empty, longer than 64 characters, or has distance >= pattern length.
  • replaceAll throws if replacements.length does not equal patternCount.

How it works

  • Myers' bit-parallel algorithm scans the text in O(n) per pattern for patterns up to 64 characters. No DFA construction, no state explosion at higher distances.

  • Start position recovery via small-window Levenshtein: for each match end position from Myers, a window of [m-k, m+k] characters is evaluated to find the exact start and distance.

  • Diacritics normalization: NFD decomposition + combining mark stripping (Unicode General Category M via unicode-normalization crate). Covers all scripts.

  • UTF-16 offset translation: character-level matching with incremental char→UTF-16 mapping for JS string compatibility.

Limitations

  • Pattern length capped at 64 characters. Myers uses a single u64 bit-vector per pattern. Longer patterns would need multi-word vectors (not yet implemented).
  • No streaming API. The full haystack must be in memory. For chunked processing, use @stll/aho-corasick's StreamMatcher for exact prefiltering and fuzzy-search on flagged regions.
  • WASM requires SharedArrayBuffer. Browser builds need Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers.

Development

bun install
bun run build           # native module (requires Rust)
bun test                # 36 unit tests
bun run test:props      # 36 property tests × 1000 runs

bun run bench:install   # benchmark dependencies
bun run bench:download  # download corpora
bun run bench:speed     # speed comparison
bun run bench:correctness  # oracle verification

bun run lint            # oxlint
bun run format          # oxfmt + rustfmt

License

MIT

Keywords

approximate-matching

FAQs

Package last updated on 11 Jun 2026

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts