🚀 Socket Launch Week Day 5:Introducing Repository Access Permissions and Custom Roles.Learn more
Sign In

@stll/aho-corasick

Package Overview
Dependencies
Maintainers
1
Versions
18
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@stll/aho-corasick

Exact many-pattern string search for Node.js and Bun via Rust's aho-corasick engine exposed through NAPI-RS.

latest
Source
npmnpm
Version
1.0.4
Version published
Maintainers
1
Created
Source

stella

@stll/aho-corasick

NAPI-RS bindings to the Rust aho-corasick crate for Node.js and Bun.

Multi-pattern string searching in linear time. Built on BurntSushi's aho-corasick (the same engine that powers ripgrep), exposed to JavaScript via NAPI-RS.

Install

npm install @stll/aho-corasick
# or
bun add @stll/aho-corasick

The companion @stll/aho-corasick-wasm package is available for browser builds.

If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:

import { defineConfig } from "vite";
import stllAhoCorasickWasm from "@stll/aho-corasick-wasm/vite";

export default defineConfig({
  plugins: [stllAhoCorasickWasm()],
});

Prebuilts are available for:

PlatformArchitecture
macOSx64, arm64
Linux (glibc)x64, arm64
WASMbrowser

Usage

import { AhoCorasick } from "@stll/aho-corasick";

const ac = new AhoCorasick(["foo", "bar", "baz"]);

// Check for any match
ac.isMatch("hello foo world"); // true

// Find all non-overlapping matches
ac.findIter("foo bar baz");
// [
//   { pattern: 0, start: 0, end: 3, text: "foo" },
//   { pattern: 1, start: 4, end: 7, text: "bar" },
//   { pattern: 2, start: 8, end: 11, text: "baz" },
// ]

// Find overlapping matches
ac.findOverlappingIter("foobar");

// Replace matches
ac.replaceAll("foo bar", ["FOO", "BAR", "BAZ"]);
// "FOO BAR"

Options

const ac = new AhoCorasick(patterns, {
  // Match semantics (default: "leftmost-first")
  matchKind: "leftmost-longest",
  // Unicode simple case folding (default: false)
  caseInsensitive: true,
  // Only match whole words (default: false)
  // Unicode-aware; CJK always passes
  wholeWords: true,
});

Streaming

For processing large files or streams chunk by chunk:

import { StreamMatcher } from "@stll/aho-corasick";

const sm = new StreamMatcher(["needle", "haystack"]);

for await (const chunk of readableStream) {
  const matches = sm.write(chunk);
  for (const m of matches) {
    console.log(
      `Pattern ${m.pattern} ` + `at ${m.start}..${m.end}`,
    );
  }
}

sm.flush(); // finalize stream state
sm.reset(); // reuse for another stream

StreamMatcher automatically handles overlap between chunks so that matches spanning chunk boundaries are found.

Groups

To organize patterns into named groups (e.g., for entity recognition), use the pattern index as a lookup key into a parallel array:

const GROUPS = {
  LEGAL_FORM: ["s.r.o.", "GmbH", "LLC"],
  CURRENCY: ["EUR", "USD", "CZK"],
};

const patterns: string[] = [];
const tag: string[] = [];

for (const [group, terms] of Object.entries(GROUPS)) {
  for (const term of terms) {
    patterns.push(term);
    tag.push(group);
  }
}

const ac = new AhoCorasick(patterns, {
  wholeWords: true,
});

for (const m of ac.findIter(text)) {
  console.log(m.text, tag[m.pattern]);
  // "GmbH" "LEGAL_FORM"
  // "EUR"  "CURRENCY"
}

tag[m.pattern] is a single array index lookup (O(1), no hashing).

Benchmarks

The repository includes a checked-in benchmark harness for ASCII, Unicode, and WASM cases. The inputs are public and the scripts are reproducible from the repo. Run it locally:

bun run bench:install
bun run bench:download
bun run bench:all

The harness compares multiple JS/TS implementations on public corpora and verifies equal match counts across libraries. The speed suite is anchored in the many-pattern exact-search workloads where Aho-Corasick is meant to win, rather than single-call toy regex cases.

Representative baseline from the checked-in public harness on this machine:

  • runtime: Bun 1.3.12
  • platform: macOS 26.4.1 (Darwin arm64)
Scenario@stll/aho-corasickBest compared JS/TS resultRelative
bible.txt, 4.0 MB, 20 terms2.27 ms69.25 ms30.5x
world192.txt, 2.5 MB, 20 terms1.55 ms41.88 ms26.9x
E.coli, 4.6 MB, 16 codons0.83 ms11.00 ms13.3x
bible.txt, single-pattern baseline0.67 ms20.10 ms29.9x
Alternatives tested

API

AhoCorasick

MethodReturnsDescription
new AhoCorasick(patterns, options?)instanceBuild automaton
.findIter(haystack)Match[]Non-overlapping matches
.findOverlappingIter(haystack)Match[]All overlapping matches
.isMatch(haystack)booleanAny pattern matches?
.replaceAll(haystack, replacements)stringReplace matched patterns
.findIterBuf(haystack)ByteMatch[]Matches in a Buffer / Uint8Array (byte offsets)
.isMatchBuf(haystack)booleanAny pattern matches in a Buffer / Uint8Array?
.patternCountnumberNumber of patterns

StreamMatcher

MethodReturnsDescription
new StreamMatcher(patterns, options?)instanceBuild streaming matcher
.write(chunk)ByteMatch[]Feed chunk, get global byte-offset matches
.flush()ByteMatch[]Finalize stream
.reset()voidReset for reuse

Types

type MatchKind = "leftmost-first" | "leftmost-longest";

type Options = {
  matchKind?: MatchKind;
  caseInsensitive?: boolean;
  wholeWords?: boolean;
  dfa?: boolean;
};

// Returned by string methods (findIter, etc.)
type Match = {
  pattern: number; // index into patterns array
  start: number; // UTF-16 code unit offset
  end: number; // exclusive
  text: string; // matched substring
};

// Returned by Buffer/streaming methods
type ByteMatch = {
  pattern: number; // index into patterns array
  start: number; // byte offset
  end: number; // exclusive
};

Error handling

  • new AhoCorasick([...]) throws if the automaton cannot be built (e.g. patterns exceed internal size limits).
  • replaceAll(haystack, replacements) throws if replacements.length !== patternCount.
  • Empty patterns arrays are valid; all search methods return no matches.

Limitations

  • Case insensitivity uses Unicode simple case folding, not locale-specific collation. It handles one-to-one folds like Turkish İ -> i and ẞ -> ß, but it does not perform full multi-character expansions such as ß -> ss.
  • WASM requires SharedArrayBuffer. Browser builds need Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. Edge runtimes without WASM support (some Cloudflare Workers configurations) are not supported.

Acknowledgements

This package is a thin binding layer. The hard work is done by:

  • aho-corasick by Andrew Gallant (BurntSushi) — the Rust implementation of the Aho-Corasick algorithm. MIT licensed.
  • NAPI-RS — the Rust framework for building Node.js native addons. MIT licensed.

Development

# Install dependencies
bun install

# Build native module (requires Rust toolchain)
bun run build

# Run tests (113 tests, including Unicode edge cases)
bun test

# Download benchmark corpora
bun run bench:download

# Install benchmark dependencies (alternatives)
bun run bench:install

# Run benchmarks
bun run bench:speed       # Canterbury corpus
bun run bench:unicode     # Leipzig corpora
bun run bench:correctness # cross-library comparison
bun run bench:all         # all three

# Lint & format
bun run lint
bun run format

License

MIT

Keywords

aho-corasick

FAQs

Package last updated on 11 Jun 2026

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts