
Security News
Frontier AI Is Now Critical Infrastructure
The Fable shutdown shows how quickly model access can become a business continuity risk for AI-dependent engineering teams.
@stll/aho-corasick
Advanced tools
Exact many-pattern string search for Node.js and Bun via Rust's aho-corasick engine exposed through NAPI-RS.
NAPI-RS bindings to the Rust aho-corasick crate for Node.js and Bun.
Multi-pattern string searching in linear time. Built on BurntSushi's aho-corasick (the same engine that powers ripgrep), exposed to JavaScript via NAPI-RS.
npm install @stll/aho-corasick
# or
bun add @stll/aho-corasick
The companion @stll/aho-corasick-wasm package is
available for browser builds.
If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:
import { defineConfig } from "vite";
import stllAhoCorasickWasm from "@stll/aho-corasick-wasm/vite";
export default defineConfig({
plugins: [stllAhoCorasickWasm()],
});
Prebuilts are available for:
| Platform | Architecture |
|---|---|
| macOS | x64, arm64 |
| Linux (glibc) | x64, arm64 |
| WASM | browser |
import { AhoCorasick } from "@stll/aho-corasick";
const ac = new AhoCorasick(["foo", "bar", "baz"]);
// Check for any match
ac.isMatch("hello foo world"); // true
// Find all non-overlapping matches
ac.findIter("foo bar baz");
// [
// { pattern: 0, start: 0, end: 3, text: "foo" },
// { pattern: 1, start: 4, end: 7, text: "bar" },
// { pattern: 2, start: 8, end: 11, text: "baz" },
// ]
// Find overlapping matches
ac.findOverlappingIter("foobar");
// Replace matches
ac.replaceAll("foo bar", ["FOO", "BAR", "BAZ"]);
// "FOO BAR"
const ac = new AhoCorasick(patterns, {
// Match semantics (default: "leftmost-first")
matchKind: "leftmost-longest",
// Unicode simple case folding (default: false)
caseInsensitive: true,
// Only match whole words (default: false)
// Unicode-aware; CJK always passes
wholeWords: true,
});
For processing large files or streams chunk by chunk:
import { StreamMatcher } from "@stll/aho-corasick";
const sm = new StreamMatcher(["needle", "haystack"]);
for await (const chunk of readableStream) {
const matches = sm.write(chunk);
for (const m of matches) {
console.log(
`Pattern ${m.pattern} ` + `at ${m.start}..${m.end}`,
);
}
}
sm.flush(); // finalize stream state
sm.reset(); // reuse for another stream
StreamMatcher automatically handles overlap
between chunks so that matches spanning chunk
boundaries are found.
To organize patterns into named groups (e.g., for
entity recognition), use the pattern index as a
lookup key into a parallel array:
const GROUPS = {
LEGAL_FORM: ["s.r.o.", "GmbH", "LLC"],
CURRENCY: ["EUR", "USD", "CZK"],
};
const patterns: string[] = [];
const tag: string[] = [];
for (const [group, terms] of Object.entries(GROUPS)) {
for (const term of terms) {
patterns.push(term);
tag.push(group);
}
}
const ac = new AhoCorasick(patterns, {
wholeWords: true,
});
for (const m of ac.findIter(text)) {
console.log(m.text, tag[m.pattern]);
// "GmbH" "LEGAL_FORM"
// "EUR" "CURRENCY"
}
tag[m.pattern] is a single array index lookup
(O(1), no hashing).
The repository includes a checked-in benchmark harness for ASCII, Unicode, and WASM cases. The inputs are public and the scripts are reproducible from the repo. Run it locally:
bun run bench:install
bun run bench:download
bun run bench:all
The harness compares multiple JS/TS implementations on public corpora and verifies equal match counts across libraries. The speed suite is anchored in the many-pattern exact-search workloads where Aho-Corasick is meant to win, rather than single-call toy regex cases.
Representative baseline from the checked-in public harness on this machine:
1.3.1226.4.1 (Darwin arm64)| Scenario | @stll/aho-corasick | Best compared JS/TS result | Relative |
|---|---|---|---|
bible.txt, 4.0 MB, 20 terms | 2.27 ms | 69.25 ms | 30.5x |
world192.txt, 2.5 MB, 20 terms | 1.55 ms | 41.88 ms | 26.9x |
E.coli, 4.6 MB, 16 codons | 0.83 ms | 11.00 ms | 13.3x |
bible.txt, single-pattern baseline | 0.67 ms | 20.10 ms | 29.9x |
AhoCorasick| Method | Returns | Description |
|---|---|---|
new AhoCorasick(patterns, options?) | instance | Build automaton |
.findIter(haystack) | Match[] | Non-overlapping matches |
.findOverlappingIter(haystack) | Match[] | All overlapping matches |
.isMatch(haystack) | boolean | Any pattern matches? |
.replaceAll(haystack, replacements) | string | Replace matched patterns |
.findIterBuf(haystack) | ByteMatch[] | Matches in a Buffer / Uint8Array (byte offsets) |
.isMatchBuf(haystack) | boolean | Any pattern matches in a Buffer / Uint8Array? |
.patternCount | number | Number of patterns |
StreamMatcher| Method | Returns | Description |
|---|---|---|
new StreamMatcher(patterns, options?) | instance | Build streaming matcher |
.write(chunk) | ByteMatch[] | Feed chunk, get global byte-offset matches |
.flush() | ByteMatch[] | Finalize stream |
.reset() | void | Reset for reuse |
type MatchKind = "leftmost-first" | "leftmost-longest";
type Options = {
matchKind?: MatchKind;
caseInsensitive?: boolean;
wholeWords?: boolean;
dfa?: boolean;
};
// Returned by string methods (findIter, etc.)
type Match = {
pattern: number; // index into patterns array
start: number; // UTF-16 code unit offset
end: number; // exclusive
text: string; // matched substring
};
// Returned by Buffer/streaming methods
type ByteMatch = {
pattern: number; // index into patterns array
start: number; // byte offset
end: number; // exclusive
};
new AhoCorasick([...]) throws if the
automaton cannot be built (e.g. patterns exceed
internal size limits).replaceAll(haystack, replacements) throws if
replacements.length !== patternCount.İ -> i
and ẞ -> ß, but it does not perform full
multi-character expansions such as ß -> ss.SharedArrayBuffer. Browser
builds need Cross-Origin-Opener-Policy: same-origin
and Cross-Origin-Embedder-Policy: require-corp
headers. Edge runtimes without WASM support
(some Cloudflare Workers configurations) are
not supported.This package is a thin binding layer. The hard work is done by:
# Install dependencies
bun install
# Build native module (requires Rust toolchain)
bun run build
# Run tests (113 tests, including Unicode edge cases)
bun test
# Download benchmark corpora
bun run bench:download
# Install benchmark dependencies (alternatives)
bun run bench:install
# Run benchmarks
bun run bench:speed # Canterbury corpus
bun run bench:unicode # Leipzig corpora
bun run bench:correctness # cross-library comparison
bun run bench:all # all three
# Lint & format
bun run lint
bun run format
FAQs
Exact many-pattern string search for Node.js and Bun via Rust's aho-corasick engine exposed through NAPI-RS.
We found that @stll/aho-corasick demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Security News
The Fable shutdown shows how quickly model access can become a business continuity risk for AI-dependent engineering teams.

Security News
AI agents are pulling packages into environments no scanner is watching, creating exposure before security teams can see it.

Security News
GitHub Actions checkout now blocks risky pull_request_target checkouts by default to help prevent pwn request supply chain attacks.