Fast ternary string set
A fast string set based on ternary search trees. Features:
- Drop-in replacement for most code that uses a standard JavaScript
Set
of string elements. - All search results and iteration methods list elements in ascending sorted (lexicographic) order.
- Includes several approximate matching methods:
- List all elements that can be made from a given list of letters.
- List all elements that match a pattern including "don't care" letters (as
.
in a regular expression). - List all elements within a specified Hamming distance of a pattern.
- All matching, including approximate matching, is based on full code points and not char codes.
- Balances search time and memory consumption: stored strings share tree nodes and do not require a reference to the original strings.
- The tree structure is encoded in a form that is friendly to common JS engine optimizations.
- Elements are stored by Unicode code point; any valid Unicode string can be added to a set.
- Sets can be serialized to a compact binary format (as an
ArrayBuffer
). - Written in fully documented TypeScript, targeting modern JavaScript engines.
- Backed by extensive test suites.
- No other dependencies.
Installation
To install the latest stable version:
npm install fast-ternary-string-set
Or, if using yarn
, yarn add fast-ternary-string-set
.
Alternatively, to use it without Node.js, copy the the main source file (src/index.ts
) into any TypeScript project, then import the copied file in your code.
Examples of use
To load the module:
const { TernaryStringSet } = require("fast-ternary-string-set");
Or, from TypeScript:
import { TernaryStringSet } from "fast-ternary-string-set";
Create a new string set and add some strings:
const set = new TernaryStringSet();
set.add("dog").add("cat").add("eagle");
set.has("cat");
set.delete("cat");
set.has("cat");
set.has(123.456);
Add an entire array of string elements:
const stringArray = [
"aardvark", "beaver", "cat",
"dog", "eagle", "zebra"
];
set.addAll(stringArray);
Iterate over all elements in sorted order:
for (const el of set) {
console.log(el);
}
set.forEach((el) => console.log(el));
Find all elements that can be made from the letters of "taco":
set.getArrangementsOf("taco");
Find all elements within Hamming distance 1 of "cat":
set.getWithinHammingDistanceOf("cat", 1);
Find all elements that match "b.t":
set.getPartialMatchesOf("b.t");
Serialize to or from a binary blob:
const buff = set.toBuffer();
const set = TernaryStringSet.fromBuffer(buff);
Differences from standard JS Set
TernaryStringSet
supports a superset of the standard Set
interface, but it is not a subclass of Set
.
JavaScript Set
objects guarantee that they iterate over elements in the order that they are added.
TernaryStringSet
s always return results in sorted order.
TernaryStringSet
s can contain the empty string, but cannot contain non-strings. This includes null
and undefined
.
Tips
Adding strings to a ternary tree in sorted order produces a worst-case tree structure. This can be avoided by adding
sorted strings using the addAll
method, which produces an optimal tree structure given a sorted input array.
Alternatively, the balance
method can be called to rebuild the tree with optimal structure, but this is expensive.
After deleting a large number of strings, future search performance may be improved by calling balance
.
An ideal use case for this data structure is one in which the tree will be generated ahead of time and then used to
test elements or perform approximate matching. During the generation phase, strings can be added in arbitrary order,
then the tree can be balance
d and serialized to an ArrayBuffer
. The set can then be used by loading the tree data
directly from the loaded binary data, bypassing the need to load and add individual strings.
Developing
Building from source requires Node.js. Clone the repository, then install development dependencies:
npm install
The TypeScript source is found under src
. Compiled output is written to lib
. To build the project:
npm run build
The included tsconfig.json
targets ES2015 (ES6). To target old JavaScript engines or browsers you will need to modify this configuration and/or use a tool like Babel.
The project includes an extensive suite of tests under src/tests
. To run all tests:
npm test