Socket
Socket
Sign inDemoInstall

fast-ternary-string-set

Package Overview
Dependencies
0
Maintainers
1
Versions
11
Alerts
File Explorer

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

    fast-ternary-string-set

![CI status badge](https://github.com/CGJennings/fast-ternary-string-set/actions/workflows/ci.yml/badge.svg)


Version published
Maintainers
1
Install size
58.1 kB
Created

Readme

Source

Fast ternary string set

CI status badge

A fast, space-efficient, serializable string set based on ternary search trees, with both exact and approximate membership tests.

API Docs

Features

Installation

To install the latest stable version with npm:

npm install fast-ternary-string-set

Or, if using yarn:

yarn add fast-ternary-string-set

To use it without Node.js, you can simply copy the main source file (src/index.ts) into any TypeScript project, rename to something sensible, and then import it into your code as usual.

Examples

Complete API docs are available. Here are some examples of common tasks to get you started:

Loading the module:

// From node with CommonJS-style modules:
const { TernaryStringSet } = require("fast-ternary-string-set");
// From TypeScript:
import { TernaryStringSet } from "fast-ternary-string-set";

Create a new set and add some strings:

const set = new TernaryStringSet();
set.add("dog").add("cat").add("eagle");
// or alternatively
set.addAll(["aardvark", "beaver", "dog", "fish", "hamster"]);
set.has("cat");
// => true
set.delete("cat");
// => true since "cat" was in the set
set.has("cat");
// => false
set.has(123.456);
// => false (any non-string returns false)

Create a new string set from any Iterable<string>:

// otherSet could be any Iterable<string>, such as a string array
// or even another TernaryStringSet
let otherSet = new Set(["fish", "hippo"]);
let set = new TernaryStringSet(otherSet);
set.has("hippo");
// => true

Iterate over all elements in sorted order:

for (const el of set) {
    console.log(el);
}
// or equivalently:
set.forEach((el) => console.log(el));

Get all elements that start with "sha":

set.getCompletionsOf("sha");
// => ["shade", "shadow", "shake", "shape", "shapes"] (for example)

Get all elements that can be made from the letters of "taco":

set.getArrangementsOf("taco");
// => ["act", "cat", "coat", "taco"] (for example)

Get all elements within Hamming distance 1 of "cat":

set.getWithinHammingDistanceOf("cat", 1);
// => ["bat", "can", "cap", "cat", "cot", "sat"] (for example)

Get all elements that match "b.t":

set.getPartialMatchesOf("b.t");
// => ["bat", "bet", "bit", "bot", "but"] (for example)

Compare sets:

let s1 = new TernaryStringSet(["a", "b", "c"]);
let s2 = new TernaryStringSet(s1);
s1.equals(s2);
// => true
s1.isSubsetOf(s2);
// => true
s2.add("d");
s1.equals(s2);
// => false
s1.isSubsetOf(s2);
// => true
s1.isSupersetOf(s2);
// => false

Serialize to or from a buffer:

// write a set to an ArrayBuffer
const buff = set.toBuffer();

// create a new set from a previously saved ArrayBuffer
const set = TernaryStringSet.fromBuffer(buff);

A simple spelling checker:

// make-dict.js
// create dictionary set from a source list
const words = fs.readFileSync("./wordlist.txt", "utf8").split("\n");
const dict = new TernaryStringSet(words).add("");
dict.balance();
dict.compact();
fs.writeFileSync("./dict.bin", dict.toBuffer());

// check-spelling.js
// load the dictionary created by make-dict.js
const dict = TernaryStringSet.fromBuffer(fs.readFileSync("./dict.bin"));
const toCheck = "In a time long past lived a cat named Captain Peanut."
toCheck.toLowerCase().split(/\W+/).forEach((word) => {
    if (!dict.has(word)) {
        console.log(`misspelled: ${word}`);
    }
});

Usage notes

Differences from standard JS Set

TernaryStringSet supports a superset of the standard Set interface, but it is not a subclass of Set.

JavaScript Set objects guarantee that they iterate over elements in the order that they are added. TernaryStringSets always return results in sorted order.

TernaryStringSets can contain the empty string, but cannot contain non-strings. Not even null or undefined.

Tree quality

Adding strings in sorted order produces a worst-case tree structure. This can be avoided by adding strings all at once using the constructor or addAll(). Given sorted input, both of these methods will produce an optimal tree structure. If this is not practical, adding strings in random order usually yields a near-optimal tree. Calling balance() will rebuild the tree in optimal form, but it can be expensive.

Similarly, after deleting a large number of strings, future search performance may be improved by calling balance().

Since most TernaryStringSet methods are recursive, extremely unbalanced trees can provoke "maximum call stack size exceeded" errors.

Matching by code point

Some Unicode code points span two characters (char codes) in a JavaScript string. For example, the musical symbol 𝄞, code point U+1D11E, can be assigned to a JavaScript string as follows:

const clefG = "\ud834\udd1e";

Even though it represents a single symbol, the above string has a length of two. To avoid surprises, TernaryStringSet matches by code point, not by char code. For example, since the above string is one code point, it would match getPartialMatchesOf(".") and not getPartialMatchesOf("..").

Compaction

Calling compact() can significantly reduce a set's memory footprint. For large sets of typical strings, typical results are a 50–80% reduction in size. However, no new strings can be added or deleted without undoing the compaction. Compaction is expensive, but can be a one-time or even ahead-of-time step for many use cases.

Serialization

A common use case is to match user input against a fixed set of strings. For example, checking input against a spelling dictionary or suggesting completions for partial input. In such cases it is often desirable to build a set ahead of time, serialize it to a buffer, and then save the buffer data on a server where it can be downloaded as needed. Recreating a set directly from buffer data is much faster than downloading a file containing the strings and inserting them into a new set on the client.

The following steps will make such ahead-of-time sets as small as possible:

  1. Create a set and insert the desired strings using add() or addAll().
  2. Minimize the tree size by calling balance() followed by compact().
  3. Create the buffer with toBuffer() and write the result to a file.
  4. Optionally, compress the result and configure the server to serve the compressed version where supported by the browser.

To recreate the set, download or otherwise obtain the buffer data, then use TernaryTreeSet.fromBuffer(data).

Developing

Building from source requires Node.js. Clone the repository, then install development dependencies:

npm install

The TypeScript source is found under src. Compiled output is written to lib. To build the project:

npm run build

The included tsconfig.json targets ES2015 (ES6). To target old JavaScript engines or browsers you will need to modify this configuration and/or use a tool like Babel.

The project includes an extensive suite of tests under src/tests. To run all tests:

npm test

HTML documentation can be prepared automatically using TypeDoc:

npm run doc

Before submitting a pull request, format, lint, and run all tests:

npm run format && npm run lint && npm test

Keywords

FAQs

Last updated on 17 Oct 2021

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc