trie-fuzzy
A TypeScript implementation of the trie data structure, including fuzzy (approximate) string matching.
Powered by DTS.
Features
- Exact string search
- Prefix search
- Fuzzy search
- Generator interfaces: results are produced lazily, so there is no need to hold them all in memory at once
- Runs in both Node.js and browsers
- No dependencies
Usage
import { Trie } from 'trie-fuzzy';
const trie = new Trie([
  'armor',
  'armadillo',
  'armageddon',
  'artisan',
  'timer',
  'time',
  'tier',
  'dime',
  'fiber',
  'mime',
  'miner',
]);
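trie.has('armor'); // true: 'armor' is one of the indexed words
trie.has('arm'); // false: 'arm' is only a prefix, not an indexed word
// Iterate lazily over every indexed word that starts with 'arm'
for (const result of trie.prefixSearch('arm')) {
  console.log(result);
}
// Iterate over every indexed word within edit distance 2 of 'timer'
for (const { key, distance } of trie.fuzzySearch('timer', 2)) {
  console.log(key, distance);
}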
API
Constructor
constructor(words: string[])
Builds a new Trie that indexes every word in the words array.
words: a list of words that will be inserted in the Trie.
has - exact string matching
has(word: string) => boolean
Verifies if a given word is contained in the Trie's word set.
word: the word to be queried in the Trie.
Returns: true if word exists in the Trie, false otherwise.
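To make the exact-match semantics concrete, here is a minimal sketch of how a lookup over a trie can work. SketchNode, buildSketch, and sketchHas are hypothetical names used only for illustration; this is not the internal representation used by trie-fuzzy, which is not described here.

```typescript
// Hypothetical node shape for illustration; not the library's internals.
interface SketchNode {
  children: Map<string, SketchNode>;
  isWord: boolean; // true when a stored word ends at this node
}

// Insert every word character by character, creating nodes as needed.
function buildSketch(words: string[]): SketchNode {
  const root: SketchNode = { children: new Map(), isWord: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      let child = node.children.get(ch);
      if (child === undefined) {
        child = { children: new Map(), isWord: false };
        node.children.set(ch, child);
      }
      node = child;
    }
    node.isWord = true;
  }
  return root;
}

// Follow the path spelled by `word`; it matches only if the path exists
// and ends on a node that marks the end of a stored word.
function sketchHas(root: SketchNode, word: string): boolean {
  let node = root;
  for (const ch of word) {
    const child = node.children.get(ch);
    if (child === undefined) return false; // path breaks: word is absent
    node = child;
  }
  return node.isWord; // a bare prefix such as 'arm' ends on a non-word node
}

const sketchRoot = buildSketch(['armor', 'armadillo', 'time', 'timer']);
console.log(sketchHas(sketchRoot, 'armor')); // true
console.log(sketchHas(sketchRoot, 'arm')); // false
```

The end-of-word flag is what separates a stored word from a mere prefix of one, which is why has('arm') is false in the usage example even though several indexed words start with arm.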
prefixSearch - prefix search
*prefixSearch(prefix: string) => Generator<string>
Searches for all words in the Trie's set with a given prefix.
prefix: the prefix to be queried.
Returns: a Generator that iterates through all words in the Trie's set that start with prefix.
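Because the result is a Generator, a caller can stop early without computing the remaining matches. The sketch below illustrates this with a stand-in generator so that it runs without the library. standInPrefixSearch and take are hypothetical helpers, and the stand-in scans an array rather than a trie, so it says nothing about how trie-fuzzy searches internally.

```typescript
// Stand-in for trie.prefixSearch so the snippet runs on its own;
// it is NOT how trie-fuzzy searches internally.
let produced = 0;
function* standInPrefixSearch(words: string[], prefix: string): Generator<string> {
  for (const word of words) {
    if (word.startsWith(prefix)) {
      produced += 1; // count how many matches were actually generated
      yield word;
    }
  }
}

// Hypothetical helper: pull at most `limit` items from any iterable.
function* take<T>(source: Iterable<T>, limit: number): Generator<T> {
  if (limit <= 0) return;
  let count = 0;
  for (const item of source) {
    yield item;
    count += 1;
    if (count >= limit) return; // leaving the loop also closes `source`
  }
}

const sampleWords = ['armor', 'armadillo', 'armageddon', 'artisan', 'timer'];
const firstTwo = [...take(standInPrefixSearch(sampleWords, 'arm'), 2)];
console.log(firstTwo); // ['armor', 'armadillo']
console.log(produced); // 2: 'armageddon' was never generated
```

The same take helper should work with any generator, including the ones returned by prefixSearch and fuzzySearch, since it only relies on the standard iteration protocol.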
fuzzySearch - approximate string matching
*fuzzySearch(word: string, maxDistance: number = 1) => Generator<{ key: string, distance: number }>
Searches for all words in the Trie's set that are similar to a given word. Uses the Damerau-Levenshtein distance (or edit distance with character transposition) to measure similarity.
word: the word to be queried.
maxDistance: the maximum edit distance allowed between word and the returned results. Defaults to 1.
Returns: a Generator that iterates through all words in the Trie's set whose edit distance to word is lower than or equal to maxDistance. Each result is an object with two keys: key holds the word from the Trie set that was matched, and distance holds the edit distance between that word and the queried word.
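To give a feel for the distance values, the sketch below computes an edit distance with adjacent transpositions between two strings. It uses the optimal string alignment (restricted) variant as an assumption; which variant trie-fuzzy implements is not stated here, and the library works over the trie rather than comparing word pairs like this. osaDistance is a hypothetical name.

```typescript
// Optimal string alignment distance: insertions, deletions, substitutions,
// and swaps of two adjacent characters each cost 1.
function osaDistance(a: string, b: string): number {
  // d[i][j] = distance between the first i chars of a and the first j chars of b
  const d: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= b.length; j++) {
      row.push(i === 0 ? j : j === 0 ? i : 0);
    }
    d.push(row);
  }

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost, // substitution (or match)
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
      }
    }
  }
  return d[a.length][b.length];
}

console.log(osaDistance('timer', 'time')); // 1: delete 'r'
console.log(osaDistance('timer', 'tiemr')); // 1: swap 'm' and 'e'
console.log(osaDistance('timer', 'dime')); // 2: substitute 't', delete 'r'
```

Without the transposition rule, 'timer' and 'tiemr' would be two substitutions apart, so counting a swap as a single edit makes common typing mistakes fall within a smaller maxDistance.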
Implementation Details
- Trie is an immutable class: after a trie is built, no words can be added to it or removed from it. It was designed this way to speed up prefix search and to leverage the benefits of immutability.
- Every query operation in Trie is case sensitive, meaning that a Trie that contains the word KERFUFFLE will not return it if the user searches for kerfuffle (through has, prefixSearch, or fuzzySearch). It was designed this way for the sake of simplicity and to avoid the many edge cases that might arise; it is up to the user to clean up the keys before building a Trie and querying it. The code below is an example of how to perform case-insensitive queries in a Trie:
import { Trie } from 'trie-fuzzy';
const keys = [
  'Timer',
  'Time',
  'Tier',
  'Dime',
  'Fiber',
  'Mime',
  'Miner',
];
const cleanKey = (key: string) => key.toUpperCase();
const trie = new Trie(keys.map(cleanKey));
trie.has('timer'); // false: the stored keys are upper case
trie.has(cleanKey('timer')); // true: the query is normalized to 'TIMER'
trie.has(cleanKey('TiMeR')); // true
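If the keys may also contain surrounding whitespace or accented letters, the clean-up function can do more than change case. The following is a sketch of one possible normalizer, not part of trie-fuzzy; normalizeKey is a hypothetical name, and whether stripping accents is appropriate depends on your data.

```typescript
// Hypothetical, more thorough replacement for cleanKey; adjust to your data.
const normalizeKey = (key: string): string =>
  key
    .trim() // drop leading and trailing whitespace
    .normalize('NFD') // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toUpperCase();

console.log(normalizeKey('  Café ')); // 'CAFE'
console.log(normalizeKey('TiMeR')); // 'TIMER'
```

As with cleanKey, the same function has to be applied both to the keys passed to the constructor and to every query, otherwise stored and queried forms will not line up.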
Author