Trie (Prefix tree)
This library is compatible with Go 1.11+
Please refer to CHANGELOG.md if you encounter breaking changes.
Motivation
The goal of this project is to provide a serverless-friendly prefix tree implementation,
where one function can easily build a tree and publish it to some cloud storage.
A second function can then load the trie to perform various operations.
Introduction
A trie (prefix tree) is a space-optimized tree data structure in which each node that is the only child is merged with its parent.
Unlike regular trees (where whole keys are compared from their beginning up to the point of inequality), the key at each node is compared chunk by chunk.
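The merging and chunk-by-chunk comparison described above can be sketched in a few lines of Go. This is a minimal illustration only, not this library's internal implementation: the `node` type, its fields, and the `put`/`get` helpers are hypothetical names chosen for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is one vertex of a compressed trie. prefix is the chunk of key bytes
// on the edge leading to this node (hypothetical layout, for illustration).
type node struct {
	prefix   []byte
	value    interface{}
	hasValue bool
	children []*node
}

// commonPrefixLen returns how many leading bytes a and b share.
func commonPrefixLen(a, b []byte) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// put stores value under key, splitting a child when key diverges inside its chunk.
func (n *node) put(key []byte, value interface{}) {
	if len(key) == 0 {
		n.value, n.hasValue = value, true
		return
	}
	for _, child := range n.children {
		shared := commonPrefixLen(child.prefix, key)
		if shared == 0 {
			continue
		}
		if shared < len(child.prefix) {
			// Split: child keeps the shared chunk, its old tail moves one level down.
			tail := &node{prefix: child.prefix[shared:], value: child.value,
				hasValue: child.hasValue, children: child.children}
			child.prefix = child.prefix[:shared]
			child.value, child.hasValue = nil, false
			child.children = []*node{tail}
		}
		child.put(key[shared:], value)
		return
	}
	// No child shares a first byte with key: the whole remainder becomes one node.
	leaf := &node{prefix: append([]byte(nil), key...), value: value, hasValue: true}
	n.children = append(n.children, leaf)
}

// get walks the tree one chunk at a time instead of one byte at a time.
func (n *node) get(key []byte) (interface{}, bool) {
	if len(key) == 0 {
		return n.value, n.hasValue
	}
	for _, child := range n.children {
		if bytes.HasPrefix(key, child.prefix) {
			return child.get(key[len(child.prefix):])
		}
	}
	return nil, false
}

func main() {
	root := &node{}
	root.put([]byte("abc"), 1)
	root.put([]byte("abd"), 2)
	v, ok := root.get([]byte("abd"))
	// The shared chunk "ab" lives in a single node under the root.
	fmt.Println(v, ok, len(root.children), string(root.children[0].prefix)) // 2 true 1 ab
}
```

In this sketch a key with no siblings occupies a single node regardless of its length, which is where the space saving comes from.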
A prefix tree is well suited to searching a document for many keywords at once.
Character comparison complexity:
- Brute Force: O(d n k)
- Prefix Trie: O(d log(k))
Where
- d: number of characters in document
- n: number of keywords
- k: average keyword length
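To make the brute-force bound concrete, here is a minimal sketch of a naive keyword counter. It is an assumption about what a brute-force baseline looks like, not the benchmark code from this repository; `countBruteForce` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
)

// countBruteForce counts keyword occurrences by trying every keyword at every
// document offset: d offsets * n keywords * up to k byte comparisons each.
// Keywords are assumed to be non-empty.
func countBruteForce(document []byte, keywords [][]byte) int {
	count := 0
	for offset := range document {
		for _, keyword := range keywords {
			if bytes.HasPrefix(document[offset:], keyword) {
				count++
			}
		}
	}
	return count
}

func main() {
	keywords := [][]byte{[]byte("lorem"), []byte("ipsum"), []byte("dolor")}
	fmt.Println(countBruteForce([]byte("lorem ipsum lorem"), keywords)) // 3
}
```

The inner loop over all n keywords is the factor a trie removes: at each offset only the keywords sharing the bytes seen so far remain candidates.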
Usage
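    trie := ptrie.New()

    // Insert every key/value pair; Put reports failures as an error.
    for key, value := range pairs {
        if err := trie.Put(key, value); err != nil {
            log.Fatal(err)
        }
    }

    // Check for a key, or fetch the value stored under it.
    has := trie.Has(key)
    value, has := trie.Get(key)

    // Invoke the callback for each key matched in input.
    matched := trie.MatchAll(input, func(key []byte, value interface{}) bool {
        fmt.Printf("matched: key: %s, value %v\n", key, value)
        return true
    })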
- Building
    trie := ptrie.New()
    for key, value := range pairs {
        if err := trie.Put(key, value); err != nil {
            log.Fatal(err)
        }
    }

    // Serialize the trie into an in-memory buffer.
    writer := new(bytes.Buffer)
    if err := trie.Encode(writer); err != nil {
        log.Fatal(err)
    }
    encoded := writer.Bytes()
- Loading
    // Register the stored value type (*V here) before decoding.
    var v *V
    trie := ptrie.New()
    trie.UseType(reflect.TypeOf(v))
    if err := trie.Decode(reader); err != nil {
        log.Fatal(err)
    }
- Traversing (range map)
    trie.Walk(func(key []byte, value interface{}) bool {
        fmt.Printf("key: %s, value %v\n", key, value)
        return true
    })
- Lookup
    has := trie.Has(key)
    value, has := trie.Get(key)
- MatchPrefix
    var input []byte
    ...
    matched := trie.MatchPrefix(input, func(key []byte, value interface{}) bool {
        fmt.Printf("matched: key: %s, value %v\n", key, value)
        return true
    })
- MatchAll
    var input []byte
    ...
    matched := trie.MatchAll(input, func(key []byte, value interface{}) bool {
        fmt.Printf("matched: key: %s, value %v\n", key, value)
        return true
    })
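The difference between the two match operations is easiest to see on a small stand-in. The sketch below assumes that MatchPrefix reports stored keys that are prefixes of the input, and that MatchAll reports stored keys occurring at any offset of the input; please check the library's own documentation for the exact semantics. The `trieNode` type and the `matchPrefix`/`matchAll` helpers are hypothetical and use a plain, uncompressed trie rather than this library's implementation.

```go
package main

import "fmt"

// trieNode is a plain byte trie node (hypothetical stand-in for illustration).
type trieNode struct {
	children map[byte]*trieNode
	value    interface{}
	terminal bool
}

func newNode() *trieNode { return &trieNode{children: map[byte]*trieNode{}} }

// put stores value under key, creating one node per byte.
func (n *trieNode) put(key []byte, value interface{}) {
	cur := n
	for _, b := range key {
		next, ok := cur.children[b]
		if !ok {
			next = newNode()
			cur.children[b] = next
		}
		cur = next
	}
	cur.value, cur.terminal = value, true
}

// onMatch is called per match; returning false stops the search.
type onMatch func(key []byte, value interface{}) bool

// matchPrefix reports every stored key that is a prefix of input, shortest first.
func (n *trieNode) matchPrefix(input []byte, handler onMatch) (matched, keepGoing bool) {
	cur := n
	for i, b := range input {
		next, ok := cur.children[b]
		if !ok {
			return matched, true
		}
		cur = next
		if cur.terminal {
			matched = true
			if !handler(input[:i+1], cur.value) {
				return matched, false
			}
		}
	}
	return matched, true
}

// matchAll repeats the prefix match at every offset of input.
func (n *trieNode) matchAll(input []byte, handler onMatch) bool {
	matched := false
	for offset := range input {
		m, keepGoing := n.matchPrefix(input[offset:], handler)
		matched = matched || m
		if !keepGoing {
			break
		}
	}
	return matched
}

func main() {
	root := newNode()
	root.put([]byte("dev"), 1)
	root.put([]byte("domain"), 2)
	report := func(key []byte, value interface{}) bool {
		fmt.Printf("matched: key: %s, value %v\n", key, value)
		return true
	}
	// Reports "dev" (offset 0) and "domain" (offset 4).
	root.matchAll([]byte("dev.domain.com"), report)
}
```

With the same keys and input, `matchPrefix` in this sketch reports only "dev", because "domain" does not start at offset 0.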
Benchmark
The benchmark counts all words that are part of the following extracts:
Lorem Ipsum
- Short: avg line size: 20, words: 13
- Long: avg line size: 711, words: 551
    Benchmark_LoremBruteForceShort-8     500000       3646 ns/op
    Benchmark_LoremTrieShort-8           500000       2376 ns/op
    Benchmark_LoremBruteForceLong-8        1000    1612877 ns/op
    Benchmark_LoremTrieLong-8             10000     119990 ns/op
Hamlet
- Short: avg line size: 20, words: 49
- Long: avg line size: 41, words: 105
    Benchmark_HamletBruteForceShort-8     30000      44306 ns/op
    Benchmark_HamletTrieShort-8          100000      18530 ns/op
    Benchmark_HamletBruteForceLong-8      10000     226836 ns/op
    Benchmark_HamletTrieLong-8            50000      39329 ns/op
License
The source code is made available under the terms of the Apache License, Version 2, as stated in the file LICENSE.
Individual files may be made available under their own specific license,
all compatible with Apache License, Version 2. Please see individual files for details.
Credits and Acknowledgements
Library Author: Adrian Witas