
Security News
npm Adopts OIDC for Trusted Publishing in CI/CD Workflows
npm now supports Trusted Publishing with OIDC, enabling secure package publishing directly from CI/CD workflows without relying on long-lived tokens.
A memory-compact, fully dynamic, flat-packed prefix trie with minimal overhead, supporting custom Unicode ranges
Requires Python 3.10 or newer
Install by running in your terminal
pip install git+https://github.com/VArgenti/PackedTrie.git
trie = PackedTrie(encoding='ascii', size_class='default')
size_class
: Decides the scaling of each node's index ability. Each size_class reserves 1 more byte of memory per node, allowing for an ~256 times larger trie at a cost of ~1 extra byte per node. An overfilled trie will throw an OverflowError.
"tiny", "default", "large", "massive", "massive+", "massive++"
encoding
: Decides the unicode range each node supports. The wider the range the more memory is reserved per node
Encoding predefined options
Category | langs |
---|---|
General | bmp , unicode |
ASCII & subsets | ascii , printable_ascii , digits , punctuation , alphanumeric |
Symbols | symbols_math , currency_symbols , arrows , misc_technical |
Latin script | latin_alphabet , latin_alphabet_lowercase , latin_alphabet_uppercase , latin , latin_extended_a , latin_extended_b |
Other scripts | greek , cyrillic , hebrew , arabic , devanagari , thai , japanese , korean , hangul_compat , cjk |
Special formats | emojis_basic , emojis_extended , fullwidth_forms |
Method | Average | Worst Case |
---|---|---|
rebuild() | O(n*m) | – |
insert() , remove() | O(k+m) | O(m*k) |
with_prefix() | O(m+w) | O(m*log k + w) |
__contains__() , has_prefix() | O(log k + m) | O(m*log k) |
__iter__() | O(m) per yielded string | – |
__sizeof__() | O(log k) | – |
clear() , is_empty() , __len__() , node_count() , memory_efficiency() , __repr__() , allowed_chars() , help() | O(1) | – |
Notation
n
: number of nodesm
: length of the relevant stringk
: number of unique characters used in the triew
: number of strings yieldedtrie = PackedTrie(encoding='ascii', size_class='default')
trie.insert("foo")
"foo" in trie # True
trie.with_prefix("fo") # ["foo"]
A trie is a way to store strings that allows for fast lookup and efficient prefix matching. Each node in the trie is a character from a string and its children represent possible continuations of that string. For example, a trie with the string "FOO" will look as such;
Here each edge represents a character, and the final "O" node has a flag to indicate it is the end of a string. Here’s a trie with multiple strings;
This trie contains the strings "BAR", "FOG", "FOO", "FOOL" and "FOOT"
The most common way to make a trie is to have each node be a dict, where each entry representes a character and has a pointer to said character's node. This is an easy and fast way to build a trie but comes with a memory overhead of potentially hundred of bytes per node.
This trie can cram each node into as few as 4 bytes per node while still allowing the trie to store a dictionary of (conservatively) ~3 000 000 real words using only the latin alphabet. Approximately 18 times the amount of words in the Oxford English Dictionary. You can also customise the trie to make each node 10 bytes. At 10 bytes per node (massive++ size), no current data center has enough RAM to overflow the trie's indexable space.
This trie is flat packed, meaning that no parents have pointers to thir children. Each node in this trie is packaged as; its own character, an end-of-word flag, and an index to its children. Each node is kept in a chunk within an array, where all children are grouped together. Through this we can index the first child and do binary search to find the position of the specific child we want to see. For example, a trie with the strings "A", "B", "C" and "D" will look as such;
The first box in each node is its character, the second the end of word flag and the third with an index to where its children are. Here "B" is a terminal node.
For easier indexing, searching and faster operations, we also have a system to preallocated certain sized chunks. We have 'tiers' of bytearrays which we fit all nodes in. In each bytearray, a chunk of children is given a preallocated space appropiate for 2**tier nodes, starting at tier 0. Through this, at trie containing the strings "ABC", "AC", "B" and "C" will look like;
The 3rd slot in each node has an arrow pointing to the first of its children. Note that this is an index, not a pointer.
Despite the trie being designed to be compact in memory through flat-packing, no other memory optimisations have been made- that is to say this it not a Patricia trie, Radix trie or a DAWG. It's simply a very compact plain trie.
For usage help, run:
PackedTrie.help()
It will print available encodings and methods.
FAQs
Flat-packed memory-efficient prefix trie
We found that packedtrie demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
npm now supports Trusted Publishing with OIDC, enabling secure package publishing directly from CI/CD workflows without relying on long-lived tokens.
Research
/Security News
A RubyGems malware campaign used 60 malicious packages posing as automation tools to steal credentials from social media and marketing tool users.
Security News
The CNA Scorecard ranks CVE issuers by data completeness, revealing major gaps in patch info and software identifiers across thousands of vulnerabilities.