
Research
Malicious npm Packages Impersonate Flashbots SDKs, Targeting Ethereum Wallet Credentials
Four npm packages disguised as cryptographic tools steal developer credentials and send them to attacker-controlled Telegram infrastructure.
faster-count-min-sketch
Advanced tools
A faster Count-Min Sketch implementation for estimating item frequencies in a stream.
A simple, efficient, and fast Count-Min Sketch implementation in JavaScript for estimating item frequencies in data streams. Ideal for scenarios where memory is a concern and exact counts are not strictly necessary. This implementation prioritizes speed and offers a straightforward API.
Count-Min Sketch (CMS) is a probabilistic data structure that provides an efficient way to estimate the frequency of items (cardinality) in a large dataset or a continuous data stream. It's particularly useful when:
Key Trade-off: CMS provides approximate counts. It guarantees that the estimated frequency is never less than the true frequency, but it might be an overestimate. The accuracy of this overestimation can be controlled by parameters epsilon
(error factor) and delta
(probability of error).
epsilon
) and probability of error (delta
).update
and query
.npm install faster-count-min-sketch
import { CountMinSketch } from 'faster-count-min-sketch';
// --- Basic Example ---
// Create with desired error guarantees (recommended)
// Epsilon: error factor (e.g., 0.001 means error is N * 0.001)
// Delta: probability of error exceeding epsilon (e.g., 0.01 means 1% chance)
const sketch = CountMinSketch.createEstimate(0.001, 0.01);
sketch.update("apple", 5); // Increment count of "apple" by 5
sketch.update("banana", 2);
sketch.update("apple", 3); // "apple" is now 8
console.log("Estimated count for 'apple':", sketch.query("apple")); // Expected: around 8
console.log("Estimated count for 'banana':", sketch.query("banana")); // Expected: around 2
console.log("Estimated count for 'orange':", sketch.query("orange")); // Expected: 0 or small number (overestimation)
// Update with default count (1)
sketch.update("grape");
console.log("Estimated count for 'grape':", sketch.query("grape")); // Expected: 1
// --- Merging Sketches ---
const sketchA = CountMinSketch.createEstimate(0.01, 0.01);
sketchA.update("user:123", 10);
sketchA.update("event:click", 5);
const sketchB = CountMinSketch.createEstimate(0.01, 0.01); // Must have same epsilon/delta or resulting width/depth
sketchB.update("user:123", 7);
sketchB.update("event:submit", 12);
sketchA.merge(sketchB);
console.log("Merged 'user:123':", sketchA.query("user:123")); // Expected: 17
console.log("Merged 'event:click':", sketchA.query("event:click")); // Expected: 5
console.log("Merged 'event:submit':", sketchA.query("event:submit")); // Expected: 12
// --- Serialization ---
const jsonData = sketchA.toJSON();
// console.log(JSON.stringify(jsonData)); // You can store or transmit this
const hydratedSketch = CountMinSketch.fromJSON(jsonData);
console.log("Hydrated 'user:123':", hydratedSketch.query("user:123")); // Expected: 17
new CountMinSketch(width, depth)
Creates a new Count-Min sketch instance directly with specified dimensions.
width
(number): The width of the sketch table (number of counters per row). For optimal performance (using bitwise operations for modulo), this value will be automatically adjusted to the next power of 2 if it isn't already.depth
(number): The depth of the sketch table (number of hash functions/rows).Error
if width
or depth
are not positive integers.CountMinSketch.createEstimate(epsilon, delta)
A static factory method to create a Count-Min Sketch with dimensions estimated based on desired error guarantees. This is the recommended way to create a sketch.
epsilon
(number): The desired error factor (0 < epsilon
< 1). The error in estimation for an item's frequency f(x)
is typically within epsilon * N
(where N
is the sum of all frequencies inserted into the sketch). For example, an epsilon
of 0.001
means the error is roughly 0.1% of the total sum of counts.delta
(number): The desired probability of the error exceeding epsilon * N
(0 < delta
< 1). For example, a delta
of 0.01
means there's a 1% chance the actual error is larger than the bound defined by epsilon
.CountMinSketch
instance.Error
if epsilon
or delta
are not within the range (0, 1).width
will be ceil(Math.E / epsilon)
adjusted to the next power of 2, and depth
will be ceil(Math.log(1 / delta))
. The console will log these calculated and adjusted dimensions.update(key, count = 1)
Increments the frequency count for the given key
.
key
(string): The item/key to update.count
(number, default: 1
): The amount to increment the count by. Must be a positive integer. If count <= 0
, the sketch is not modified.query(key)
Returns the estimated frequency count for the given key
.
key
(string): The item/key to query.number
- The estimated frequency. This value is always non-negative. For items not frequently updated, or due to hash collisions, this might be an overestimate. It will never be an underestimate.merge(otherSketch)
Merges another Count-Min Sketch into the current one. This is done by adding the counts from otherSketch.table
to this.table
.
otherSketch
(CountMinSketch): The sketch to merge into the current one.Error
if this.width !== otherSketch.width
or this.depth !== otherSketch.depth
. Both sketches must have identical dimensions for merging to be valid.clear()
Resets all counters in the sketch table to zero.
toJSON()
Serializes the sketch to a JSON-compatible object, allowing it to be stored or transmitted.
object
- An object with the following properties:
width
(number): The width of the sketch.depth
(number): The depth of the sketch.table
(number[]): An array representing the sketch's counter table.CountMinSketch.fromJSON(data)
A static factory method to create a CountMinSketch instance from a previously serialized JSON object (from toJSON()
.
data
(object): The serialized sketch data, typically obtained from toJSON()
. Must contain width
, depth
, and table
properties.CountMinSketch
instance.Error
if the data
object is invalid, missing properties, or if data.table.length
does not match data.width * data.depth
.The accuracy of a Count-Min Sketch is determined by two parameters:
epsilon
(ε): Error Factor
est_f(x)
for an item x
with true frequency true_f(x)
is guaranteed to be:
true_f(x) <= est_f(x) <= true_f(x) + ε * N
N
is the sum of all frequencies (counts) inserted into the sketch (the L1 norm of the frequency vector).epsilon
means a tighter error bound (less overestimation) but requires a larger sketch width (w ≈ e/ε
).delta
(δ): Probability of Error
epsilon
is violated.est_f(x) <= true_f(x) + ε * N
holds with a probability of at least 1 - δ
.delta
means a higher confidence that the error bound is met but requires a larger sketch depth (d ≈ ln(1/δ)
).In simpler terms:
epsilon
to be small for more accurate counts.delta
to be small for higher confidence in that accuracy.epsilon
and delta
lead to a larger, more accurate sketch, which uses more memory.When using CountMinSketch.createEstimate(epsilon, delta)
, these parameters are used to determine the optimal width
and depth
for the sketch to meet your desired error guarantees.
This implementation has been benchmarked against other popular libraries and shows competitive, often superior, performance due to careful optimization of hashing and internal operations.
To run the included benchmarks:
npm run benchmark
The output will show performance for sketch creation, updates, and queries, comparing this library with two prominent ones available on NPM. Specifically Bloom-Filters and Count-Min-Sketch
Example Benchmark Output Snippet (Your Mileage May Vary):
Library | Operation | Ops Performed | Time Taken (ms) | Performance (ops/sec) |
---|---|---|---|---|
faster-count-min-sketch | Update | 1,000,000 | 183 | ~5,464,480.87 |
count-min-sketch | Update | 1,000,000 | 343 | ~2,915,451.90 |
BloomFilters CMS | Update | 1,000,000 | 8821 | ~113,365.83 |
faster-count-min-sketch | Query | 100,000 | 30 | ~3,333,333.33 |
count-min-sketch | Query | 100,000 | 39 | ~2,564,102.56 |
BloomFilters CMS | Query | 100,000 | 886 | ~112,866.82 |
The Count-Min Sketch uses a 2D array (table) of depth
rows and width
columns. Each row is associated with an independent hash function.
(item, count)
: For each of the depth
hash functions, hash item
to get a column index. Increment the counter at table[row][column_index]
by count
.(item)
: For each of the depth
hash functions, hash item
to get a column index. Retrieve the counts from table[row][column_index]
. The minimum of these depth
counts is returned as the estimated frequency.This "minimum of multiple hashes" approach is why the sketch overestimates but never underestimates.
MIT
FAQs
A faster Count-Min Sketch implementation for estimating item frequencies in a stream.
The npm package faster-count-min-sketch receives a total of 12 weekly downloads. As such, faster-count-min-sketch popularity was classified as not popular.
We found that faster-count-min-sketch demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Four npm packages disguised as cryptographic tools steal developer credentials and send them to attacker-controlled Telegram infrastructure.
Security News
Ruby maintainers from Bundler and rbenv teams are building rv to bring Python uv's speed and unified tooling approach to Ruby development.
Security News
Following last week’s supply chain attack, Nx published findings on the GitHub Actions exploit and moved npm publishing to Trusted Publishers.