node-streamcount
Provides implementations of "sketch" algorithms for real-time counting of
stream data.
For an overview of the type of problems these algorithms solve, read
The Britney Spears Problem
and Wikipedia's article on Streaming algorithm.
The currently implemented algorithms include:
- HyperLogLog
- Count-Min sketch
Download
The source is available for download from
GitHub.
Alternatively, you can install using Node Package Manager (npm):
npm install streamcount
Quick Example
var streamcount = require('streamcount');

// Estimate the number of unique visitors with a 1% standard error.
var uniques = streamcount.createUniquesCounter(0.01);
uniques.add('user1');
uniques.add('user2');
uniques.add('user3');
uniques.add('user2');
// Prints the estimated number of unique IDs (3 distinct IDs were added).
console.log(uniques.count());

// Track estimated view counts for the 3 most viewed pages.
var pageCounts = streamcount.createViewsCounter(3);
pageCounts.increment('/');
pageCounts.increment('/');
pageCounts.increment('/product1');
pageCounts.increment('/contact');
pageCounts.increment('/product3');
pageCounts.increment('/');
pageCounts.increment('/about');
pageCounts.increment('/about');
pageCounts.increment('/product2');
pageCounts.increment('/product1');
pageCounts.increment('/');
pageCounts.increment('/product1');
// Prints up to 3 [estimatedCount, key] pairs for the most viewed pages.
console.dir(pageCounts.getTopK());
streamcount Documentation
### createUniquesCounter
Creates an object for tracking the approximate total number of unique IDs
observed. A common example is estimating the number of unique visitors to
a website. Returns a HyperLogLog object.
Arguments
- stdError - (Optional) A value between 0 and 1 indicating the acceptable error
rate. This controls the accuracy / memory usage tradeoff: a smaller value is
more accurate but uses more memory. 0.01 is the default.
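To give a feel for the accuracy / memory tradeoff, here is a minimal sketch based on the textbook HyperLogLog relation, where the standard error is roughly 1.04 / sqrt(m) for m registers. `registersForStdError` is a hypothetical helper and not part of streamcount. Rounding m up to a power of two is an assumption about how an implementation would size itself, so the library's actual numbers may differ.

```javascript
// Hypothetical helper (not part of streamcount): estimate how many
// HyperLogLog registers a target standard error calls for, using the
// textbook relation stdError ~= 1.04 / sqrt(m).
function registersForStdError(stdError) {
  if (!(stdError > 0 && stdError < 1)) {
    throw new RangeError('stdError must be between 0 and 1');
  }
  var needed = Math.pow(1.04 / stdError, 2);
  // Assumption: round up to a power of two so that a fixed number of
  // hash bits can select a register.
  var bits = Math.ceil(Math.log(needed) / Math.LN2);
  return { bits: bits, registers: Math.pow(2, bits) };
}

console.log(registersForStdError(0.01)); // { bits: 14, registers: 16384 }
console.log(registersForStdError(0.05)); // { bits: 9, registers: 512 }
```

Halving the error rate roughly quadruples the number of registers, which is why very small stdError values get expensive quickly.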
### createViewsCounter
Creates an object for tracking estimated top view counts for many unique
IDs. A common example is tracking the most viewed products on a website.
Returns a CountMinSketch object.
Arguments
- topEntryCount - Maximum number of top entries to return view counts for. This
is the maximum size of the array returned by getTopK().
- errFactor - (Optional) The estimated view counts returned by getTopK() can be
off by up to this fraction (a value between 0 and 1). This, combined with
failRate, controls the accuracy / memory usage tradeoff. 0.002 is the default.
- failRate - (Optional) The probability (a value between 0 and 1) of getting
the answer for a query completely wrong. This, combined with errFactor,
controls the accuracy / memory usage tradeoff. 0.0001 is the default.
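As a rough illustration of how errFactor and failRate drive memory usage, the sketch below applies the textbook Count-Min sketch sizing: width = ceil(e / errFactor) counters per row and depth = ceil(ln(1 / failRate)) rows. `countMinDimensions` is a hypothetical helper, and it is an assumption that streamcount sizes its table with exactly these formulas.

```javascript
// Hypothetical helper (not part of streamcount): textbook Count-Min sketch
// dimensions for an error factor (epsilon) and a failure rate (delta).
function countMinDimensions(errFactor, failRate) {
  var width = Math.ceil(Math.E / errFactor);      // counters per row
  var depth = Math.ceil(Math.log(1 / failRate));  // number of hashed rows
  return { width: width, depth: depth, counters: width * depth };
}

console.log(countMinDimensions(0.002, 0.0001));
// { width: 1360, depth: 10, counters: 13600 }
```

Note the asymmetry: tightening errFactor grows the table linearly, while tightening failRate only grows it logarithmically.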
### getUniquesObjSize
Returns the serialized size of a uniques counter (HyperLogLog) object in
bytes given a stdError. NOTE: The memory usage will be higher than this
number since we serialize 32-bit integers but JavaScript uses 64-bit numbers.
Arguments
- stdError - Parameter to createUniquesCounter() to estimate storage
requirements for.
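The gap between serialized size and memory usage can be illustrated with typed arrays. This is only an illustration: the value count below is hypothetical, and the real in-memory footprint depends on the JavaScript engine.

```javascript
// Illustration only: the same number of values stored as 32-bit integers
// versus as 64-bit floating-point numbers (JavaScript's Number type).
var valueCount = 16384; // hypothetical count, chosen for illustration

var as32Bit = new Uint32Array(valueCount);
var as64Bit = new Float64Array(valueCount);

console.log(as32Bit.byteLength); // 65536
console.log(as64Bit.byteLength); // 131072
```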
### getViewsObjSize
Returns the serialized size of a views counter (CountMinSketch) object in
bytes given an errFactor and failRate. NOTE: This does not include the size
of the serialized MinHeap which includes the size of each unique ID (up to a
max of topEntryCount) plus 5 bytes overhead per entry. NOTE2: The memory
usage will be higher than this number since we serialize 32-bit integers but
JavaScript uses 64-bit numbers.
Arguments
- errFactor - Parameter to createViewsCounter() to estimate storage
requirements for.
- failRate - Parameter to createViewsCounter() to estimate storage requirements
for.
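The MinHeap portion that getViewsObjSize leaves out can be estimated by hand from the description above. The sketch below assumes that the "size of each unique ID" means its UTF-8 byte length. `estimateHeapSize` is a hypothetical helper, not a streamcount function.

```javascript
// Hypothetical helper (not part of streamcount): estimate the serialized
// size of the top-entries heap from the IDs it would hold.
// Assumption: each ID is stored as its UTF-8 bytes.
var PER_ENTRY_OVERHEAD = 5; // bytes per entry, as stated in the note above

function estimateHeapSize(ids) {
  return ids.reduce(function (total, id) {
    return total + Buffer.byteLength(id, 'utf8') + PER_ENTRY_OVERHEAD;
  }, 0);
}

// (1 + 5) + (9 + 5) + (6 + 5) bytes
console.log(estimateHeapSize(['/', '/product1', '/about'])); // 31
```

Adding this figure to the value returned by getViewsObjSize gives a closer estimate of the full serialized size.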
HyperLogLog Documentation
### HyperLogLog
Initializes a HyperLogLog object. Takes the same parameters as
createUniquesCounter.
Example
var HyperLogLog = require('streamcount').HyperLogLog;
var uniques = new HyperLogLog();
### add
Add a member to the set.
Arguments
- key - String identifier to add to the set.
### count
Count the number of unique members in the set. Returns the estimated
cardinality of the set.
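For readers curious about what add and count do conceptually, here is a minimal, self-contained HyperLogLog sketch. It is not streamcount's implementation: the name `TinyHyperLogLog`, the SHA-1 based 32-bit hash, and the choice of 12 index bits are all assumptions made for illustration, and the large-range correction is omitted.

```javascript
var crypto = require('crypto');

// Illustrative HyperLogLog, not streamcount's implementation.
// Assumptions: 32-bit hash taken from SHA-1, and 7 <= bits <= 16.
function TinyHyperLogLog(bits) {
  this.bits = bits;
  this.m = 1 << bits; // number of registers
  this.registers = new Uint8Array(this.m);
}

TinyHyperLogLog.prototype.add = function (key) {
  var hash = crypto.createHash('sha1').update(String(key)).digest().readUInt32BE(0);
  var index = hash >>> (32 - this.bits); // top bits pick a register
  var rest = (hash << this.bits) >>> 0;  // remaining bits, left-aligned
  // Rank = position of the first 1 bit among the remaining bits.
  var rank = rest === 0 ? 32 - this.bits + 1 : Math.clz32(rest) + 1;
  if (rank > this.registers[index]) this.registers[index] = rank;
};

TinyHyperLogLog.prototype.count = function () {
  var sum = 0;
  var zeros = 0;
  for (var i = 0; i < this.m; i++) {
    sum += Math.pow(2, -this.registers[i]);
    if (this.registers[i] === 0) zeros++;
  }
  var alpha = 0.7213 / (1 + 1.079 / this.m); // bias constant for m >= 128
  var estimate = alpha * this.m * this.m / sum;
  // Small-range correction: fall back to linear counting.
  if (estimate <= 2.5 * this.m && zeros > 0) {
    estimate = this.m * Math.log(this.m / zeros);
  }
  return Math.round(estimate);
};

var visitors = new TinyHyperLogLog(12);
['user1', 'user2', 'user3', 'user2'].forEach(function (id) {
  visitors.add(id);
});
console.log(visitors.count()); // approximately 3
```

Adding the same key twice never changes a register, which is why duplicates do not inflate the estimate.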
### serialize
Serializes this data structure to a binary buffer. Returns a binary Buffer
holding the serialized form of this structure.
### HyperLogLog.deserialize
Static method to deserialize a binary buffer into a reconstituted HyperLogLog
structure.
Arguments
- buffer - Binary buffer holding the serialized structure.
- start - Starting offset of the structure in the buffer.
- length - Length of the serialized structure in the buffer.
Example
var uniques = HyperLogLog.deserialize(bufferData);
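The start and length arguments matter when one buffer holds more than one serialized structure. The sketch below shows only the offset arithmetic, using plain Node.js Buffers. The byte contents and the `sliceStructure` helper are hypothetical and say nothing about streamcount's actual serialization format.

```javascript
// Hypothetical payloads standing in for two serialized structures that
// were written back to back into a single buffer.
var first = Buffer.from([1, 0, 3, 2]);
var second = Buffer.from([0, 2, 1, 5, 4, 4]);
var combined = Buffer.concat([first, second]);

// Hypothetical helper: copy out one structure by offset and length.
function sliceStructure(buffer, start, length) {
  if (start < 0 || length < 0 || start + length > buffer.length) {
    throw new RangeError('start/length fall outside the buffer');
  }
  return Buffer.from(buffer.subarray(start, start + length));
}

var recovered = sliceStructure(combined, first.length, second.length);
console.log(recovered.equals(second)); // true
```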
### merge
Merge another HyperLogLog structure of the same size into this one. This makes
it possible to keep a local HyperLogLog object in memory on each webserver, and
periodically serialize->send->deserialize->merge the results into a single
count.
Arguments
- hyperLogLog - The other HyperLogLog object to merge in.
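In the standard HyperLogLog algorithm, merging two structures of the same size means keeping the larger value of each pair of registers, and the result is the same as if every key had been added to a single structure. The sketch below shows that step on bare register arrays. `mergeRegisters` is a hypothetical helper, and it is an assumption that streamcount's merge works exactly this way internally.

```javascript
// Illustrative merge of two HyperLogLog register arrays (not streamcount's
// code): keep the larger value in each register position.
function mergeRegisters(target, other) {
  if (target.length !== other.length) {
    throw new Error('Cannot merge HyperLogLog structures of different sizes');
  }
  for (var i = 0; i < target.length; i++) {
    if (other[i] > target[i]) target[i] = other[i];
  }
  return target;
}

var serverA = Uint8Array.from([1, 0, 3, 2]);
var serverB = Uint8Array.from([0, 2, 1, 5]);
console.log(Array.from(mergeRegisters(serverA, serverB))); // [ 1, 2, 3, 5 ]
```

Because the operation is an element-wise maximum, merging is order-independent and can be repeated safely with the same data.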
CountMinSketch Documentation
### CountMinSketch
Initializes a CountMinSketch object. Takes the same parameters as
createViewsCounter.
Example
var CountMinSketch = require('streamcount').CountMinSketch;
var topten = new CountMinSketch(10);
### increment
Record an observation of the given key.
Arguments
- key - String identifier to increment the observation count for.
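Conceptually, recording an observation in a Count-Min sketch bumps one counter in every row, and a key's estimated count is the smallest of those counters. The following self-contained sketch illustrates the idea. `TinyCountMin`, its `estimate` method, and the SHA-1 based row hashes are assumptions made for illustration, not streamcount's implementation or API.

```javascript
var crypto = require('crypto');

// Illustrative Count-Min sketch, not streamcount's implementation.
// Assumption: one SHA-1 based hash per row, salted with the row number.
function TinyCountMin(width, depth) {
  this.width = width;
  this.depth = depth;
  this.rows = [];
  for (var d = 0; d < depth; d++) {
    this.rows.push(new Uint32Array(width));
  }
}

// Map a key to one counter position in the given row.
TinyCountMin.prototype.column = function (key, row) {
  var digest = crypto.createHash('sha1').update(row + ':' + key).digest();
  return digest.readUInt32BE(0) % this.width;
};

// Record one observation: bump one counter in every row.
TinyCountMin.prototype.increment = function (key) {
  for (var d = 0; d < this.depth; d++) {
    this.rows[d][this.column(key, d)] += 1;
  }
};

// Estimated count: the smallest counter the key maps to. Collisions can
// only inflate a counter, so the estimate never undercounts.
TinyCountMin.prototype.estimate = function (key) {
  var min = Infinity;
  for (var d = 0; d < this.depth; d++) {
    min = Math.min(min, this.rows[d][this.column(key, d)]);
  }
  return min;
};

var views = new TinyCountMin(1360, 10);
['/', '/', '/product1', '/about', '/'].forEach(function (page) {
  views.increment(page);
});
console.log(views.estimate('/')); // at least 3
```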
### getTopK
Returns a sorted list of tuples containing the estimated frequency count
and key for the top observed members. Returns an array of at most
topEntryCount entries, each an array of length 2 where the first value is the
estimated frequency count and the second value is the given key.
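To make the return shape concrete, the sketch below builds the same kind of [count, key] tuple list from a plain object. Exact counts stand in for the sketch's estimates, the `topK` helper is hypothetical, and descending order is an assumption about what "sorted" means here.

```javascript
// Illustration only: exact counts stand in for the sketch's estimates.
// Hypothetical helper, not part of streamcount.
function topK(counts, k) {
  return Object.keys(counts)
    .map(function (key) { return [counts[key], key]; })
    .sort(function (a, b) { return b[0] - a[0]; }) // assumed: highest first
    .slice(0, k);
}

var counts = {
  '/': 4,
  '/product1': 3,
  '/about': 2,
  '/contact': 1,
  '/product2': 1,
  '/product3': 1
};

console.log(topK(counts, 3));
// [ [ 4, '/' ], [ 3, '/product1' ], [ 2, '/about' ] ]
```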
### serialize
Serializes this data structure to a binary buffer. Returns a binary Buffer
holding the serialized form of this structure.
### CountMinSketch.deserialize
Static method to deserialize a binary buffer into a reconstituted
CountMinSketch structure.
Arguments
- buffer - Binary buffer holding the serialized structure.
- start - Starting offset of the structure in the buffer.
- length - Length of the serialized structure in the buffer.
Example
var pageCounts = CountMinSketch.deserialize(bufferData);