What is tdigest?
The tdigest npm package is a JavaScript implementation of Ted Dunning's t-digest algorithm, which accumulates accurate approximations of rank-based statistics such as quantiles and cumulative distribution functions (CDFs) online, one value at a time. Because it keeps a compact summary of the data instead of every value, it is particularly useful for large-scale data processing where memory efficiency and speed are critical.
What are tdigest's main features?
Creating a t-digest
Create a new, empty t-digest instance. It then accumulates data and answers quantile and CDF queries.
const TDigest = require('tdigest').TDigest;
const t = new TDigest();
Adding data to the t-digest
Add data points with push(), either one value at a time (as below) or as an array of values. The t-digest folds each value into its compact summary instead of storing it, and answers rank-based queries from that summary.
t.push(1.0);
t.push(2.0);
t.push(3.0);
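How this package stores pushed values internally is not shown here. In t-digests generally, each value is folded into a centroid, a (mean, count) pair, rather than kept. The minimal sketch below shows that update in isolation; `mergeIntoCentroid` and the `{ mean, n }` shape are hypothetical names for illustration, not part of the tdigest API.

```javascript
// Hypothetical helper (not part of the tdigest API): fold a value x with
// weight w into a centroid { mean, n } using an incremental weighted mean.
function mergeIntoCentroid(centroid, x, w = 1) {
  const n = centroid.n + w;
  // Move the mean toward x in proportion to x's share of the new total weight.
  const mean = centroid.mean + (w * (x - centroid.mean)) / n;
  return { mean, n };
}

// Folding 2 and 3 into a centroid that already holds 1 leaves a single
// centroid summarizing all three values.
let c = { mean: 1, n: 1 };
c = mergeIntoCentroid(c, 2);
c = mergeIntoCentroid(c, 3);
console.log(c); // { mean: 2, n: 3 }
```

The incremental form avoids keeping a running sum, so memory per centroid stays constant no matter how many values it absorbs.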
Computing quantiles
Estimate quantiles from the accumulated data with percentile(), passing the desired quantile as a fraction between 0 and 1: 0.5 gives the median (50th percentile) and 0.9 gives the 90th percentile. The results are approximations rather than exact order statistics.
const q50 = t.percentile(0.5);
const q90 = t.percentile(0.9);
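percentile() can only return an estimate because the digest keeps centroids, not raw values. One common way to read a quantile off centroids sorted by mean, used here as an assumption rather than a description of this package's exact code, treats each centroid's count as centered on its mean and interpolates linearly between neighboring centroid means. The function name `quantileFromCentroids` is hypothetical.

```javascript
// Hypothetical sketch: estimate the p-quantile (0 <= p <= 1) from centroids
// sorted by mean. Each centroid's weight is treated as centered at its mean.
function quantileFromCentroids(centroids, p) {
  const total = centroids.reduce((sum, c) => sum + c.n, 0);
  const target = p * total; // the rank we are looking for
  let cum = 0; // weight of all centroids before the current one
  let prevCenter = null; // rank at the center of the previous centroid
  let prevMean = null;
  for (const c of centroids) {
    const center = cum + c.n / 2;
    if (target <= center) {
      if (prevCenter === null) return c.mean; // below the first center: clamp
      // Linear interpolation between the two neighboring centroid means.
      const t = (target - prevCenter) / (center - prevCenter);
      return prevMean + t * (c.mean - prevMean);
    }
    prevCenter = center;
    prevMean = c.mean;
    cum += c.n;
  }
  return prevMean; // above the last center: clamp
}

const centroids = [{ mean: 1, n: 1 }, { mean: 2, n: 1 }, { mean: 3, n: 1 }];
console.log(quantileFromCentroids(centroids, 0.5)); // 2
console.log(quantileFromCentroids(centroids, 0.75)); // 2.75
```

Clamping at the first and last means is a simplification; fuller implementations typically also track the minimum and maximum values so the tails can be interpolated as well.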
Computing the cumulative distribution function (CDF)
Estimate the CDF at a given value with p_rank(). The CDF is the probability that a random variable takes a value less than or equal to the given value; here it is estimated as the fraction of the accumulated data at or below that value.
const cdf = t.p_rank(2.0);
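Like a quantile, the CDF estimate comes from centroids rather than raw values. The sketch below, an assumption about one reasonable approach and not this package's code, finds the two centroids whose means bracket x and interpolates between their center ranks, which is the inverse of reading a quantile off centroid centers. `cdfFromCentroids` is a hypothetical name, and returning 0 below the first mean and 1 at or above the last is a simplifying assumption. Because each centroid's count is treated as spread around its mean, a singleton at exactly x contributes only half its weight, so for centroids at 1, 2 and 3 the estimate at 2 is 0.5 rather than the exact 2/3.

```javascript
// Hypothetical sketch: estimate CDF(x), the fraction of weight <= x, from
// centroids sorted by mean, interpolating rank between centroid centers.
function cdfFromCentroids(centroids, x) {
  const total = centroids.reduce((sum, c) => sum + c.n, 0);
  if (total === 0) return NaN; // no data
  const first = centroids[0];
  const last = centroids[centroids.length - 1];
  if (x < first.mean) return 0; // clamp below the first mean
  if (x >= last.mean) return 1; // clamp at or above the last mean
  let cum = 0; // weight of centroids before centroids[i]
  for (let i = 0; i < centroids.length - 1; i++) {
    const a = centroids[i];
    const b = centroids[i + 1];
    if (x < b.mean) {
      // a.mean <= x < b.mean: interpolate between the two center ranks.
      const centerA = cum + a.n / 2;
      const centerB = cum + a.n + b.n / 2;
      const t = (x - a.mean) / (b.mean - a.mean);
      return (centerA + t * (centerB - centerA)) / total;
    }
    cum += a.n;
  }
  return 1; // not reached when centroids are sorted by mean
}

const centroids = [{ mean: 1, n: 1 }, { mean: 2, n: 1 }, { mean: 3, n: 1 }];
console.log(cdfFromCentroids(centroids, 2)); // 0.5
console.log(cdfFromCentroids(centroids, 2.75)); // 0.75
```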
Other packages similar to tdigest
simple-statistics
The simple-statistics package provides a wide range of statistical functions, including quantile calculations. However, it does not implement the t-digest algorithm: its quantile functions operate on an in-memory array of all values, so it is less memory efficient for large-scale or streaming data.
summary-statistics
The summary-statistics package offers basic statistical summaries such as mean, median, and standard deviation. It does not provide the advanced rank-based statistics or memory efficiency of the t-digest algorithm.
descriptive-statistics
The descriptive-statistics package provides functions for descriptive statistics, including quantiles. While it offers similar functionality, it does not use the t-digest algorithm and may not be as efficient for large datasets.
tdigest
JavaScript implementation of Dunning's T-Digest for streaming quantile approximation
The T-Digest is a data structure and algorithm for constructing an
approximate distribution for a collection of real numbers presented as a
stream. The algorithm makes no guarantees, but behaves well enough in
practice that implementations have been included in Apache Mahout and
Elasticsearch for computing summaries and approximate order
statistics over a stream.
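To make "constructing an approximate distribution for a stream" concrete, here is a deliberately simplified toy, not this package's implementation. It keeps centroids sorted by mean and adds each value to its nearest centroid only if that centroid stays within the size limit 4 * N * delta * q * (1 - q) from the original t-digest paper, where N is the number of values seen, q is the centroid's approximate quantile, and delta is a compression parameter. `ToyDigest` and delta = 0.01 are illustrative assumptions. The q * (1 - q) factor lets centroids near the median grow large while tail centroids stay small, which is what keeps extreme quantiles accurate.

```javascript
// Toy streaming digest (illustrative only; not the tdigest package's code).
class ToyDigest {
  constructor(delta = 0.01) {
    this.delta = delta; // compression parameter (illustrative default)
    this.centroids = []; // [{ mean, n }] kept sorted by mean
    this.total = 0; // number of values pushed so far
  }

  push(x) {
    this.total += 1;
    const cs = this.centroids;
    // Find the nearest centroid (a linear scan keeps the sketch short) and
    // the total count of the centroids that precede it.
    let best = -1;
    let bestDist = Infinity;
    let before = 0;
    let cum = 0;
    for (let i = 0; i < cs.length; i++) {
      const d = Math.abs(cs[i].mean - x);
      if (d < bestDist) {
        best = i;
        bestDist = d;
        before = cum;
      }
      cum += cs[i].n;
    }
    if (best >= 0) {
      const c = cs[best];
      const q = (before + c.n / 2) / this.total; // approximate quantile of c
      const limit = 4 * this.total * this.delta * q * (1 - q);
      if (c.n + 1 <= limit) {
        c.n += 1;
        c.mean += (x - c.mean) / c.n; // incremental mean update
        return;
      }
    }
    // The nearest centroid is full (or none exists): start a new one,
    // inserted so that centroids stay sorted by mean.
    let pos = 0;
    while (pos < cs.length && cs[pos].mean < x) pos++;
    cs.splice(pos, 0, { mean: x, n: 1 });
  }
}

const digest = new ToyDigest(0.01);
for (let i = 0; i < 10000; i++) digest.push(Math.random() * 10 - 5);
// Far fewer centroids than input values are kept.
console.log(digest.centroids.length + " centroids for " + digest.total + " values");
```

For example, with N = 100000 and delta = 0.01, a centroid at the median may hold up to 4 * 100000 * 0.01 * 0.25 = 1000 values, while one at q = 0.001 may hold fewer than 4. Production implementations add periodic compression and faster centroid lookup, both omitted here.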
For an overview of T-Digest's behavior, see Davidson-Pilon's
blog post about a Python implementation. For more details,
see the t-digest paper and the reference implementation (Java).
This JavaScript implementation is based on a reading of the paper,
with some boundary and performance tweaks.
Example
var TDigest = require('tdigest').TDigest;
// Generate N values uniformly distributed in [-5, 5).
var x = [], N = 100000;
for (var i = 0; i < N; i += 1) {
    x.push(Math.random() * 10 - 5);
}
var tdigest = new TDigest();
tdigest.push(x); // push() accepts a single value or an array of values
console.log(tdigest.summary());
console.log("median ~ " + tdigest.percentile(0.5));
See also example.js in this package.
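To sanity-check the approximation on test data like the example's, you can compute an exact quantile by sorting all values, which is precisely the memory cost a t-digest avoids. The sketch below interpolates linearly between the two nearest order statistics, one common convention among several, so its result is not guaranteed to match percentile() conventions even on identical data. `exactQuantile` is a hypothetical helper.

```javascript
// Hypothetical helper: exact p-quantile (0 <= p <= 1) of an in-memory array,
// interpolating linearly between the two nearest order statistics.
function exactQuantile(values, p) {
  const sorted = Float64Array.from(values).sort(); // typed arrays sort numerically
  const pos = p * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

// Same data shape as the example: N values uniform in [-5, 5).
const N = 100000;
const x = [];
for (let i = 0; i < N; i += 1) x.push(Math.random() * 10 - 5);
console.log("exact median = " + exactQuantile(x, 0.5));
```

Running both on the same array and comparing against tdigest.percentile(0.5) gives a quick feel for the approximation error; for this uniform data the exact median should land close to 0.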
Dependencies
bintrees: https://www.npmjs.com/package/bintrees