What is tdigest?
The tdigest npm package is a JavaScript implementation of Ted Dunning's t-digest algorithm, which is used for accurate online accumulation of rank-based statistics such as quantiles and cumulative distribution functions (CDFs). It is particularly useful for large-scale data processing where memory efficiency and speed are critical.
What are tdigest's main functionalities?
Creating a t-digest
This feature allows you to create a new t-digest instance, which can then be used to accumulate data and compute statistics.
const TDigest = require('tdigest').TDigest;
const t = new TDigest();
Adding data to the t-digest
This feature allows you to add data points to the t-digest. The t-digest will then use these data points to compute rank-based statistics.
t.push(1.0);
t.push(2.0);
t.push(3.0);
Computing quantiles
This feature allows you to compute quantiles from the data accumulated in the t-digest. For example, you can compute the median (50th percentile) or the 90th percentile.
const q50 = t.percentile(0.5);
const q90 = t.percentile(0.9);
Computing cumulative distribution function (CDF)
This feature allows you to compute the cumulative distribution function (CDF) for a given value. The CDF represents the probability that a random variable takes on a value less than or equal to the given value.
const cdf = t.cdf(2.0);
Other packages similar to tdigest
simple-statistics
The simple-statistics package provides a wide range of statistical functions, including quantile calculations. However, it does not implement the t-digest algorithm and may not be as memory efficient for large-scale data processing.
summary-statistics
The summary-statistics package offers basic statistical summaries such as mean, median, and standard deviation. It does not provide the advanced rank-based statistics or memory efficiency of the t-digest algorithm.
descriptive-statistics
The descriptive-statistics package provides functions for descriptive statistics, including quantiles. While it offers similar functionality, it does not use the t-digest algorithm and may not be as efficient for large datasets.
tdigest
Javascript implementation of Dunning's T-Digest for streaming quantile approximation
The T-Digest is a data structure and algorithm for constructing an
approximate distribution for a collection of real numbers presented as a
stream. The algorithm makes no guarantees, but behaves well enough in
practice that implementations have been included in Apache Mahout and
ElasticSearch for computing summaries and approximate order
statistics over a stream.
For an overview of T-Digest's behavior, see Davidson-Pilon's
blog post regarding a python implementation. For more details,
there are the tdigest paper and reference implementation (Java).
This javascript implementation is based on a reading of the paper,
with some boundary and performance tweaks.
###changes since 0.0.4:
API Overhaul:
- asArray() -> toArray()
- redigest() -> compress()
- digest() -> push()
- pushing an array no longer triggers compression
bugfixes and speed improvements.
A grunt dist
task has been added to create a UMD-wrapped version of tdigest
and dependencies for importing as a standalone module in client-side javascript.
quickstart
node.js:
npm install tdigest
var TDigest = require('tdigest').TDigest;
var x=[], N = 100000;
for (var i = 0 ; i < N ; i += 1) {
x.push(Math.random() * 10 - 5);
};
td = new TDigest();
td.push(x);
td.compress(x);
console.log(td.summary());
console.log("median ~ "+td.percentile(0.5));
See also example.js in this package.
In the browser:
The grunt dist
task has been configured to generate
a self-contained UMD-wrapped version of tdigest in dist/tdigest.js.
Embed it in HTML like this:
<script src="dist/tdigest.js"></script>
<script>
var td = new this.tdigest.TDigest();
for (var i=0; i < 1000000; i++) {
td.push(Math.random());
}
td.compress();
document.write(td.summary())
</script>
See also example.html in this package.
dependencies
bintrees
: https://www.npmjs.com/package/bintrees