vega-statistics
Statistical routines and probability distributions.
API Reference
Random Number Generation
vega.random()
Returns a uniform pseudo-random number in the domain [0, 1). By default this
is simply a call to JavaScript's built-in Math.random function. All Vega
routines that require random numbers should use this function.
vega.setRandom(randfunc)
Sets the random number generator to the provided function randfunc.
Subsequent calls to random will invoke the new
function to generate random numbers. Setting a custom generator can be
helpful if one wishes to use an alternative source of randomness or replace
the default generator with a deterministic function for testing purposes.
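As an illustration of the deterministic-testing use case, here is a minimal sketch of a seeded generator that could be passed as randfunc. The seededRandom helper is hypothetical and not part of vega-statistics; it is based on the commonly used mulberry32 mixing function and returns values in [0, 1), matching the contract of random.

```javascript
// Hypothetical helper (not part of vega-statistics): returns a function
// that yields a repeatable sequence of numbers in [0, 1) for a given seed.
function seededRandom(seed) {
  let state = seed >>> 0; // keep the state as an unsigned 32-bit integer
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators created with the same seed produce the same sequence.
const first = seededRandom(42);
const second = seededRandom(42);
console.log(first() === second()); // true

// Intended usage with the library (assumes vega is loaded):
// vega.setRandom(seededRandom(42));
```

Because the sequence depends only on the seed, tests that rely on sampled values can be repeated exactly.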
Distributions
Methods for sampling and calculating probability distributions. Each method
takes a set of distributional parameters and returns a distribution object
representing a random variable.
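Distribution objects expose the following methods:
- dist.sample(): Samples a random value drawn from this distribution.
- dist.pdf(value): Calculates the value of the probability density function at the given input domain value.
- dist.cdf(value): Calculates the value of the cumulative distribution function at the given input domain value.
- dist.icdf(probability): Calculates the inverse of the cumulative distribution function for the given input probability.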
vega.randomNormal([mean, stdev])
Creates a distribution object representing a normal (Gaussian) probability
distribution with specified mean and standard deviation stdev. If unspecified,
the mean defaults to 0 and the standard deviation defaults to 1.
Once created, mean and stdev values can be accessed or modified using
the mean and stdev getter/setter methods.
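To make the role of mean and stdev concrete, the following is a minimal standalone sketch of the normal density and of one way to draw a sample. The names normalPdf and sampleNormal are hypothetical, and the Box-Muller transform used here is an assumption chosen for illustration; the library's own sampling method may differ.

```javascript
// Hypothetical helpers, not the library's implementation.

// Density of a normal distribution at x (defaults mirror mean 0, stdev 1).
function normalPdf(x, mean = 0, stdev = 1) {
  const z = (x - mean) / stdev;
  return Math.exp(-0.5 * z * z) / (stdev * Math.sqrt(2 * Math.PI));
}

// Draws one sample using the Box-Muller transform (an assumed technique).
// rand must return values in [0, 1); 1 - rand() avoids taking log(0).
function sampleNormal(mean = 0, stdev = 1, rand = Math.random) {
  const u1 = 1 - rand();
  const u2 = rand();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdev * z;
}

console.log(normalPdf(0));        // peak density of the standard normal
console.log(sampleNormal(10, 2)); // a random draw centered near 10
```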
vega.randomUniform([min, max])
Creates a distribution object representing a continuous uniform probability
distribution over the interval [min, max). If unspecified, min defaults to 0
and max defaults to 1. If only one argument is provided, it is interpreted as
the max value.
Once created, min and max values can be accessed or modified using
the min and max getter/setter methods.
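The argument handling described above can be sketched as follows. The helpers uniformParams and sampleUniform are hypothetical names used only for illustration; they restate the documented defaults rather than reproduce the library's code.

```javascript
// Hypothetical helpers, not the library's implementation.

// Resolves the optional arguments: no arguments gives [0, 1),
// a single argument is treated as max with min = 0.
function uniformParams(min, max) {
  if (max === undefined) {
    max = min === undefined ? 1 : min;
    min = 0;
  }
  return { min, max };
}

// Maps a uniform value in [0, 1) onto the interval [min, max).
function sampleUniform(min, max, rand = Math.random) {
  const p = uniformParams(min, max);
  return p.min + (p.max - p.min) * rand();
}

console.log(uniformParams());     // { min: 0, max: 1 }
console.log(uniformParams(5));    // { min: 0, max: 5 }
console.log(uniformParams(2, 4)); // { min: 2, max: 4 }
```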
vega.randomInteger([min,] max)
Creates a distribution object representing a discrete uniform probability
distribution over the integer domain [min, max). If only one argument is
provided, it is interpreted as the max value. If unspecified, min defaults
to 0.
Once created, min and max values can be accessed or modified using
the min and max getter/setter methods.
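The half-open integer domain means that max itself is never produced. A minimal sketch of that behavior, using the hypothetical helpers sampleInteger and integerPdf (assumptions for illustration, not the library's code):

```javascript
// Hypothetical helpers, not the library's implementation.

// Draws an integer from [min, max); a single argument is treated as max.
function sampleInteger(min, max, rand = Math.random) {
  if (max === undefined) {
    max = min;
    min = 0;
  }
  return min + Math.floor((max - min) * rand());
}

// Probability mass: equal for every integer in [min, max), zero elsewhere.
function integerPdf(x, min, max) {
  return Number.isInteger(x) && x >= min && x < max ? 1 / (max - min) : 0;
}

console.log(integerPdf(5, 3, 6)); // 1/3, since 3, 4 and 5 are equally likely
console.log(integerPdf(6, 3, 6)); // 0, the upper bound is excluded
```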
vega.randomMixture(distributions[, weights])
Creates a distribution object representing a (weighted) mixture of probability
distributions. The distributions argument should be an array of distribution
objects. The optional weights array provides proportional numerical weights
for each distribution. If provided, the values in the weights array will be
normalized to ensure that weights sum to 1. Any unspecified weight values
default to 1 (prior to normalization). Mixture distributions do not
support the icdf method: calling icdf will result in an error.
Once created, the distributions and weights arrays can be accessed or
modified using the distributions and weights getter/setter methods.
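The weight handling can be illustrated with a short standalone sketch. The helpers normalizeWeights and pickComponent are hypothetical; they follow the rules stated above (missing weights default to 1, then all weights are scaled to sum to 1), and the cumulative-sum selection is an assumed way of choosing a component when sampling.

```javascript
// Hypothetical helpers, not the library's implementation.

// Fills unspecified weights with 1, then scales so the weights sum to 1.
function normalizeWeights(count, weights = []) {
  const raw = [];
  for (let i = 0; i < count; ++i) {
    raw.push(weights[i] == null ? 1 : +weights[i]);
  }
  const total = raw.reduce((a, b) => a + b, 0);
  return raw.map((w) => w / total);
}

// Picks a component index with probability equal to its normalized weight.
function pickComponent(normalized, rand = Math.random) {
  const r = rand();
  let cumulative = 0;
  for (let i = 0; i < normalized.length; ++i) {
    cumulative += normalized[i];
    if (r < cumulative) return i;
  }
  return normalized.length - 1; // guards against floating-point round-off
}

// Three distributions, only the first weight given: 2, 1, 1 before scaling.
console.log(normalizeWeights(3, [2])); // [0.5, 0.25, 0.25]
```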
vega.randomKDE(values[, bandwidth])
Creates a distribution object representing a kernel density estimate
for an array of numerical values. This method uses a Gaussian kernel to
estimate a smoothed, continuous probability distribution. The optional
bandwidth parameter determines the width of the Gaussian kernel. If the
bandwidth is either 0 or unspecified, a default bandwidth value will be
automatically estimated based on the input data. KDE distributions do not
support the icdf method: calling icdf will result in an error.
Once created, data and bandwidth values can be accessed or modified using
the data and bandwidth getter/setter methods.
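For readers unfamiliar with kernel density estimation, the following is a minimal sketch of a Gaussian KDE density with an explicitly supplied bandwidth. The names gaussianKernel and kdePdf are hypothetical, and automatic bandwidth estimation is deliberately left out because this document does not specify how the default is computed.

```javascript
// Hypothetical helpers, not the library's implementation.

// Standard Gaussian kernel.
function gaussianKernel(u) {
  return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
}

// Returns a density function: the average of Gaussian bumps of width
// `bandwidth` centered on each data value. Assumes bandwidth > 0.
function kdePdf(values, bandwidth) {
  return function (x) {
    let sum = 0;
    for (const v of values) {
      sum += gaussianKernel((x - v) / bandwidth);
    }
    return sum / (values.length * bandwidth);
  };
}

const density = kdePdf([1, 2, 2.5, 4], 0.5);
console.log(density(2)); // relatively high, close to several data points
console.log(density(6)); // close to zero, far from all data points
```

A larger bandwidth spreads each bump wider and gives a smoother estimate; a smaller one follows the individual data points more closely.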
Statistics
Statistical methods for calculating bins, bootstrapped confidence intervals,
and quartile boundaries.
vega.bin(options)
Determines a quantitative binning scheme, for example to create a histogram.
Based on the options provided, this method will search over a space of
possible bins, aligning step sizes with a given number base and applying
constraints such as the maximum number of allowable bins. Given a set of
options (see below), returns an object describing the binning scheme,
in terms of start, stop and step properties.
The supported options properties are:
- extent: (required) A two-element ([min, max]) array indicating the range of desired bin values.
- base: The number base to use for automatic bin determination (default base 10).
- maxbins: The maximum number of allowable bins (default 20).
- step: An exact step size to use between bins. If provided, the maxbins and steps options will be ignored.
- steps: An array of allowable step sizes to choose from. If provided, the maxbins option will be ignored.
- minstep: A minimum allowable step size (particularly useful for integer values, default 0).
- divide: An array of scale factors indicating allowable subdivisions. The default value is [5, 2], which indicates that the method may consider dividing bin sizes by 5 and/or 2. For example, for an initial step size of 10, the method can check if bin sizes of 2 (= 10/5), 5 (= 10/2), or 1 (= 10/(5*2)) might also satisfy the given constraints.
- nice: Boolean indicating if the start and stop values should be nicely-rounded relative to the step size (default true).
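The arithmetic behind the divide option can be reproduced with a small sketch. The candidateSteps helper is hypothetical and only enumerates the subdivided step sizes mentioned in the example; it does not implement the search that vega.bin performs over them.

```javascript
// Hypothetical helper, not the library's implementation.
// Lists the step sizes obtained by dividing `step` by every non-empty
// combination of the factors in `divide`.
function candidateSteps(step, divide = [5, 2]) {
  const candidates = [];
  for (let mask = 1; mask < 1 << divide.length; ++mask) {
    let divisor = 1;
    divide.forEach((d, i) => {
      if (mask & (1 << i)) divisor *= d;
    });
    candidates.push(step / divisor);
  }
  return candidates;
}

console.log(candidateSteps(10)); // [2, 5, 1]
```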
vega.bin({extent:[0, 1], maxbins:10});
vega.bin({extent:[0, 1], maxbins:5});
vega.bin({extent:[5, 10], maxbins:5});
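Once a binning scheme is available, its start and step properties are enough to assign values to bins. The sketch below uses a hypothetical binOf helper and a hand-written scheme object with illustrative values; it is not output copied from vega.bin.

```javascript
// Hypothetical helper, not part of vega-statistics.
// Finds the zero-based bin index and bin boundaries for a value,
// given an object with start and step properties.
function binOf(value, scheme) {
  const index = Math.floor((value - scheme.start) / scheme.step);
  const lower = scheme.start + index * scheme.step;
  return { index, lower, upper: lower + scheme.step };
}

// Hand-written scheme for illustration only (assumed values).
const scheme = { start: 0, stop: 20, step: 5 };

console.log(binOf(12, scheme)); // { index: 2, lower: 10, upper: 15 }
```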
vega.bootstrapCI(array, samples, alpha[, accessor])
Calculates a bootstrapped confidence interval for an input array of values,
based on a given number of resampling iterations (samples) and a target alpha
value. For example, an alpha value of 0.05 corresponds to a 95% confidence
interval. An optional accessor function can be used to first extract numerical
values from an array of input objects, and is equivalent to first calling
array.map(accessor). This method ignores null, undefined and NaN values.
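As a rough illustration of the idea, here is a minimal sketch of a percentile bootstrap for the mean. The function names bootstrapMeanCI and quantileOfSorted are hypothetical, and both the choice of the mean as the statistic and the linear-interpolation quantile are assumptions made for this example; the library's implementation may differ in its details.

```javascript
// Hypothetical helpers, not the library's implementation.

// Quantile of an ascending-sorted array using linear interpolation.
function quantileOfSorted(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

// Percentile bootstrap confidence interval for the mean (assumed statistic).
function bootstrapMeanCI(array, samples, alpha, accessor, rand = Math.random) {
  // Extract values, then drop null, undefined and NaN entries.
  const values = (accessor ? array.map(accessor) : array)
    .filter((v) => v != null && !Number.isNaN(+v))
    .map(Number);
  const n = values.length;
  const means = [];
  for (let s = 0; s < samples; ++s) {
    // Resample n values with replacement and record their mean.
    let sum = 0;
    for (let i = 0; i < n; ++i) {
      sum += values[Math.floor(rand() * n)];
    }
    means.push(sum / n);
  }
  means.sort((a, b) => a - b);
  return [
    quantileOfSorted(means, alpha / 2),
    quantileOfSorted(means, 1 - alpha / 2),
  ];
}

// 95% interval from 1000 resamples of a small data set with gaps.
const data = [{ v: 2 }, { v: 4 }, { v: null }, { v: 6 }, { v: 8 }];
console.log(bootstrapMeanCI(data, 1000, 0.05, (d) => d.v));
```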
vega.quartiles(array[, accessor])
Given an array of numeric values, returns an array of quartile boundaries.
The return value is a 3-element array consisting of the first, second (median),
and third quartile boundaries. An optional accessor function can be used to
first extract numerical values from an array of input objects, and is
equivalent to first calling array.map(accessor). This method ignores
null, undefined and NaN values.
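The shape of the result can be illustrated with a minimal standalone sketch. The names quartilesSketch and interpolatedQuantile are hypothetical, and the linear interpolation between neighboring sorted values is an assumption; other quantile definitions can give slightly different boundaries.

```javascript
// Hypothetical helpers, not the library's implementation.

// Quantile of an ascending-sorted array using linear interpolation.
function interpolatedQuantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

// Returns [first quartile, median, third quartile], ignoring
// null, undefined and NaN values.
function quartilesSketch(array, accessor) {
  const sorted = (accessor ? array.map(accessor) : array)
    .filter((v) => v != null && !Number.isNaN(+v))
    .map(Number)
    .sort((a, b) => a - b);
  return [0.25, 0.5, 0.75].map((p) => interpolatedQuantile(sorted, p));
}

console.log(quartilesSketch([5, 3, 1, 4, 2])); // [2, 3, 4]
```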