Security News
pnpm 10.0.0 Blocks Lifecycle Scripts by Default
pnpm 10 blocks lifecycle scripts by default to improve security, addressing supply chain attack risks but sparking debate over compatibility and workflow changes.
@bunchmark/stats
Advanced tools
This provides the (robust) statistics primitives used in bunchmark.
They all present tradeoffs that favor speed and implementation simplicity, when doing so doesn't impair robustness for our use case. These limitations are systematically documented and could be lifted over time if there is demand.
where type Stats = {q1: number, lowCI: number, median: number, highCI: number, q3: number, MAD: number}
.
median
is the median of the arrayMAD
is the median absolute deviation (see the dedicated section).q1
and q3
are the first and third quartileslowCI
and highCI
are the bounds of the p=.95 confidence interval over the median, taking into account the bonferonni factor (the total number of comparisons to make, i.e. n*(n-1)/2
for n
categories).Important: this is built on top of quickselect()
. It will reorder the content of the array (see quickselect()
for the details). If you want to do a Wilcoxon rank-sum test, it must be done before this step, or on an independant copy of the data set. This is because the Wilcoxon test relies on the order of the arrays to pair items.
i
, left
are integer indices between 0
and array.length - 1
. i
must be larger than or equal to left
.
Reorders array
such that
array[i]
is the value that we would get when doing array.sort((a, b)=> a - b)[i]
array[i]
end up to its rightValues with indices smaller than left
are ignored.
A custom compare
can be be provided. It must return a negative value when a < b
, 0
when a === b
and a positive value otherwise. It is mandatory for non-numeric arrays.
Returns array[i]
.
If several values are needed for a given array, it is strategic to get them from the smallest to the largest index, using the previous index as the left
parameter. This takes advantage of the way quickselect()
reoders its input array.
Suppose we want to get both i0
and i1
where i0
< i1
, we can do
const v0 = quickselect(array, i0)
const v1 = quickselect(array, i1, i0)
When doing the second call, we know that the values at indices smaller than i0
are smaller than array[i0]
. They can thus be ignored, saving us some cycles.
Our implementation is derived from Vladimir Agafonkin's https://github.com/mourner/quickselect with minute modifications.
Like quickselect
but i
, and left
, can be floats. If the aren't integers, this will return a linear interpolation(†) between Math.floor(i)
and ceil(i)
proporional to the decimal part.
array
ends up with array[ceil(i)]
equals to array.sort((a, b) => a-b)[ceil(i)]
(see quickselect()
for how the rest of the array ends up).
Values at indices smaller than ceil(left)
are left untouched (see quickselect
for the details)
†. My implementation is quite naive, there are considerations in that section that are beyond my current understanding.
Where p
is a decimal value between 0
and 1
.
Equivalent to quickselectFloat(array, p * (array.length - 1), left * (array.length - 1))
Equivalent to percentile(array, .5)
.
Returns the median absolute deviation of the array, a robust mesure of variablility that stands in for standard deviation.
It uses quickselectFloat(array, (a, b) => abs(a - median) - abs(b - median))
under the hood and reorders the values in array
accordingly.
We provide the Wilcoxon rank-sum test and the Mann-Whitne U test.
These are bare, limited versions of the corresponding SciPy routines. I've only implemented what I need for bunchmark, if you need more feel free to open an issue, or idealy a PR. Some things are easy to implement (one-tailed tests, optional continuity correction). Exact probability calculations and alternative zero-hanlding procedures for the Wilcoxon test would be a bit more involved.
mwu(A: number[], B: number[]): {U: number, z: number, p:number}
The Mann-Whitney U test
A
and B
don't contain NaNsInspired by the second method found in https://en.wikipedia.org/w/index.php?title=Mann%E2%80%93Whitney_U_test&oldid=1188631305#Calculations and sci-py for the tie correction of the variance computation
wilcoxon(D: number[]): {T: number, z: number, p: number}
A Wilcoxon rank-sum test:
D
doesn't contain NaNsThis is based on https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test#Ties and scipy.stats.wilcoxon
wilcoxon(A: number[], B: number): {T: number, z: number, p: number}
Equivalent to wilcoxon(D)
where D is the pairwise difference between A
and B
.
A
and B
don't contain NaNsHereunder, "variable" is meant in the statistical sense.
Gets the p-value corresponding to a z-value (the probability of finding a value smaller than z by chance in a normally distributied variable).
This is based on a z-table built with scipy.stats.sf()
with 0.001
granularity in the z-domain, for z
between 0
and 9.999
. That level of precision is sufficient for our use case.
To get even more precision, we'd have to approximate the integral from -Infinity
to z
for the normal distribution using the sum of trapeze algorithm (there are no closed-form equations to compute these values).
Equivalent to scipy.stats.cdf(round(z, 3))
in SciPy or pnorm(round(z, 3))
in R.
Find z
such that the probability of finding a value smaller than than it in a normally distributed variable has a probability p
.
Equivalent to scipy.stats.norm.ppf(p)
in SciPy and qnorm(p)
in R.
This is a direct port of the SciPy C++ version.
FAQs
The bunchmark statistical routines.
The npm package @bunchmark/stats receives a total of 24 weekly downloads. As such, @bunchmark/stats popularity was classified as not popular.
We found that @bunchmark/stats demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
pnpm 10 blocks lifecycle scripts by default to improve security, addressing supply chain attack risks but sparking debate over compatibility and workflow changes.
Product
Socket now supports uv.lock files to ensure consistent, secure dependency resolution for Python projects and enhance supply chain security.
Research
Security News
Socket researchers have discovered multiple malicious npm packages targeting Solana private keys, abusing Gmail to exfiltrate the data and drain Solana wallets.