math-utils
Math, stats, and miscellaneous type utilities
Descriptive Statistics
org.hammerlab.stats.Stats has APIs for ingesting numeric elements and outputting nicely formatted statistics about them; it is modeled after Apache commons-math DescriptiveStatistics.
As a bonus, it can ingest numbers in histogram-style (run-length-encoded) format, and it supports Long values as well, which helps with computations involving element counts from RDDs:
scala> import org.hammerlab.stats.Stats
scala> :paste
Stats.fromHist(
  List[(Int, Long)](
    1 → 10000000000L,
    2 →  1000000000,
    1 →         100,
    2 →  1000000000
  )
)
res0: org.hammerlab.stats.Stats[Int,Long] =
num: 12000000100, mean: 1.2, stddev: 0.4, mad: 0
elems: 1×10000000000, 2×1000000000, 1×100, 2×1000000000
sorted: 1×10000000100, 2×2000000000
0.0: 1
0.1: 1
1: 1
5: 1
10: 1
25: 1
50: 1
75: 1
90: 2
95: 2
99: 2
99.9: 2
100.0: 2
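To make the histogram input above more concrete, here is a minimal sketch of how count, mean, standard deviation, and percentiles can be computed directly from (value, count) pairs without expanding them into individual elements. It is not the library's implementation: the object HistStatsSketch and all of its methods are hypothetical names, the standard deviation is assumed to be the population form, and percentiles use a simple nearest-rank rule, which may differ from what Stats does internally.

```scala
// Hypothetical helper, NOT part of org.hammerlab.stats: summary statistics
// computed from run-length-encoded (value, count) pairs.
object HistStatsSketch {
  // Merge runs that share a value, then sort by value.
  def compact(hist: Seq[(Int, Long)]): Seq[(Int, Long)] =
    hist
      .groupBy(_._1)
      .map { case (value, runs) => value -> runs.map(_._2).sum }
      .toSeq
      .sortBy(_._1)

  // Total number of elements; a Long, so it can exceed Int.MaxValue.
  def count(hist: Seq[(Int, Long)]): Long = hist.map(_._2).sum

  // Count-weighted mean.
  def mean(hist: Seq[(Int, Long)]): Double =
    hist.map { case (v, c) => v.toDouble * c }.sum / count(hist)

  // Population standard deviation (an assumption of this sketch).
  def stddev(hist: Seq[(Int, Long)]): Double = {
    val n = count(hist).toDouble
    val m = mean(hist)
    math.sqrt(hist.map { case (v, c) => c * (v - m) * (v - m) }.sum / n)
  }

  // Nearest-rank percentile for 0 <= p <= 100; requires a non-empty histogram.
  def percentile(hist: Seq[(Int, Long)], p: Double): Int = {
    val sorted = compact(hist)
    val n = count(sorted)
    val rank = math.max(1L, math.ceil(p / 100.0 * n).toLong)
    var seen = 0L
    // Walk the sorted runs until the cumulative count reaches the rank.
    sorted.find { case (_, c) => seen += c; seen >= rank }.get._1
  }
}
```

With the same four runs as in the REPL session, HistStatsSketch.count returns 12000000100, and the 75th and 90th percentiles come out as 1 and 2 respectively, in line with the output shown above.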
Hypergeometric Distribution
org.hammerlab.stats.HypergeometricDistribution is an implementation of a hypergeometric distribution, modeled after org.apache.commons.math3.distribution.HypergeometricDistribution, but supporting 8-byte Long parameters.
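As an illustration of why Long parameters matter, the following is a rough sketch of a hypergeometric probability mass function whose population size, success count, and sample size are all Long. It is not the library's code: HypergeometricSketch, logChoose, and pmf are hypothetical names, and the log-binomial is accumulated term by term, which is simple but only practical when min(k, n - k) is modest.

```scala
// Hypothetical sketch, NOT the org.hammerlab.stats implementation.
object HypergeometricSketch {
  // log C(n, k), summed term by term in O(min(k, n - k)) steps.
  def logChoose(n: Long, k: Long): Double = {
    require(0 <= k && k <= n, s"need 0 <= k <= n, got n=$n, k=$k")
    val j = math.min(k, n - k)
    var acc = 0.0
    var i = 1L
    while (i <= j) {
      // C(n, j) = product over i = 1..j of (n - j + i) / i
      acc += math.log((n - j + i).toDouble) - math.log(i.toDouble)
      i += 1
    }
    acc
  }

  // P(X = k): probability of exactly k successes when drawing sampleSize
  // items without replacement from populationSize items, of which
  // `successes` are successes.
  def pmf(populationSize: Long, successes: Long, sampleSize: Long, k: Long): Double = {
    val lo = math.max(0L, sampleSize - (populationSize - successes))
    val hi = math.min(sampleSize, successes)
    if (k < lo || k > hi) 0.0
    else
      math.exp(
        logChoose(successes, k) +
          logChoose(populationSize - successes, sampleSize - k) -
          logChoose(populationSize, sampleSize)
      )
  }
}
```

For example, HypergeometricSketch.pmf(10, 4, 3, 1) evaluates C(4,1)·C(6,2)/C(10,3) = 0.5 (up to floating-point error), and a call such as pmf(10000000000L, 5000000000L, 2, 1) uses a population size that does not fit in an Int.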