josephruscio-aggregate
h1. Aggregate
By Joseph Ruscio
Aggregate is an intuitive Ruby implementation of a statistics aggregator with both default and configurable histogram support. It does this without recording or storing any of the actual sample values, which makes it suitable for tracking statistics across millions or billions of samples without any impact on performance or memory footprint. Originally inspired by the Aggregate support in "SystemTap":http://sourceware.org/systemtap.
h2. Getting Started
Aggregates are easy to instantiate, populate with sample data, and then inspect for common aggregate statistics:
# After instantiation, use the << operator to add a sample to the aggregate:
stats = Aggregate.new
loop do
# Take some action that generates a sample measurement
stats << sample
end
# The number of samples
stats.count
# The average
stats.mean
# Max sample value
stats.max
# Min sample value
stats.min
# The standard deviation
stats.std_dev
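To illustrate how these statistics can be maintained without storing samples, here is a minimal sketch that keeps only running totals. The @RunningStats@ class is hypothetical and is not the gem's source. It also assumes the sample (n - 1) form of the standard deviation, which may differ from what Aggregate computes.

```ruby
# Hypothetical sketch, not the gem's source: summary statistics from
# running totals only, so memory use stays constant.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum   = 0.0
    @sum2  = 0.0 # running sum of squared samples
    @min   = nil
    @max   = nil
  end

  # Fold one sample into the totals; the sample itself is not kept.
  def <<(sample)
    @count += 1
    @sum   += sample
    @sum2  += sample * sample
    @min = sample if @min.nil? || sample < @min
    @max = sample if @max.nil? || sample > @max
    self
  end

  def mean
    @count.zero? ? 0.0 : @sum / @count
  end

  # Assumption: sample standard deviation (divides by n - 1).
  def std_dev
    return 0.0 if @count < 2
    variance = (@sum2 - @count * mean * mean) / (@count - 1)
    Math.sqrt([variance, 0.0].max) # clamp tiny negative rounding errors
  end
end

running = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |sample| running << sample }
puts running.count
puts running.mean
puts running.std_dev
```

Each call to @<<@ touches a fixed number of fields, so the cost per sample does not grow with the number of samples seen.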
h2. Histograms
Perhaps more important than the basic aggregate statistics detailed above, Aggregate also maintains a histogram of samples. For anything other than normally distributed data, averages are insufficient at best and often downright misleading. 37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms. Aggregate maintains its histogram internally as a set of "buckets". Each bucket represents a range of possible sample values. The set of all buckets represents the range of "normal" sample values.
h3. Binary Histograms
Without any configuration, an Aggregate instance maintains a binary histogram, where each bucket represents a range twice as large as the preceding bucket, i.e. [1,1], [2,3], [4,7], [8,15]. The default binary histogram provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]. (See NOTES below for a discussion of the practical effects of insufficient precision.)
Binary histograms are useful when we have little idea about what the sample distribution may look like as almost any positive value will fall into some bucket. After using binary histograms to determine the coarse-grained characteristics of your sample space you can configure a linear histogram to examine it in closer detail.
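As a rough sketch of the power-of-two bucketing described above, the following maps a positive integer to its bucket. The helper names are hypothetical and are not part of the gem's API. The sketch uses @Integer#bit_length@ rather than the logarithm approach discussed in NOTES.

```ruby
# Hypothetical helpers, not the gem's API.

# Index of the power-of-two bucket that holds a positive integer value.
def binary_bucket_index(value)
  unless value.is_a?(Integer) && value > 0
    raise ArgumentError, "value must be a positive integer"
  end
  value.bit_length - 1
end

# Inclusive [low, high] range covered by the bucket at the given index.
def binary_bucket_range(index)
  low = 1 << index
  [low, (low << 1) - 1]
end

[1, 3, 4, 7, 8, 15, 1000].each do |value|
  index = binary_bucket_index(value)
  low, high = binary_bucket_range(index)
  puts "#{value} -> bucket #{index} [#{low}, #{high}]"
end
```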
h3. Linear Histograms
Linear histograms are specified with three values: low, high, and width. Low and high specify a range [low, high) of values included in the histogram (all others are outliers). Width specifies the number of values represented by each bucket and therefore the number of buckets, i.e. the granularity of the histogram. The histogram range (high - low) must be a multiple of width:
# Track aggregate stats on response times in ms
response_stats = Aggregate.new(0, 2000, 50)
The example above creates a linear histogram that tracks response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully most of your samples fall in the first couple of buckets!
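The relationship between low, high, and width can be sketched as follows. Both helpers are hypothetical illustrations, not the gem's API, and the error messages are assumptions.

```ruby
# Hypothetical helpers, not the gem's API.

# Checks a linear specification and returns the number of buckets.
def validate_linear_spec(low, high, width)
  raise ArgumentError, "high must be greater than low" unless high > low
  raise ArgumentError, "width must be positive" unless width > 0
  unless ((high - low) % width).zero?
    raise ArgumentError, "range (high - low) must be a multiple of width"
  end
  (high - low) / width
end

# Lower bound of the bucket containing value, or nil when the value
# lies outside [low, high).
def linear_bucket(value, low, high, width)
  return nil if value < low || value >= high
  low + ((value - low) / width).floor * width
end

puts validate_linear_spec(0, 2000, 50)
puts linear_bucket(1234, 0, 2000, 50)
```

With the specification from the example above, a range of 2000 and a width of 50 give 40 buckets, and a 1234 ms sample lands in the bucket starting at 1200.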
h3. Histogram Outliers
An Aggregate records any samples that fall outside the histogram range as outliers:
# Number of samples that fall below the normal range
stats.outliers_low
# Number of samples that fall above the normal range
stats.outliers_high
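One way outlier counting could work alongside a linear histogram is sketched below. The @LinearCounts@ class is a hypothetical stand-in and does not reproduce the gem's internals.

```ruby
# Hypothetical sketch, not the gem's source.
class LinearCounts
  attr_reader :outliers_low, :outliers_high, :buckets

  def initialize(low, high, width)
    @low, @high, @width = low, high, width
    @buckets = Array.new((high - low) / width, 0)
    @outliers_low = 0
    @outliers_high = 0
  end

  # Samples outside [low, high) only bump an outlier counter.
  def <<(sample)
    if sample < @low
      @outliers_low += 1
    elsif sample >= @high
      @outliers_high += 1
    else
      @buckets[((sample - @low) / @width).floor] += 1
    end
    self
  end
end

counts = LinearCounts.new(0, 2000, 50)
[-5, 10, 75, 1999, 2000, 3500].each { |sample| counts << sample }
puts counts.outliers_low
puts counts.outliers_high
```

Because the range is half-open, a sample equal to high (2000 here) counts as a high outlier.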
h3. Histogram Iterators
Once a histogram is populated Aggregate provides iterator support for examining the contents of buckets. The iterators provide both the number of samples in the bucket, as well as its range:
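# Examine every bucket
stats.each do |bucket, count|
  # bucket identifies the bucket's range, count is its number of samples
end
# Examine only buckets containing samples
stats.each_nonzero do |bucket, count|
  # same arguments, but empty buckets are skipped
end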
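A pair of iterators like these could be built on top of an array of bucket counts, as in the sketch below. The @BucketCounts@ class is hypothetical, and yielding each bucket's lower bound is an assumption about what the bucket argument contains.

```ruby
# Hypothetical sketch, not the gem's source. Assumes each bucket is
# identified by its lower bound.
class BucketCounts
  def initialize(low, width, counts)
    @low, @width, @counts = low, width, counts
  end

  # Yields every bucket, including empty ones.
  def each
    return enum_for(:each) unless block_given?
    @counts.each_with_index do |count, index|
      yield @low + index * @width, count
    end
  end

  # Yields only buckets that contain at least one sample.
  def each_nonzero
    return enum_for(:each_nonzero) unless block_given?
    each { |bucket, count| yield bucket, count if count > 0 }
  end
end

hist = BucketCounts.new(0, 50, [3, 0, 7, 0])
hist.each_nonzero { |bucket, count| puts "#{bucket}: #{count}" }
```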
h3. Histogram Bar Chart
Finally, Aggregate contains sophisticated pretty-printing support to generate ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and sample distribution, the to_s method properly sets a marker weight based on the samples per bucket and aligns all output. Empty buckets are skipped to conserve screen space.
# Generate and display an 80 column histogram
puts stats.to_s
# Generate and display a 120 column histogram
puts stats.to_s(120)
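The idea of a marker weight can be sketched as follows: pick how many samples each marker stands for so that the fullest bucket fits the available bar width. The @render_bars@ function, its 66-character default bar width, and its rounding choices are all assumptions for illustration and do not reproduce the gem's to_s.

```ruby
# Hypothetical sketch, not the gem's to_s implementation.
# counts maps a bucket label to its sample count.
def render_bars(counts, bar_width = 66, marker = "@")
  max_count = counts.values.max
  return [] if max_count.nil? || max_count.zero?

  # Samples represented by one marker, so the largest bar fits bar_width.
  weight = [(max_count.to_f / bar_width).ceil, 1].max
  label_width = counts.keys.map { |key| key.to_s.length }.max

  # Skip empty buckets, then draw one aligned line per remaining bucket.
  counts.reject { |_, count| count.zero? }.map do |bucket, count|
    bar = marker * (count / weight)
    format("%#{label_width}s |%-#{bar_width}s| %d", bucket, bar, count)
  end
end

lines = render_bars({ 1 => 3, 2 => 0, 4 => 66, 8 => 132 })
puts lines
```

Here the largest count is 132, so each marker stands for two samples and the fullest bar spans the whole width.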
This code example populates both a binary and a linear histogram with the same set of 65536 values generated by rand to produce the two histograms that follow it:
require 'rubygems'
require 'aggregate'
# Create an Aggregate instance
binary_aggregate = Aggregate.new
linear_aggregate = Aggregate.new(0, 65536, 4096)
65536.times do
x = rand(65536)
binary_aggregate << x
linear_aggregate << x
end
puts binary_aggregate.to_s
puts linear_aggregate.to_s
h4. Binary Histogram
value |------------------------------------------------------------------| count
1 | | 3
2 | | 1
4 | | 5
8 | | 9
16 | | 15
32 | | 29
64 | | 62
128 | | 115
256 | | 267
512 |@ | 523
1024 |@ | 970
2048 |@@@ | 1987
4096 |@@@@@@@@ | 4075
8192 |@@@@@@@@@@@@@@@@ | 8108
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 16405
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
~
Total |------------------------------------------------------------------| 65535
h4. Linear (0, 65536, 4096) Histogram
value |------------------------------------------------------------------| count
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4118
12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4059
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 3999
20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4083
24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4134
28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4143
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4152
36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4033
40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4064
45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4012
49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4070
53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4090
57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
Total |------------------------------------------------------------------| 65532
We can see from these histograms that Ruby's rand function does a relatively good job of distributing returned values in the requested range.
h2. NOTES
The implementation does not rely on a log2 function built into Math (Math.log2 is only available in newer Ruby versions), so we approximate with log(x)/log(2). Theoretically the integer part of log(2^n - 1) / log(2) is n-1. Unfortunately, due to precision limitations, once n reaches a certain size (somewhere > 32) this starts to return n. The larger the value of n, the more numbers, i.e. (2^n - 2), (2^n - 3), etc., fall prey to this error. We could probably look into using something like BigDecimal, but for the current purposes of the binary histogram, i.e. a simple coarse-grained view, the current implementation is sufficient.
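The precision problem can be observed directly with a short sketch. The choice of n = 60 is arbitrary, and the integer-only alternative using @Integer#bit_length@ is offered as one possible option rather than something the gem does.

```ruby
n = 60
value = (2**n) - 1

# A Float carries 53 bits of precision, so converting 2^60 - 1 to a
# Float (as Math.log does) rounds it up to exactly 2^60.
puts value.to_f == 2.0**n # prints true

# Float-based approximation described in the note above.
approx = (Math.log(value) / Math.log(2)).floor
puts approx

# Integer-only alternative: exact for any positive Integer.
exact = value.bit_length - 1
puts exact # prints 59
```

Because the Float conversion has already rounded the input up to a power of two, the logarithm cannot recover the original bucket, whereas the bit-length calculation never leaves integer arithmetic.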