Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

hetchy

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

hetchy

  • 1.0.0
  • Rubygems
  • Socket score

Version published
Maintainers
1
Created
Source

Hetchy

A high performance, thread-safe reservoir sampler for ruby with snapshot and percentile support.

Benefits/Goals

  • Generate statistics for large datasets with a very small in-memory footprint
  • Generate percentiles/quantiles for a known set or numbers or for real-time streaming set
  • Ability to capture sample state at a moment in time for further analysis
  • Speed
  • Thorough test suite
  • No dependencies

Installation

Add this line to your application's Gemfile:

gem 'hetchy'

And then $ bundle

Or install it yourself with:

$ gem install hetchy

Usage

Create a reservoir and designate how big you want it to be:

reservoir = Hetchy::Reservoir.new(size: 1000)

Add samples as they arrive or are generated:

reservoir << 123

You can add sets of samples as well:

reservoir << [45,89,124,96]

You can calculate a percentile at any time:

reservoir.percentile(95)

Hetchy supports high resolution percentiles as well:

reservoir.percentile(99.99)

NOTE: you may need to increase reservoir size for very high resolution percentiles. Experiment to see what works for your data set.

For threaded applications where the reservoir is accepting samples rapidly you can increase performance by snapshotting before running a series of calculations on the reservoir:

dataset  = reservoir.snapshot

perc_95  = dataset.percentile(95)
perc_99  = dataset.percentile(99)
perc_999 = dataset.percentile(99.9)

Clear the reservoir to reset it:

reservoir.clear

Datasets

If you have an existing series you can use Dataset to generate stats for it:

my_series = Array(1..1000)
dataset = Hetchy::Dataset.new(my_series)

perc_95 = dataset.percentile(95)   #=> 950.95
median  = dataset.median           #=> 500.5

Stats Details

For those interested:

  • Reservoir sampling is based on Vitter's algorithm R, ensures a uniform sampling probability for every entry in the series
  • Percentile calculations use weighted averages, not nearest neighbor

Contributing

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
  • Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
  • Fork the project and submit a pull request from a feature or bugfix branch.
  • Please include tests. This is important so we don't break your changes unintentionally in a future version.
  • Please don't modify the gemspec, Rakefile, version, or changelog. If you do change these files, please isolate a separate commit so we can cherry-pick around it.

Credits

Parts of Hetchy are inspired by Eric Lindvall's metriks gem.

Copyright (c) 2015 Matt Sanders. See LICENSE for details.

FAQs

Package last updated on 27 Aug 2015

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc