
A high performance native ruby extension (written in C++) for computation of descriptive statistics.
This gem provides fast computation of descriptive statistics (min, max, mean, median, 1st and 3rd quartiles, population standard deviation) for a multivariate dataset (represented as a 2D array) in ruby.
It is ~11x faster than an optimal algorithm in hand-written ruby, and ~4.7x faster than the next fastest available ruby gem or native extension (see benchmarks below).
Add this line to your application's Gemfile:
gem 'fast_statistics'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install fast_statistics
Given you have some multivariate (2-dimensional) data:
data = [
[0.6269, 0.3783, 0.1477, 0.2374],
[0.4209, 0.1055, 0.8000, 0.2023],
[0.1124, 0.1021, 0.1936, 0.8566],
[0.6454, 0.5362, 0.4567, 0.8309],
[0.4828, 0.1572, 0.5706, 0.4085],
[0.5594, 0.0979, 0.4078, 0.5885],
[0.8659, 0.5346, 0.5566, 0.6166],
[0.7256, 0.5841, 0.8546, 0.3918]
]
You can compute descriptive statistics for all the inner arrays as follows:
require "fast_statistics"
FastStatistics::Array2D.new(data).descriptive_statistics
# Result:
#
# [{:min=>0.1477,
# :max=>0.6269,
# :mean=>0.347575,
# :median=>0.30785,
# :q1=>0.214975,
# :q3=>0.44045,
# :standard_deviation=>0.18100761551658537},
# {:min=>0.1055,
# :max=>0.8,
# :mean=>0.38217500000000004,
# :median=>0.3116,
# :q1=>0.1781,
# :q3=>0.515675,
# :standard_deviation=>0.26691825878909076},
# ...,
# {:min=>0.3918,
# :max=>0.8546,
# :mean=>0.639025,
# :median=>0.6548499999999999,
# :q1=>0.536025,
# :q3=>0.75785,
# :standard_deviation=>0.1718318709523935}]
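To make the meaning of each field concrete, here is a minimal pure-ruby sketch that reproduces the figures above for the first two rows. It is not the gem's C++ implementation. The helper names `percentile` and `describe` are hypothetical, and the quartile rule (linear interpolation between the closest ranks) is an assumption inferred from the sample output rather than something this README states. It also assumes each inner array is non-empty and contains no nil values.

```ruby
# Pure-ruby reference for the statistics shown above (not the gem's code).
# `percentile` and `describe` are hypothetical helper names.

# Percentile by linear interpolation between the two closest ranks
# (assumed rule; it matches the q1/median/q3 values in the sample output).
def percentile(sorted, fraction)
  position = fraction * (sorted.length - 1)
  lower = position.floor
  upper = position.ceil
  weight = position - lower
  sorted[lower] + weight * (sorted[upper] - sorted[lower])
end

# Descriptive statistics for one variable (one inner array).
def describe(values)
  sorted = values.sort
  n = sorted.length.to_f
  mean = sorted.sum / n
  # Population variance: divide by n, not n - 1.
  variance = sorted.sum { |v| (v - mean)**2 } / n
  {
    min: sorted.first,
    max: sorted.last,
    mean: mean,
    median: percentile(sorted, 0.5),
    q1: percentile(sorted, 0.25),
    q3: percentile(sorted, 0.75),
    standard_deviation: Math.sqrt(variance)
  }
end

data = [
  [0.6269, 0.3783, 0.1477, 0.2374],
  [0.4209, 0.1055, 0.8000, 0.2023]
]
reference = data.map { |row| describe(row) }
```

A reference like this can be handy for spot-checking the gem's output on a small dataset before relying on it for larger ones.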
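The alternatives compared are descriptive_statistics, a custom hand-written ruby implementation, narray and ruby_native_statistics.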
You can review the benchmark implementations at benchmark/benchmark.rb and run the benchmark with rake benchmark.
Results:
Comparing calculated statistics with 10 values for 8 variables...
Test passed, results are equal to 6 decimal places!
Benchmarking with 100,000 values for 12 variables...
Warming up --------------------------------------
descriptive_statistics 1.000 i/100ms
Custom ruby 1.000 i/100ms
narray 1.000 i/100ms
ruby_native_statistics 1.000 i/100ms
FastStatistics 3.000 i/100ms
Calculating -------------------------------------
descriptive_statistics 0.473 (± 0.0%) i/s - 3.000 in 6.354555s
Custom ruby 2.518 (± 0.0%) i/s - 13.000 in 5.169084s
narray 4.231 (± 0.0%) i/s - 22.000 in 5.210299s
ruby_native_statistics 5.962 (± 0.0%) i/s - 30.000 in 5.041869s
FastStatistics 28.417 (±10.6%) i/s - 141.000 in 5.012229s
Comparison:
FastStatistics: 28.4 i/s
ruby_native_statistics: 6.0 i/s - 4.77x (± 0.00) slower
narray: 4.2 i/s - 6.72x (± 0.00) slower
Custom ruby: 2.5 i/s - 11.29x (± 0.00) slower
descriptive_statistics: 0.5 i/s - 60.09x (± 0.00) slower
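The "x slower" factors in the comparison are the ratio of the fastest iteration rate to each alternative's rate. The short sketch below recomputes them from the rounded i/s figures printed in the results; the variable names are illustrative only.

```ruby
# Iterations per second, copied from the "Calculating" section of the results.
ips = {
  "FastStatistics" => 28.417,
  "ruby_native_statistics" => 5.962,
  "narray" => 4.231,
  "Custom ruby" => 2.518,
  "descriptive_statistics" => 0.473
}

fastest = ips["FastStatistics"]

# Slowdown factor = fastest rate / this implementation's rate.
slowdowns = ips.reject { |name, _rate| name == "FastStatistics" }
               .transform_values { |rate| fastest / rate }

slowdowns.each do |name, factor|
  puts format("%-24s %.2fx slower", name, factor)
end
```

Because the inputs here are already rounded to three decimal places, the recomputed factors can differ from the reported ones in the last digit (for example for descriptive_statistics), presumably because the benchmark tool works from unrounded rates.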
The inspiration for this gem was a use-case in an analytics ruby application, where we frequently had to compute descriptive statistics for fairly large multivariate datasets. Calculations in ruby were not fast enough, so I first explored performing the computations natively in this repository. The results were promising, so I decided to package it as a ruby gem.
Note: This is an early release and should be considered unstable, at least until I'm confident in the stability & performance in a real world application setting. Feel free to test it out in non-critical scenarios/environments (let me know in this discussion thread or by filing an issue if you use it!). I'm also not really an expert in C++, so reviews & suggestions are welcome.
The following factors combined help this gem achieve high performance compared to available native alternatives and hand-written computations in ruby:
The speed gains notwithstanding, there are some limitations in the current implementation:
nil awareness is present).
Bug reports and pull requests are welcome on GitHub at https://github.com/Martin-Nyaga/fast_statistics.
The gem is available as open source under the terms of the MIT License.