Philosophy
We aim to provide benchmarking tools for Ruby implementers that suit a range of different use cases. A common set of benchmarks is coupled with a harness that provides multiple commands.
Questions that we aim to answer include 'have my last few commits affected performance?', 'how much faster is one implementation of Ruby than another?', and 'how long does one implementation take to warm up compared to another?'. We also include functionality for graphing results.
We focus on macro benchmarks (at least a screen-full of code and hopefully much more) and kernels isolated from production code. We also focus primarily on peak temporal performance (how fast it runs when it's warmed up), but we do include measurements of how long warmup takes.
We include common synthetic benchmarks as well as benchmarks taken from unmodified off-the-shelf gems in use in production today.
Another focus is being gentle on early implementations of Ruby. Benchmarks are run in a subprocess with minimal language requirements for the harness.
Detailed notes on the methodology we have used can be found in the final section of this document.
Scoring benchmarks
To get an abstract score representing performance, where higher is better:
bench9000 score 2.2.3 classic
classic-mandelbrot 2.2.3 2000
classic-nbody 2.2.3 2000
...
The score is 1 / time * arbitrary constant, with the arbitrary constant chosen so that JRuby on a typical system achieves a score on the order of 1000.
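As a rough illustration of the scaling in Ruby (the actual constant is internal to the tool; the value below is a made-up stand-in):

# Illustrative sketch only: CONSTANT is hypothetical, standing in for
# the tool's internal scaling constant.
CONSTANT = 1000.0

def score(sample_time_seconds)
  (1.0 / sample_time_seconds) * CONSTANT
end

score(0.5)   # => 2000.0 -- halving the time doubles the score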
Detailed timing results
There are several formats in which you can print detailed information. In all formats the detailed information includes, in order where relevant:
- Total warmup time (s)
- Mean sample time (s)
- Abstract score (no unit)
--value-per-line lists each value on its own line with keys, perhaps for reading by another program:
bench9000 detail 2.2.3 rbx-2.5.8 classic --value-per-line
classic-mandelbrot 2.2.3 warmup 20
classic-mandelbrot 2.2.3 sample 10
classic-mandelbrot 2.2.3 score 2000
classic-mandelbrot rbx-2.5.8 warmup 20
classic-mandelbrot rbx-2.5.8 sample 10
classic-mandelbrot rbx-2.5.8 score 2000
...
--benchmark-per-line lists one benchmark per line, with clusters of values for each implementation and no keys, perhaps for pasting into a spreadsheet:
bench9000 detail 2.2.3 rbx-2.5.8 classic --benchmark-per-line
classic-mandelbrot 20 10 2000 20 10 2000
...
--json outputs all the data in JSON:
bench9000 detail 2.2.3 rbx-2.5.8 classic --json
{
    "benchmarks": [...],
    "implementations": [...],
    "measurements": [
        {
            "benchmark": ...,
            "implementation": ...,
            "warmup_time": ...,
            "sample_mean": ...,
            "sample_error": ...,
            "score": ...,
            "score_error": ...,
            "warmup_samples": [...],
            "samples": [...]
        },
        ...
    ]
}
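The JSON output is convenient to consume from another program. For example, from Ruby, using the field names documented above (and assuming you have saved the output to a file called data.json):

require 'json'

data = JSON.parse(File.read('data.json'))
data['measurements'].each do |m|
  # Print one line per benchmark-implementation pair with its score and error
  puts "#{m['benchmark']} on #{m['implementation']}: score #{m['score']} +/- #{m['score_error']}"
end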
Comparing two implementations
The performance of two or more implementations can be compared, with each shown relative to the first:
bench9000 compare 2.2.3 rbx-2.5.8 jruby-1.7.18 classic
classic-mandelbrot 150% 200%
classic-nbody 150% 200%
...
The percentages show the performance of each implementation after the first, relative to the first, so 200% means twice the performance of the first implementation.
Comparing performance against a reference
Set a reference point with some implementation of Ruby:
bench9000 compare-reference 2.2.3 classic
classic-mandelbrot 2000
classic-nbody 2000
...
This writes data to reference.txt.
You can then compare against this reference point multiple times:
bench9000 compare-reference jruby-1.7.18-noindy jruby-1.7.18-indy
classic-mandelbrot 150% 200%
classic-nbody 150% 200%
...
This reads data from reference.txt.
Detailed reports
To generate a detailed interactive report with charts and tables:
bench9000 report 2.2.3 rbx-2.5.8 classic
The output will go into report.html. You can set --baseline implementation to configure which implementation is used as the default baseline, and --notes notes.html to include extra content in the notes tab.
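For example, using both options together (notes.html here stands for any HTML file you have written):

bench9000 report 2.2.3 rbx-2.5.8 classic --baseline rbx-2.5.8 --notes notes.html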
Separating measurement and report generation
All commands accept a --data flag. If this is set, measurements will be read from this file and used instead of making new measurements. Existing measurements, plus any new measurements, are also written back to the file.
You can use this with your commands as normal:
bench9000 report 2.2.3 rbx-2.5.8 classic --data data.txt
This allows you to interrupt and resume measurements, and to keep measurements from one report to use in another, such as implementations that have not changed.
The remove command allows measurements to be removed from the file. Here the listed implementations and benchmarks are those to remove, rather than those to measure:
bench9000 remove 2.2.3 --data data.txt
A common use of this is to keep a file with measurements from implementations that are stable, and to remove development implementations to re-measure after changes.
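For example, a possible workflow (implementation names as in the configuration section below):

bench9000 report 2.2.3 rbx-2.5.8 jruby-dev-indy classic --data data.txt
bench9000 remove jruby-dev-indy --data data.txt
bench9000 report 2.2.3 rbx-2.5.8 jruby-dev-indy classic --data data.txt

The second report re-measures only jruby-dev-indy; the other measurements are read back from data.txt.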
Other options
- --show-commands prints the implementation commands as they're executed
- --show-samples prints individual samples as they're measured
If you want to run a benchmark manually, you can use --show-commands to get the Ruby command being run. This command normally expects to be driven by the harness over a pipe, which tells it to produce more results or to stop. If you want to use the benchmark without the harness, you can use the --loop option to just keep running forever, printing iteration times, or the --loopn a b option to run a iterations and b micro-iterations if the benchmark is using the micro harness (just set b to one if you're not using the micro harness).
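For example, to see the command for one benchmark (a sketch, assuming a single benchmark name can be given where a group appears in the examples above):

bench9000 detail 2.2.3 classic-mandelbrot --show-commands

You can then re-run the printed command yourself, adding --loop as described above.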
Configuration
Configuration files specify the available Ruby implementations, benchmarks and benchmark groups. A configuration file named default.config.rb is loaded automatically. It is looked up in the {.,bench,benchmark,benchmarks} directories. You can load additional configuration files for any command using --config file (the same directories are used for lookup, and the .config.rb suffix can be omitted).
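For example, to load my-machine.config.rb from one of those directories (the file name is a placeholder for your own configuration):

bench9000 score 2.2.3 classic --config my-machine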
Implementations
The default configuration includes common Ruby implementations, which are expected to be found in rbenv.
1.8.7-p375
1.9.3-p551
2.0.0-p647
2.1.7
2.2.3
2.3.0-dev
jruby-9.0.4.0-int
jruby-9.0.4.0-noindy
jruby-9.0.4.0-indy
rbx-2.5.8-int
rbx-2.5.8
topaz-dev
Development versions of implementations are enabled if you set environment variables to find them. You should have a built version of the implementation at this path, which should be the root directory of the repository.
- jruby-dev-(int|noindy|indy) - set JRUBY_DEV_DIR
- jruby-dev-truffle-nograal - set JRUBY_DEV_DIR
- jruby-dev-truffle-graal - set JRUBY_DEV_DIR and GRAAL_BIN
You can add your own implementations by writing your own config file, or use the custom implementation by setting RUBY_BIN and RUBY_ARGS.
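For example (the path is a placeholder for your own build, and -W0 is just an example argument):

RUBY_BIN=/path/to/bin/ruby RUBY_ARGS='-W0' bench9000 score custom classic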
Methodology
Warmup
Before we consider whether an implementation is warmed up, we run for at least T0 seconds.
We then consider an implementation to be warmed up when the last N samples have a range relative to the mean of less than E. The first run of N samples with a range below that threshold is considered to be the first N samples where the implementation is warmed up. If the benchmark does not reach this state within W samples or within T1 seconds, we warn the user and start measuring samples anyway. Some benchmarks just don't warm up, or don't warm up in reasonable time, and it's fine to ignore this warning in normal usage.
We then take S samples (starting with the last N that passed our warmup test) for our measurements.
If you are going to publish results based on these benchmarks, you should manually verify warmup using lag or auto-correlation plots.
We have chosen T0 to be 30, N to be 20, E to be 0.1, W to be 100, T1 to be 240, and S to be 10. These values are arbitrary, but comparing against the lag plots of our data suggests they do the right thing.
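A minimal sketch of the warmup criterion described above, in Ruby (this illustrates the rule, not the tool's actual implementation):

N = 20    # window size
E = 0.1   # maximum range relative to the mean

# The last N samples are considered warmed up when their range
# (max - min) is less than E times their mean.
def warmed_up?(samples)
  return false if samples.size < N
  window = samples.last(N)
  mean = window.sum / window.size.to_f
  (window.max - window.min) < E * mean
end

warmed_up?(Array.new(10, 1.0))        # => false -- fewer than N samples
warmed_up?([5.0, 3.0] + [1.0] * 20)   # => true  -- last 20 samples are flat
warmed_up?([1.0] * 19 + [2.0])        # => false -- range 1.0 exceeds 0.1 * mean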
Error
We currently use the standard deviation as our error.
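For reference, a sketch of this error calculation (whether the tool divides by n or n - 1 isn't specified here; this sketch uses the population form, dividing by n):

def standard_deviation(samples)
  mean = samples.sum / samples.size.to_f
  variance = samples.sum { |s| (s - mean) ** 2 } / samples.size.to_f
  Math.sqrt(variance)
end

standard_deviation([10.0, 10.5, 9.5])   # => ~0.41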
Rubygem
This tool is also released as a gem for use in other projects. See example usage.