io-bench

IO Bench is a library designed to benchmark the performance of standard flat file formats and partitioning schemes.

0.1.0

PyPI

Maintainers: 1

IOBench Quick Start Guide

Generating Sample Data

To generate sample data, initialize the IOBench object with the path to the source CSV file and call the generate_sample method:

from io_bench import IOBench

bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars', 'parquet_arrow', 'parquet_fast', 'feather', 'feather_arrow'])
bench.generate_sample(records=100000) # default value

NOTE: The behavior of source_file is contextual. If you provide the desired name for a sample file and then call generate_sample, the file is created at that path. Otherwise, you must provide a valid path to an existing file.
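
The contextual behavior in the note can be made explicit in calling code. The following is a minimal sketch and not part of io-bench: the helper name needs_sample is hypothetical, and it assumes that "existing file" means a regular file at the given path.

```python
from pathlib import Path


def needs_sample(source_file: str) -> bool:
    """Return True when no regular file exists at source_file yet.

    Hypothetical helper, not part of io-bench. It mirrors the note above:
    a missing file means generate_sample has to be called first.
    """
    return not Path(source_file).is_file()
```

A caller could then invoke bench.generate_sample() only when needs_sample('./data/source_100K.csv') returns True.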

Converting Data to Partitioned Formats

Convert the generated CSV data to partitioned formats (Avro, Parquet, Feather). If rows is not defined, the data is partitioned automatically using the default chunk sizes.

bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})
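
To get a feel for how these row counts relate to the number of output files, here is a rough sketch. It assumes each value in rows is the maximum number of rows per partition file, which the text above does not confirm. partition_count is a hypothetical helper, not an io-bench function.

```python
def partition_count(total_rows: int, rows_per_partition: int) -> int:
    """Files needed for total_rows, assuming a fixed maximum row count per file."""
    if total_rows < 0 or rows_per_partition <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_partition must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    return (total_rows + rows_per_partition - 1) // rows_per_partition


# Row limits taken from the partition call above.
rows = {'avro': 500000, 'parquet': 3000000, 'feather': 1600000}

# Under the stated assumption, a 100,000-row sample fits in one file per format.
counts = {fmt: partition_count(100000, limit) for fmt, limit in rows.items()}
print(counts)  # {'avro': 1, 'parquet': 1, 'feather': 1}
```

Under the same assumption, a 10,000,000-row source would give 20, 4, and 7 files respectively.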

Running Benchmarks

NOTE: Partitioning is stateful per bench object. If partition is not called manually, it is called automatically on the first run only, provided a valid source file exists.
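
The note describes a run-once pattern: the object remembers whether partitioning has already happened. A toy sketch of that pattern follows. LazyBench and its attributes are hypothetical and say nothing about how IOBench is actually implemented.

```python
class LazyBench:
    """Toy stand-in for a stateful bench object (hypothetical, not IOBench)."""

    def __init__(self) -> None:
        self.partitioned = False   # state kept per object
        self.partition_calls = 0   # counter, only here to make the behavior visible

    def partition(self) -> None:
        self.partition_calls += 1
        self.partitioned = True

    def run(self) -> str:
        # Partition automatically, but only if it has not happened yet.
        if not self.partitioned:
            self.partition()
        return "benchmarks"


toy = LazyBench()
toy.run()
toy.run()
print(toy.partition_calls)  # 1
```

Because the flag lives on the instance, a second LazyBench object would partition again on its own first run.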

Without Column Selection

Run benchmarks without column selection:

benchmarks_no_select = bench.run(suffix='_no_select')
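
The runs argument passed to the constructor suggests that each parser is timed repeatedly. The sketch below shows a generic way to time a task several times with the standard library. It is an illustration only: time_runs is a hypothetical helper, and how io-bench measures or aggregates its timings is not stated here.

```python
import statistics
import time
from typing import Callable


def time_runs(task: Callable[[], object], runs: int) -> list[float]:
    """Call task `runs` times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload; a real benchmark would read a partitioned file here.
timings = time_runs(lambda: sum(range(10000)), runs=20)
print(f"mean over {len(timings)} runs: {statistics.mean(timings):.6f} s")
```

Repeating the measurement smooths out one-off effects such as cold caches, which is the usual reason for a runs parameter.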

With Column Selection

Run benchmarks with column selection:

columns = ['Region', 'Country', 'Total Cost']
benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')
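
Column selection means a reader loads only the named columns rather than every field. The sketch below illustrates the idea on an in-memory CSV using the standard library. The sample rows and the 'Units Sold' column are made up, select_columns is a hypothetical helper, and none of this reflects how the io-bench parsers are implemented.

```python
import csv
import io

# Made-up sample data; only the first three header names come from the example above.
SAMPLE = (
    "Region,Country,Total Cost,Units Sold\n"
    "Europe,France,1200.50,10\n"
    "Asia,Japan,860.00,7\n"
)


def select_columns(csv_text: str, columns: list[str]) -> list[dict[str, str]]:
    """Read CSV text and keep only the requested columns from each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in columns} for row in reader]


selected = select_columns(SAMPLE, ['Region', 'Country', 'Total Cost'])
print(selected[0])  # {'Region': 'Europe', 'Country': 'France', 'Total Cost': '1200.50'}
```

Columnar formats such as Parquet can avoid reading unselected columns from disk, which is why comparing runs with and without column selection is informative.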

Generating Reports

Combine results and generate the final report:

all_benchmarks = benchmarks_no_select + benchmarks_column_select
bench.report(all_benchmarks, report_dir='./result')

Full Example

Here is a full example of using IOBench:

from io_bench import IOBench

def main() -> None:
    # Initialize the IOBench object with runs and parsers
    bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars'])

    # Generate sample data - (optional)
    bench.generate_sample()

    # Convert the source file to partitioned formats - (optional)
    bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})

    # Run benchmarks without column selection
    benchmarks_no_select = bench.run(suffix='_no_select')

    # Run benchmarks with column selection
    columns = ['Region', 'Country', 'Total Cost']
    benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')

    # Combine results and generate the final report
    all_benchmarks = benchmarks_no_select + benchmarks_column_select
    bench.report(all_benchmarks, report_dir='./result')

if __name__ == "__main__":
    main()
