
IO Bench is a library designed to benchmark the performance of standard flat file formats and partitioning schemes.
To generate sample data, initialize the IOBench object with the path to the source CSV file and call the generate_sample method:
from io_bench import IOBench
bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars', 'parquet_arrow', 'parquet_fast', 'feather', 'feather_arrow'])
bench.generate_sample(records=100000) # default value
NOTE: The behavior of source_file is contextual. If you provide the desired name for a sample file and then call generate_sample, the file is created at that path. Otherwise, source_file must be a valid path to an existing file.
Convert the generated CSV data to partitioned formats (Avro, Parquet, Feather). If the chunk sizes are not defined, the data is partitioned automatically on the default column selection chunks.
bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})
NOTE: Partitioning is stateful per bench object. If partition is not called manually, it is called automatically on the first run only, assuming a valid source file exists.
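To make the idea of row-count partitioning concrete, here is a minimal standard-library sketch that splits a CSV into files of at most N data rows each. It is an illustration only, not IOBench's implementation: the function name partition_csv, the part_NNNN.csv naming, and the choice to repeat the header in every file are all assumptions made for this example.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path
from typing import List


def partition_csv(source: Path, out_dir: Path, rows_per_file: int) -> List[Path]:
    """Split a CSV with a header row into files of at most rows_per_file data rows.

    Hypothetical helper for illustration; it is not part of io_bench.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with source.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first line is a header
        while True:
            # Pull the next chunk lazily so the whole file is never held in memory.
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            part = out_dir / f"part_{len(written):04d}.csv"
            with part.open("w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # repeat the header in every partition
                writer.writerows(chunk)
            written.append(part)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.csv"
        with source.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Region", "Country", "Total Cost"])
            writer.writerows([f"region_{i}", f"country_{i}", str(i)] for i in range(10))
        parts = partition_csv(source, Path(tmp) / "parts", rows_per_file=4)
        print([p.name for p in parts])
```

With 10 data rows and rows_per_file=4, the sketch writes three files holding 4, 4, and 2 rows. The per-format row counts passed to bench.partition play the same role as rows_per_file here.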
Run benchmarks without column selection:
benchmarks_no_select = bench.run(suffix='_no_select')
Run benchmarks with column selection:
columns = ['Region', 'Country', 'Total Cost']
benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')
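As a rough picture of what a column-selection benchmark measures, the sketch below times repeated reads of a CSV file with and without a column subset, using only the standard library. The helpers read_csv and time_reads are hypothetical names for this example and do not reflect how IOBench runs its parsers internally.

```python
import csv
import tempfile
import time
from pathlib import Path
from statistics import mean
from typing import Dict, List, Optional


def read_csv(path: Path, columns: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Read a CSV into a list of dicts, keeping only `columns` when given."""
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        if columns is None:
            return list(reader)
        return [{name: row[name] for name in columns} for row in reader]


def time_reads(path: Path, columns: Optional[List[str]] = None, runs: int = 5) -> List[float]:
    """Return the wall-clock duration in seconds of each of `runs` full reads."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        read_csv(path, columns)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        with sample.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Region", "Country", "Total Cost", "Units Sold"])
            writer.writerows(["Asia", "Japan", str(i * 1.5), str(i)] for i in range(1000))
        full = time_reads(sample, runs=5)
        subset = time_reads(sample, columns=["Region", "Country", "Total Cost"], runs=5)
        print(f"no select:     {mean(full):.6f} s (mean of {len(full)} runs)")
        print(f"column select: {mean(subset):.6f} s (mean of {len(subset)} runs)")
```

CSV is row-oriented, so every line is still parsed in full before columns are dropped. Columnar formats such as Parquet and Feather can skip columns that are not requested, which is why comparing runs with and without column selection across formats is informative.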
Combine results and generate the final report:
all_benchmarks = benchmarks_no_select + benchmarks_column_select
bench.report(all_benchmarks, report_dir='./result')
Here is a full example of using IOBench:
from io_bench import IOBench


def main() -> None:
    # Initialize the IOBench object with runs and parsers
    bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars'])

    # Generate sample data - (optional)
    bench.generate_sample()

    # Convert the source file to partitioned formats - (optional)
    bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})

    # Run benchmarks without column selection
    benchmarks_no_select = bench.run(suffix='_no_select')

    # Run benchmarks with column selection
    columns = ['Region', 'Country', 'Total Cost']
    benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')

    # Combine results and generate the final report
    all_benchmarks = benchmarks_no_select + benchmarks_column_select
    bench.report(all_benchmarks, report_dir='./result')


if __name__ == "__main__":
    main()
FAQs
We found that io-bench demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.