
EchoSwift: LLM Inference Benchmarking Tool

EchoSwift is a CLI-based tool designed to benchmark LLM (Large Language Model) inference endpoints. It helps evaluate performance using real-world prompts with configurable parameters and visualized results.

While a benchmark runs, EchoSwift captures performance metrics such as latency, throughput, and Time to First Token (TTFT) across varying input token lengths, output token lengths, and numbers of parallel users.
You can install EchoSwift using pip:
pip install echoswift
Alternatively, you can install from source:
git clone https://github.com/Infobellit-Solutions-Pvt-Ltd/EchoSwift.git
cd EchoSwift
pip install -e .
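If the install succeeded, the echoswift command should be on your PATH. A quick sanity check, assuming the CLI exposes the usual help flag:

echoswift --help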
EchoSwift provides a simple command-line interface for running LLM inference benchmarks.
Below are the steps to run a sample test, assuming the generation endpoint is active.
Before running a benchmark, you need to download and filter the dataset:

echoswift dataprep

This command downloads and filters the dataset and generates a config.json file in your working directory.

Edit the generated config.json file to match your LLM server configuration. Below is a sample:
{
  "_comment": "EchoSwift Configuration",
  "out_dir": "Results",
  "base_url": "http://localhost:8000/v1/completions",
  "tokenizer_path": "/path/to/tokenizer/",
  "inference_server": "vLLM",
  "model": "/model",
  "random_prompt": true,
  "max_requests": 1,
  "user_counts": [10],
  "increment_user": [100],
  "input_tokens": [32],
  "output_tokens": [256]
}
Note: Modify base_url, tokenizer_path, model, and other fields according to your LLM deployment.
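Since EchoSwift reads config.json at startup, it can be worth validating the JSON syntax after editing. One option using only the Python standard library (this checks syntax, not EchoSwift-specific fields):

python -m json.tool config.json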
EchoSwift supports two input modes, depending on your test requirements:

Fixed input tokens: set "random_prompt": false and specify both input_tokens and output_tokens explicitly.
Randomized prompts: set "random_prompt": true and specify only output_tokens; EchoSwift will choose random input lengths from the dataset.

Config fragments for both modes are shown after the command list below. You can use either mode with both:
echoswift start (standard benchmark)
echoswift optimaluserrun (to determine optimal concurrency)
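As a sketch, the two modes differ only in a few keys of config.json; the fragments below show just those keys, with placeholder values taken from the sample above, not recommendations.

Fixed input tokens:

{
  "random_prompt": false,
  "input_tokens": [32],
  "output_tokens": [256]
}

Randomized prompts:

{
  "random_prompt": true,
  "output_tokens": [256]
}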
To perform optimal user benchmarking with optimaluserrun, use:

user_counts to set the initial number of concurrent users
increment_user to define how many users to add per step

Example:
"user_counts": [10],
"increment_user": [100]
In this case, the benchmark starts with 10 concurrent users and adds 100 per iteration (10, 110, 210, and so on) until performance thresholds are hit.
EchoSwift allows two ways to configure the tokenizer used for benchmarking:

Set the TOKENIZER environment variable to the path of your desired tokenizer.
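For example, in a Unix-like shell you could export the variable before starting a run (the tokenizer path below is a placeholder):

export TOKENIZER=/path/to/tokenizer/
echoswift start --config config.json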
If TOKENIZER is not set or is empty, EchoSwift falls back to a built-in default tokenizer. This keeps the tool functional, but the fallback tokenizer may not match your model's behavior; use it only for testing or when no tokenizer is specified.
✅ Best Practice: Always specify the correct tokenizer that matches your LLM model for accurate benchmarking results.
Combine these options as needed to benchmark your LLM endpoint effectively.
Option A: Standard Benchmarking
Use the start command to run a basic benchmark:
echoswift start --config path/to/config.json
Option B: Optimal User Load Benchmarking
To find the optimal number of concurrent users for your LLM endpoint:
echoswift optimaluserrun --config path/to/config.json
Visualize the benchmark results using the built-in plotting tool:
echoswift plot --results-dir path/to/your/results_dir
EchoSwift will create a results directory (or the directory specified in out_dir) containing the benchmark output. After the benchmark completes, you can find CSV files there with latency, throughput, and TTFT measurements for each test configuration.
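Putting the steps together, a typical session might look like this (assuming config.json has already been edited for your deployment, and out_dir is left at the sample value of Results):

# Download and filter the dataset; also generates config.json
echoswift dataprep

# Run the standard benchmark
echoswift start --config config.json

# Plot latency, throughput, and TTFT from the results
echoswift plot --results-dir Results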
If you find our resource useful, please cite our paper:
@inproceedings{Krishna2024,
  title     = {EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)},
  author    = {Krishna, Karthik and Bandili, Ramana},
  booktitle = {Companion of the 15th ACM/SPEC International Conference on Performance Engineering},
  series    = {ICPE '24},
  publisher = {ACM},
  year      = {2024},
  month     = may,
  doi       = {10.1145/3629527.3652273},
  url       = {https://dl.acm.org/doi/10.1145/3629527.3652273}
}