
agentunit
AgentUnit is a framework for evaluating, monitoring, and benchmarking multi-agent systems. It standardises how teams define scenarios, run experiments, and report outcomes across adapters, model providers, and deployment targets.
AgentUnit requires Python 3.10 or later. The recommended workflow uses Poetry for dependency management.
```bash
git clone https://github.com/aviralgarg05/agentunit.git
cd agentunit
poetry install
poetry shell
```
To use pip instead:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
Optional integrations are published as extras; install only what you need:
```bash
poetry install --with promptflow,crewai,langgraph
# or with pip (quote the brackets so shells such as zsh do not expand them)
pip install "agentunit[promptflow,crewai,langgraph]"
```
| Extra | Includes | Use Case |
|---|---|---|
| promptflow | promptflow>=1.0.0 | Azure PromptFlow integration |
| crewai | crewai>=0.201.1 | CrewAI multi-agent orchestration |
| langgraph | langgraph>=1.0.0a4 | LangGraph state machines |
| openai | openai>=1.0.0 | OpenAI models and Swarm |
| anthropic | anthropic>=0.18.0 | Claude/Bedrock integration |
| phidata | phidata>=2.0.0 | Phidata agents |
| all | All above extras | Complete installation |
Refer to the adapters guide for per-adapter requirements and feature support matrices.
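To confirm which optional integrations ended up in the current environment, a small stdlib check is enough. The sketch below is an illustration, not part of AgentUnit: `EXTRA_MODULES` and `available_extras` are hypothetical names, and the mapping from extra to import name is an assumption (in particular, phidata is assumed to install the top-level module `phi`; verify against the package you actually install).

```python
import importlib.util

# Assumed mapping from AgentUnit extra name to the top-level module that the
# upstream package installs. "phi" for phidata is an assumption; verify it.
EXTRA_MODULES = {
    "promptflow": "promptflow",
    "crewai": "crewai",
    "langgraph": "langgraph",
    "openai": "openai",
    "anthropic": "anthropic",
    "phidata": "phi",
}


def available_extras(mapping: dict = EXTRA_MODULES) -> dict:
    """Report which optional integrations are importable here.

    importlib.util.find_spec only locates a top-level module without
    importing it, so the check is cheap and triggers no package side effects.
    """
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in mapping.items()
    }


if __name__ == "__main__":
    for extra, installed in available_extras().items():
        print(f"{extra:<12} {'installed' if installed else 'missing'}")
```

Checking for the module rather than importing it keeps the probe fast even when heavy frameworks such as LangGraph or CrewAI are present.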
Create a file example_suite.py:
```python
from agentunit import Scenario, DatasetCase, Runner
from agentunit.adapters import MockAdapter
from agentunit.metrics import ExactMatch

# Define test cases
cases = [
    DatasetCase(
        id="math_1",
        query="What is 2 + 2?",
        expected_output="4"
    ),
    DatasetCase(
        id="capital_1",
        query="What is the capital of France?",
        expected_output="Paris"
    )
]

# Create scenario
scenario = Scenario(
    name="Basic Q&A Test",
    adapter=MockAdapter(),  # Replace with your adapter
    dataset=cases,
    metrics=[ExactMatch()]
)

# Run evaluation
runner = Runner()
results = runner.run(scenario)

# Print results
print(f"Success rate: {results.success_rate:.1%}")
print(f"Average latency: {results.avg_latency:.2f}s")
```
Run it:
```bash
python example_suite.py
```
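The quickstart prints a success rate and an average latency. The plain-Python sketch below shows one plausible way such numbers are derived, without importing AgentUnit: `Case`, `exact_match`, and `evaluate` are hypothetical stand-ins, and the assumed semantics (trimmed, case-sensitive equality; success rate as the fraction of passing cases; latency as the arithmetic mean of wall-clock call times) may differ from AgentUnit's own `ExactMatch` and `Runner`.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Case:
    # Hypothetical stand-in for DatasetCase; field names mirror the example.
    id: str
    query: str
    expected_output: str


def exact_match(output: str, expected: str) -> bool:
    # Assumed semantics: whitespace-trimmed, case-sensitive string equality.
    return output.strip() == expected.strip()


def evaluate(agent: Callable[[str], str], cases: List[Case]) -> Dict[str, float]:
    """Score every case with exact_match and time each agent call."""
    if not cases:
        raise ValueError("need at least one case")
    passed: List[bool] = []
    latencies: List[float] = []
    for case in cases:
        start = time.perf_counter()
        output = agent(case.query)
        latencies.append(time.perf_counter() - start)
        passed.append(exact_match(output, case.expected_output))
    return {
        "success_rate": sum(passed) / len(passed),
        "avg_latency": mean(latencies),
    }


if __name__ == "__main__":
    cases = [
        Case("math_1", "What is 2 + 2?", "4"),
        Case("capital_1", "What is the capital of France?", "Paris"),
    ]
    # A canned-answer agent, similar in spirit to MockAdapter; one answer is wrong.
    answers = {"What is 2 + 2?": "4", "What is the capital of France?": "Lyon"}
    results = evaluate(lambda q: answers[q], cases)
    print(f"Success rate: {results['success_rate']:.1%}")  # Success rate: 50.0%
    print(f"Average latency: {results['avg_latency']:.2f}s")
```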
Create example_suite.yaml:
```yaml
name: "Customer Support Q&A"
description: "Evaluate customer support agent responses"
adapter:
  type: "openai"
  config:
    model: "gpt-4"
    temperature: 0.7
    max_tokens: 500
dataset:
  cases:
    - input: "How do I reset my password?"
      expected: "Use the 'Forgot Password' link on the login page"
      metadata:
        category: "account"
    - input: "What are your business hours?"
      expected: "Monday-Friday 9AM-5PM EST"
      metadata:
        category: "general"
metrics:
  - "exact_match"
  - "semantic_similarity"
  - "latency"
timeout: 30
retries: 2
```
Run it with the CLI:
```bash
agentunit example_suite.yaml \
  --json results.json \
  --markdown results.md \
  --junit results.xml
```
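The suite file sets `timeout: 30` and `retries: 2` without spelling out how they combine. A common reading is "up to 1 + retries attempts per call, each capped at timeout seconds". The stdlib sketch below implements that reading only as an illustration; `call_with_retries` is a hypothetical helper, not an AgentUnit API, and AgentUnit's actual behaviour may differ.

```python
import concurrent.futures as cf
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], timeout: float, retries: int) -> T:
    """Call fn up to 1 + retries times, bounding each attempt by timeout seconds.

    Assumed semantics for the suite's `timeout` and `retries` keys.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error = None
    for _ in range(1 + retries):
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            # Raises a TimeoutError on overrun, or re-raises fn's own exception.
            return future.result(timeout=timeout)
        except Exception as exc:
            last_error = exc
        finally:
            # Do not block on a call that overran its timeout: Python threads
            # cannot be killed, so it finishes in the background.
            pool.shutdown(wait=False)
    raise RuntimeError(f"all {1 + retries} attempts failed") from last_error
```

With the values from the YAML above, a single agent call would be wrapped as `call_with_retries(lambda: agent(query), timeout=30, retries=2)`, where `agent` and `query` are placeholders. A fresh executor per attempt keeps a hung call from delaying the next attempt.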
AgentUnit exposes an agentunit CLI entry point once installed. Typical usage:
```bash
agentunit path.to.suite \
  --metrics faithfulness answer_correctness \
  --json reports/results.json \
  --markdown reports/results.md \
  --junit reports/results.xml
```
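In CI it is often useful to drive the documented CLI from a script and gate the build on the JUnit report. The sketch below uses only the flags shown above; `build_command`, `junit_summary`, and `gate` are hypothetical helpers, and it assumes AgentUnit's JUnit output follows the common layout (`<testcase>` elements with a `<failure>` or `<error>` child on failure and `<skipped>` on skip) and that the report directory exists after the run.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_command(suite: str, metrics: List[str], report_dir: str = "reports") -> List[str]:
    """Assemble the documented agentunit invocation as an argv list."""
    return [
        "agentunit", suite,
        "--metrics", *metrics,
        "--json", f"{report_dir}/results.json",
        "--markdown", f"{report_dir}/results.md",
        "--junit", f"{report_dir}/results.xml",
    ]


def junit_summary(xml_text: str) -> Dict[str, int]:
    """Count total, failed, and skipped test cases in a JUnit-style report."""
    root = ET.fromstring(xml_text)
    cases = list(root.iter("testcase"))
    failed = sum(
        1 for c in cases
        if c.find("failure") is not None or c.find("error") is not None
    )
    skipped = sum(1 for c in cases if c.find("skipped") is not None)
    return {"total": len(cases), "failed": failed, "skipped": skipped}


def gate(suite: str, metrics: List[str], report_dir: str = "reports") -> int:
    """Run the CLI, then return 1 if the JUnit report shows any failed case."""
    # check=False: the CLI's exit-code conventions are not documented here,
    # so the report is treated as the source of truth.
    subprocess.run(build_command(suite, metrics, report_dir), check=False)
    with open(f"{report_dir}/results.xml", encoding="utf-8") as fh:
        summary = junit_summary(fh.read())
    return 1 if summary["failed"] else 0
```

A CI step could then end with `raise SystemExit(gate("path.to.suite", ["faithfulness", "answer_correctness"]))`.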
Programmatic runners are available through agentunit.core.Runner for notebook- or script-driven workflows.
| Topic | Reference |
|---|---|
| Quick evaluation walkthrough | Quickstart |
| Scenario and adapter authoring | docs/writing-scenarios.md |
| Adapter implementations guide | docs/adapters.md |
| Metrics catalog and reference | docs/metrics-catalog.md |
| CLI options and examples | docs/cli.md |
| Architecture overview | docs/architecture.md |
| Framework-specific guides | docs/platform-guides.md |
| No-code builder guide | docs/nocode-quickstart.md |
| OpenTelemetry integration | docs/telemetry.md |
| Performance testing | docs/performance-testing.md |
| Comparison to other tools | docs/comparison.md |
| Templates | docs/templates/ |
Use the table above as the canonical navigation surface; every document cross-links back to related topics for clarity.
Run the test suite with:

```bash
poetry run python3 -m pytest tests -v
```
Latest verification (2025-10-24): 144 passed, 10 skipped, 32 warnings. Warnings originate from third-party dependencies (langchain pydantic shim deprecations and datetime.utcnow usage). Track upstream fixes or pin patched releases as needed.
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Security disclosures and sensitive topics should follow responsible disclosure guidelines outlined in SECURITY.md.
AgentUnit is released under the MIT License. See LICENSE for the full text.
Need an overview for stakeholders? Start with docs/architecture.md. Ready to extend the platform? Explore the templates under docs/templates/.