
cuga
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware features.
Building a domain-specific enterprise agent from scratch is complex and requires significant effort: agent and tool orchestration, planning logic, safety and alignment policies, evaluation for performance/cost tradeoffs and ongoing improvements. CUGA is a state-of-the-art generalist agent designed with enterprise needs in mind, so you can focus on configuring your domain tools, policies and workflow.
CUGA achieves state-of-the-art performance on leading benchmarks:
High-performing generalist agent: Benchmarked on complex web and API tasks. Combines best-of-breed agentic patterns (e.g. planner-executor, code-act) with structured planning and smart variable management to prevent hallucination and handle complexity
Configurable reasoning modes: Balance performance and cost/latency with flexible modes ranging from fast heuristics to deep planning, optimizing for your specific task requirements
Flexible agent and tool integration: Seamlessly integrate tools via OpenAPI specs, MCP servers, and LangChain, enabling rapid connection to REST APIs, custom protocols, and Python functions
Integrates with Langflow: Low-code visual build experience for designing and deploying agent workflows without extensive coding
Open-source and composable: Built with modularity in mind, CUGA itself can be exposed as a tool to other agents, enabling nested reasoning and multi-agent collaboration. Evolving toward enterprise-grade reliability
Configurable policy and human-in-the-loop instructions (Experimental): Configure policy-aware instructions and approval gates to improve alignment and ensure safe agent behavior in enterprise contexts
Save-and-reuse capabilities (Experimental): Capture and reuse successful execution paths (plans, code, and trajectories) for faster and consistent behavior across repeated tasks
Explore the Roadmap to see what's ahead, or join the Call for the Community to get involved.
Watch CUGA seamlessly combine web and API operations in a single workflow:
Example Task: get top account by revenue from digital sales, then add it to current page
https://github.com/user-attachments/assets/0cef8264-8d50-46d9-871a-ab3cefe1dde5
Experience CUGA's hybrid capabilities by combining API calls with web interactions:
Switch to hybrid mode:
# Edit ./src/cuga/settings.toml and change:
mode = 'hybrid' # under [advanced_features] section
Install browser API support:
# playwright installer should already be included after installing with Quick Start
playwright install chromium
Start the demo:
cuga start demo
Enable the browser extension:
Open the test application:
Try the hybrid task:
get top account by revenue from digital sales then add it to current page
What you'll see: CUGA will fetch data from the Digital Sales API and then interact with the web page to add the account information directly to the current page, demonstrating seamless API-to-web workflow integration.
Watch CUGA pause for human approval during critical decision points:
Example Task: get best accounts
https://github.com/user-attachments/assets/d103c299-3280-495a-ba66-373e72554e78
Experience CUGA's Human-in-the-Loop capabilities where the agent pauses for human approval at key decision points:
Enable HITL mode:
# Edit ./src/cuga/settings.toml and ensure:
api_planner_hitl = true # under [advanced_features] section
Start the demo:
cuga start demo
Try the HITL task:
get best accounts
What you'll see: CUGA will pause at critical decision points, showing you the planned actions and waiting for your approval before proceeding.
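The pause-and-approve behavior described above can be illustrated with a small approval gate. This is only a conceptual sketch: `run_with_approval` and the `approve` callback are hypothetical names introduced here, and nothing below reflects how CUGA implements HITL internally.

```python
from typing import Callable


def run_with_approval(steps: list[str], approve: Callable[[str], bool]) -> list[str]:
    """Walk through planned steps, stopping at the first one the reviewer rejects."""
    executed: list[str] = []
    for step in steps:
        # Show the planned action to the reviewer and wait for a decision.
        if not approve(step):
            break
        # Placeholder for real execution: here we only record the approved step.
        executed.append(step)
    return executed
```

In an interactive session the callback could wrap `input()`, for example `lambda step: input(f"Run '{step}'? [y/N] ").lower() == "y"`.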
The demo comes pre-configured with the Digital Sales API. See the API Docs.
Only follow these steps if you encounter issues with the remote Digital Sales endpoint:
# Start the Digital Sales API locally on port 8000
uv run digital_sales_openapi
# Then update ./src/cuga/backend/tools_env/registry/config/mcp_servers.yaml to use localhost:
# Change the digital_sales URL from the remote endpoint to:
# http://localhost:8000
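Before pointing mcp_servers.yaml at localhost, it can help to confirm that something is actually listening on port 8000. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is an assumption of this example and is not part of CUGA.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Port 8000 is the port named in the steps above.
    print("Digital Sales API reachable:", is_port_open("localhost", 8000))
```

A successful TCP connection only shows that a process is listening; it does not confirm that the process is the Digital Sales API.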
# In terminal, clone the repository and navigate into it
git clone https://github.com/cuga-project/cuga-agent.git
cd cuga-agent
# 1. Create and activate virtual environment
uv venv --python=3.12 && source .venv/bin/activate
# 2. Install dependencies
uv sync
# 3. Set up environment variables
# Create .env file with your API keys
echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
# 4. Start the demo
cuga start demo
# Chrome will open automatically at https://localhost:7860
# then try sending your task to CUGA: 'get top account by revenue from digital sales'
# 5. View agent trajectories (optional)
cuga viz
# This launches a web-based dashboard for visualizing and analyzing
# agent execution trajectories, decision-making, and tool usage
Refer to: .env.example for detailed examples.
CUGA supports multiple LLM providers with flexible configuration options. You can configure models through TOML files or override specific settings using environment variables.
Setup Instructions:
.env file:
# OpenAI Configuration
OPENAI_API_KEY=sk-...your-key-here...
AGENT_SETTING_CONFIG="settings.openai.toml"
# Optional overrides
MODEL_NAME=gpt-4o # Override model name
OPENAI_BASE_URL=https://api.openai.com/v1 # Override base URL
OPENAI_API_VERSION=2024-08-06 # Override API version
Default Values:
gpt-4o
Setup Instructions:
Access IBM WatsonX
Create a project and get your credentials:
Add to your .env file:
# WatsonX Configuration
WATSONX_API_KEY=your-watsonx-api-key
WATSONX_PROJECT_ID=your-project-id
WATSONX_URL=https://us-south.ml.cloud.ibm.com # or your region
AGENT_SETTING_CONFIG="settings.watsonx.toml"
# Optional override
MODEL_NAME=meta-llama/llama-4-maverick-17b-128e-instruct-fp8 # Override model for all agents
Default Values:
meta-llama/llama-4-maverick-17b-128e-instruct-fp8
Setup Instructions:
.env file:
AGENT_SETTING_CONFIG="settings.azure.toml" # Default config uses ETE
AZURE_OPENAI_API_KEY="<your azure apikey>"
AZURE_OPENAI_ENDPOINT="<your azure endpoint>"
OPENAI_API_VERSION="2024-08-01-preview"
CUGA supports LiteLLM through the OpenAI configuration by overriding the base URL:
Add to your .env file:
# LiteLLM Configuration (using OpenAI settings)
OPENAI_API_KEY=your-api-key
AGENT_SETTING_CONFIG="settings.openai.toml"
# Override for LiteLLM
MODEL_NAME=Azure/gpt-4o # Override model name
OPENAI_BASE_URL=https://your-litellm-endpoint.com # Override base URL
OPENAI_API_VERSION=2024-08-06 # Override API version
Setup Instructions:
.env file:
# OpenRouter Configuration
OPENROUTER_API_KEY=your-openrouter-api-key
AGENT_SETTING_CONFIG="settings.openrouter.toml"
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
# Optional override
MODEL_NAME=openai/gpt-4o # Override model name
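The provider blocks above each pair an AGENT_SETTING_CONFIG value with a set of credentials. A quick pre-flight check might look like the sketch below. The variable names come from the .env examples above, but treating them as strictly required is an assumption of this example, and `missing_vars` is a hypothetical helper rather than a CUGA function.

```python
import os

# Variables taken from the .env examples above; treating them as "required"
# is an assumption of this sketch, not documented CUGA behavior.
REQUIRED_VARS = {
    "settings.openai.toml": ["OPENAI_API_KEY"],
    "settings.watsonx.toml": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID", "WATSONX_URL"],
    "settings.azure.toml": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "OPENAI_API_VERSION"],
    "settings.openrouter.toml": ["OPENROUTER_API_KEY", "OPENROUTER_BASE_URL"],
}


def missing_vars(env: dict[str, str]) -> list[str]:
    """List variables the selected settings file is expected to need but env lacks."""
    config = env.get("AGENT_SETTING_CONFIG", "")
    if config not in REQUIRED_VARS:
        raise ValueError(f"Unknown AGENT_SETTING_CONFIG: {config!r}")
    return [name for name in REQUIRED_VARS[config] if not env.get(name)]


if __name__ == "__main__":
    try:
        print("Missing variables:", missing_vars(dict(os.environ)))
    except ValueError as err:
        print(err)
```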
CUGA uses TOML configuration files located in src/cuga/configurations/models/:
settings.openai.toml - OpenAI configuration (also supports LiteLLM via base URL override)
settings.watsonx.toml - WatsonX configuration
settings.azure.toml - Azure OpenAI configuration
settings.openrouter.toml - OpenRouter configuration
Each file contains agent-specific model settings that can be overridden by environment variables.
Tip: Want to use your own tools or add your MCP tools? Check out src/cuga/backend/tools_env/registry/config/mcp_servers.yaml for examples of how to configure custom tools and APIs, including those for digital sales.
CUGA supports isolated code execution using Docker/Podman containers for enhanced security.
Install container runtime: Download and install Rancher Desktop or Docker.
Install sandbox dependencies:
uv sync --group sandbox
Start with remote sandbox enabled:
cuga start demo --sandbox
This automatically configures CUGA to use Docker/Podman for code execution instead of local execution.
Test your sandbox setup (optional):
# Test local sandbox (default)
cuga test-sandbox
# Test remote sandbox with Docker/Podman
cuga test-sandbox --remote
You should see the output: ('test succeeded\n', {})
Note: Without the --sandbox flag, CUGA uses local Python execution (default), which is faster but provides less isolation.
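Since the sandbox relies on Docker or Podman being installed, a quick check for either executable can save a failed start. The sketch below uses only the standard library; `find_container_runtime` is a hypothetical helper and does not reflect how CUGA itself detects the runtime.

```python
import shutil


def find_container_runtime(candidates=("docker", "podman")):
    """Return the name of the first candidate executable found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


if __name__ == "__main__":
    runtime = find_container_runtime()
    print(f"Container runtime: {runtime}" if runtime else "No docker or podman on PATH")
```

Finding the executable does not guarantee that the daemon or VM behind it is running, so `cuga test-sandbox --remote` remains the real test.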
./src/cuga
| Mode | File | Description |
|---|---|---|
| fast | ./configurations/modes/fast.toml | Optimized for speed |
| balanced | ./configurations/modes/balanced.toml | Balance between speed and precision (default) |
| accurate | ./configurations/modes/accurate.toml | Optimized for precision |
| custom | ./configurations/modes/custom.toml | User-defined settings |
configurations/
├── modes/fast.toml
├── modes/balanced.toml
├── modes/accurate.toml
└── modes/custom.toml
Edit settings.toml:
[features]
cuga_mode = "fast" # or "balanced" or "accurate" or "custom"
Documentation: ./docs/flags.html
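The mode table above maps each cuga_mode value to a file under ./src/cuga. A small lookup like the one below can catch a mistyped mode before starting the agent. It is a sketch covering only the four modes in the table, and `mode_file` is a hypothetical helper introduced for illustration.

```python
from pathlib import PurePosixPath

# Paths copied from the mode table above, relative to ./src/cuga.
MODE_FILES = {
    "fast": "configurations/modes/fast.toml",
    "balanced": "configurations/modes/balanced.toml",
    "accurate": "configurations/modes/accurate.toml",
    "custom": "configurations/modes/custom.toml",
}


def mode_file(cuga_mode: str, root: str = "./src/cuga") -> str:
    """Return the configuration file path for a mode, or raise on an unknown mode."""
    if cuga_mode not in MODE_FILES:
        raise ValueError(f"Unknown cuga_mode {cuga_mode!r}; expected one of {sorted(MODE_FILES)}")
    return str(PurePosixPath(root) / MODE_FILES[cuga_mode])
```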
| Mode | Description |
|---|---|
| api | API-only mode - executes API tasks (default) |
| web | Web-only mode - executes web tasks using browser extension |
| hybrid | Hybrid mode - executes both API tasks and web tasks using browser extension |
Select a mode with mode = 'api', mode = 'web', or mode = 'hybrid'. The starting URL is set with demo_mode.start_url.
Edit ./src/cuga/settings.toml:
[demo_mode]
start_url = "https://opensource-demo.orangehrmlive.com/web/index.php/auth/login" # Starting URL for hybrid mode
[advanced_features]
mode = 'api' # 'api', 'web', or 'hybrid'
Each .md file contains specialized instructions that are automatically integrated into CUGA's internal prompts when that component is active. Simply edit the markdown files to customize behavior for each node type.
Available instruction sets: answer, api_planner, code_agent, plan_controller, reflection, shortlister, task_decomposition
configurations/
└── instructions/
    ├── instructions.toml
    ├── default/
    │   ├── answer.md
    │   ├── api_planner.md
    │   ├── code_agent.md
    │   ├── plan_controller.md
    │   ├── reflection.md
    │   ├── shortlister.md
    │   └── task_decomposition.md
    └── [other instruction sets]/
Edit configurations/instructions/instructions.toml:
[instructions]
instruction_set = "default" # or any instruction set above
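When creating a new instruction set, it can be useful to see which node files it actually provides. The sketch below follows the directory layout shown above. `instruction_files` is a hypothetical helper, and how CUGA handles a node whose file is missing is not covered here.

```python
from pathlib import Path

# Node names from the list of available instruction files above.
NODES = [
    "answer", "api_planner", "code_agent", "plan_controller",
    "reflection", "shortlister", "task_decomposition",
]


def instruction_files(configurations: Path, instruction_set: str = "default") -> dict[str, Path]:
    """Map each node name to its markdown file, keeping only files that exist."""
    base = configurations / "instructions" / instruction_set
    return {
        node: base / f"{node}.md"
        for node in NODES
        if (base / f"{node}.md").is_file()
    }
```

For example, `instruction_files(Path("configurations"), "default")` run from the directory containing configurations/ would list the files present in the default set.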
uv sync --group memory
Set enable_memory = true in settings.toml
cuga start memory
Watch CUGA with Memory enabled
[LINK]
Would you like to test this? (Advanced Demo)
Set the enable_memory flag to true
cuga start memory
cuga start demo_crm --sample-memory-data
Identify the common cities between my cuga_workspace/cities.txt and cuga_workspace/company.txt
Here you should see errors related to CodeAgent. Wait a minute for tips to be generated. Tip generation can be confirmed from the terminal where cuga start memory was run.
• Change ./src/cuga/settings.toml: cuga_mode = "save_reuse_fast"
• Run: cuga start demo
• First run: get top account by revenue
• get top 2 accounts by revenue
• The flow will now be saved
• Verify reuse: get top 4 accounts by revenue
CUGA supports three types of tool integrations. Each approach has its own use cases and benefits:
| Tool Type | Best For | Configuration | Runtime Loading |
|---|---|---|---|
| OpenAPI | REST APIs, existing services | mcp_servers.yaml | Build time |
| MCP | Custom protocols, complex integrations | mcp_servers.yaml | Build time |
| LangChain | Python functions, rapid prototyping | Direct import | Runtime |
The test suite covers various execution modes across different scenarios:
| Scenario | Fast Mode | Balanced Mode | Accurate Mode | Save & Reuse Mode |
|---|---|---|---|---|
| Find VP Sales High-Value Accounts | ✓ | ✓ | ✓ | - |
| Get top account by revenue | ✓ | ✓ | ✓ | ✓ |
| List my accounts | ✓ | ✓ | ✓ | - |
Unit Tests
Integration Tests
Focused suites:
./src/scripts/run_tests.sh
For information on how to evaluate, see the CUGA Evaluation Documentation
CUGA is open source because we believe trustworthy enterprise agents must be built together.
Here's how you can help:
All contributions are welcome through GitHub Issues - whether it's sharing use cases, requesting features, or reporting bugs!
Among others, we're exploring the following directions:
Please follow the contribution guide in CONTRIBUTING.md.