CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware features.
Start with a generalist. Customize for your domain. Deploy faster!
Building a domain-specific enterprise agent from scratch is complex and requires significant effort: agent and tool orchestration, planning logic, safety and alignment policies, evaluation for performance/cost tradeoffs and ongoing improvements. CUGA is a state-of-the-art generalist agent designed with enterprise needs in mind, so you can focus on configuring your domain tools, policies and workflow.
Why CUGA?
🏆 Benchmark Performance
CUGA achieves state-of-the-art performance on leading benchmarks:
🥇 #1 on AppWorld — a benchmark with 750 real-world tasks across 457 APIs
🥈 Top-tier on WebArena (#1 from 02/25 - 09/25) — a complex benchmark for autonomous web agents across application domains
✨ Key Features & Capabilities
High-performing generalist agent — Benchmarked on complex web and API tasks. Combines best-of-breed agentic patterns (e.g. planner-executor, code-act) with structured planning and smart variable management to prevent hallucination and handle complexity
Configurable reasoning modes — Balance performance and cost/latency with flexible modes ranging from fast heuristics to deep planning, optimizing for your specific task requirements
Flexible agent and tool integration — Seamlessly integrate tools via OpenAPI specs, MCP servers, and LangChain, enabling rapid connection to REST APIs, custom protocols, and Python functions
Integrates with Langflow — Low-code visual build experience for designing and deploying agent workflows without extensive coding
Open-source and composable — Built with modularity in mind, CUGA itself can be exposed as a tool to other agents, enabling nested reasoning and multi-agent collaboration. Evolving toward enterprise-grade reliability
Configurable policy and human-in-the-loop instructions (Experimental) — Configure policy-aware instructions and approval gates to improve alignment and ensure safe agent behavior in enterprise contexts
Save-and-reuse capabilities (Experimental) — Capture and reuse successful execution paths (plans, code, and trajectories) for faster and consistent behavior across repeated tasks
Example task: "get top account by revenue from digital sales then add it to current page"
🎯 What you'll see: CUGA will fetch data from the Digital Sales API and then interact with the web page to add the account information directly to the current page - demonstrating seamless API-to-web workflow integration!
Human in the Loop Task Execution
CUGA can pause for human approval at critical decision points.
🔧 Optional: Local Digital Sales API Setup (only if remote endpoint fails)
The demo comes pre-configured with the Digital Sales API → 📖 API Docs
Only follow these steps if you encounter issues with the remote Digital Sales endpoint:
# Start the Digital Sales API locally on port 8000
uv run digital_sales_openapi
# Then update ./src/cuga/backend/tools_env/registry/config/mcp_servers.yaml to use localhost:
# Change the digital_sales URL from the remote endpoint to:
# http://localhost:8000
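If you would rather script the mcp_servers.yaml change than edit it by hand, here is a minimal sketch. The helper name use_local_digital_sales is hypothetical (it is not a CUGA command). Because the remote URL is not reproduced here, you pass it yourself. The helper does a literal string swap and keeps the original file as a .bak copy.

```shell
# Hypothetical helper (not part of CUGA): replace every literal occurrence of
# REMOTE_URL in FILE with LOCAL_URL (default http://localhost:8000).
# The original file is kept as FILE.bak.
use_local_digital_sales() {
  file=$1 remote=$2 local_url=${3:-http://localhost:8000}
  if [ -z "$remote" ]; then
    echo "usage: use_local_digital_sales FILE REMOTE_URL [LOCAL_URL]" >&2
    return 2
  fi
  cp "$file" "$file.bak"
  # awk's index/substr do a plain substring match, so the dots and slashes
  # in a URL need no regex escaping.
  awk -v from="$remote" -v to="$local_url" '
    {
      out = ""
      while ((i = index($0, from)) > 0) {
        out = out substr($0, 1, i - 1) to
        $0 = substr($0, i + length(from))
      }
      print out $0
    }
  ' "$file.bak" > "$file"
}
```

Usage: use_local_digital_sales ./src/cuga/backend/tools_env/registry/config/mcp_servers.yaml <remote digital_sales URL>. To undo the change, restore mcp_servers.yaml.bak.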
# In terminal, clone the repository and navigate into it
git clone https://github.com/cuga-project/cuga-agent.git
cd cuga-agent
# 1. Create and activate virtual environment
uv venv --python=3.12 && source .venv/bin/activate
# 2. Install dependencies
uv sync
# 3. Set up environment variables
# Create .env file with your API keys
echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
# 4. Start the demo
cuga start demo
# Chrome will open automatically at https://localhost:7860
# then try sending your task to CUGA: 'get top account by revenue from digital sales'
# 5. View agent trajectories (optional)
cuga viz
# This launches a web-based dashboard for visualizing and analyzing
# agent execution trajectories, decision-making, and tool usage
CUGA supports multiple LLM providers with flexible configuration options. You can configure models through TOML files or override specific settings using environment variables.
Supported Platforms
OpenAI - GPT models via OpenAI API (also supports LiteLLM via base URL override)
# OpenAI Configuration
OPENAI_API_KEY=sk-...your-key-here...
AGENT_SETTING_CONFIG="settings.openai.toml"
# Optional overrides
MODEL_NAME=gpt-4o # Override model name
OPENAI_BASE_URL=https://api.openai.com/v1 # Override base URL
OPENAI_API_VERSION=2024-08-06 # Override API version
# WatsonX Configuration
WATSONX_API_KEY=your-watsonx-api-key
WATSONX_PROJECT_ID=your-project-id
WATSONX_URL=https://us-south.ml.cloud.ibm.com # or your region
AGENT_SETTING_CONFIG="settings.watsonx.toml"
# Optional override
MODEL_NAME=meta-llama/llama-4-maverick-17b-128e-instruct-fp8 # Override model for all agents
CUGA supports LiteLLM through the OpenAI configuration by overriding the base URL:
Add to your .env file:
# LiteLLM Configuration (using OpenAI settings)
OPENAI_API_KEY=your-api-key
AGENT_SETTING_CONFIG="settings.openai.toml"
# Override for LiteLLM
MODEL_NAME=Azure/gpt-4o # Override model name
OPENAI_BASE_URL=https://your-litellm-endpoint.com # Override base URL
OPENAI_API_VERSION=2024-08-06 # Override API version
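Before running cuga start demo, it can help to check that your .env sets the variables that each configuration above lists before its optional overrides. A minimal sketch follows. env_value and check_cuga_env are hypothetical helpers, not CUGA commands. They only understand simple KEY=value lines with optional double quotes and trailing " # comment" text, and they treat those pre-override variables as required, which is an assumption.

```shell
# Hypothetical helper: print the value of KEY from a .env-style FILE
# (last assignment wins), dropping a trailing " # comment" and double quotes.
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1 \
    | sed -e 's/[[:space:]][[:space:]]*#.*$//' -e 's/[[:space:]]*$//' \
          -e 's/^"//' -e 's/"$//'
}

# Hypothetical helper: check that FILE sets the variables assumed to be
# required for its AGENT_SETTING_CONFIG. Prints "ok: <config>" on success,
# otherwise lists what is missing on stderr and returns non-zero.
check_cuga_env() {
  file=$1
  config=$(env_value "$file" AGENT_SETTING_CONFIG)
  case $config in
    settings.openai.toml)  required="OPENAI_API_KEY" ;;
    settings.watsonx.toml) required="WATSONX_API_KEY WATSONX_PROJECT_ID WATSONX_URL" ;;
    *) echo "unknown AGENT_SETTING_CONFIG: '$config'" >&2; return 2 ;;
  esac
  missing=""
  for key in $required; do
    [ -n "$(env_value "$file" "$key")" ] || missing="$missing $key"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  echo "ok: $config"
}
```

Usage: check_cuga_env .env. The LiteLLM setup uses settings.openai.toml, so it is checked the same way as OpenAI.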
🎯 Task Mode Configuration - Switch between API/Web/Hybrid modes
Available Task Modes
Mode | Description
api | API-only mode: executes API tasks (default)
web | Web-only mode: executes web tasks using the browser extension
hybrid | Hybrid mode: executes both API tasks and web tasks using the browser extension
How Task Modes Work
API Mode (mode = 'api')
Opens tasks in a regular web browser
Best for API/Tools-focused workflows and testing
Web Mode (mode = 'web')
Runs its interface inside a browser extension (available next to the browser)
Optimized for web-specific tasks and interactions
Direct access to web page content and controls
Hybrid Mode (mode = 'hybrid')
Opens inside the browser extension, like web mode
Can execute both API/Tools tasks and web page tasks simultaneously
Starts from the configurable URL defined in demo_mode.start_url
Most versatile mode for complex workflows combining web and API operations
Configuration
Edit ./src/cuga/settings.toml:
[demo_mode]
start_url = "https://opensource-demo.orangehrmlive.com/web/index.php/auth/login"  # Starting URL for hybrid mode

[advanced_features]
mode = 'api'  # 'api', 'web', or 'hybrid'
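If you switch modes often, a small script can rewrite the mode key for you. The sketch below is a hypothetical helper (set_cuga_mode is not part of CUGA). It assumes mode is a plain key under [advanced_features], as in the settings.toml snippet above. It only touches that section, and it drops any trailing comment on the rewritten line.

```shell
# Hypothetical helper (not part of CUGA): set mode = '<MODE>' under the
# [advanced_features] section of a settings.toml FILE.
set_cuga_mode() {
  file=$1 mode=$2
  case $mode in
    api|web|hybrid) ;;
    *) echo "mode must be api, web or hybrid, got '$mode'" >&2; return 2 ;;
  esac
  # Track the current [section] so a "mode" key in another table is left alone.
  awk -v mode="$mode" -v q="'" '
    /^\[/ { section = $1 }
    section == "[advanced_features]" && /^[ \t]*mode[ \t]*=/ {
      print "mode = " q mode q
      next
    }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage: set_cuga_mode ./src/cuga/settings.toml hybrid, then restart the demo.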
📝 Special Instructions Configuration
How It Works
Each .md file contains specialized instructions that are automatically integrated into CUGA's internal prompts when the corresponding component is active. Edit the markdown files to customize behavior for each node type.
Available instruction sets: answer, api_planner, code_agent, plan_controller, reflection, shortlister, task_decomposition
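The location of the instruction files is not given here. The sketch below therefore takes the directory as an argument and assumes each instruction set lives in a file named after it (for example answer.md). Both points are assumptions. missing_instruction_files is a hypothetical helper that lists the documented instruction sets that have no matching file.

```shell
# Instruction sets listed in this README; the <name>.md naming is an assumption.
INSTRUCTION_SETS="answer api_planner code_agent plan_controller reflection shortlister task_decomposition"

# Hypothetical helper (not part of CUGA): print each instruction set that has
# no <name>.md file in DIR, one per line.
missing_instruction_files() {
  dir=$1
  for name in $INSTRUCTION_SETS; do
    [ -f "$dir/$name.md" ] || echo "$name"
  done
}
```

Usage: missing_instruction_files <instructions directory>. An empty result means every listed instruction set has a file.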
Go to the CUGA web page and enter: "Identify the common cities between my cuga_workspace/cities.txt and cuga_workspace/company.txt". You should see errors related to the CodeAgent. Wait about a minute for tips to be generated. You can confirm tip generation in the terminal where cuga start memory was run.
Re-run the same utterance; it should now finish in fewer steps.
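To check the agent's answer to this task yourself, you can compute the expected result directly. The sketch assumes both files list one city per line, which this section does not state. common_lines is a hypothetical helper built on sort and comm. Because comm requires sorted input, both files are sorted into temporary copies first.

```shell
# Hypothetical helper: print the lines present in both FILE1 and FILE2,
# assuming one city name per line.
common_lines() {
  a=$(mktemp) b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # -12 suppresses lines unique to either file, leaving the common ones.
  comm -12 "$a" "$b"
  rm -f "$a" "$b"
}
```

Usage: common_lines cuga_workspace/cities.txt cuga_workspace/company.txt. Matching is exact, so differences in case or trailing whitespace count as different cities.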