Security News

Open Source CAI Framework Handles Pen Testing Tasks up to 3,600× Faster Than Humans

CAI is a new open source AI framework that automates penetration testing tasks like scanning and exploitation up to 3,600× faster than humans.

Sarah Gooding

July 9, 2025

While the security world has seen a wave of AI hype over the past year, much of it has focused on low-effort tools generating noisy bug bounty submissions or repackaging GPT-4 prompts. We’ve previously covered how this “AI slop” has polluted bug bounty programs, frustrating maintainers and triage teams.

At the same time, serious research has demonstrated that AI agents are growing more capable. In one study we wrote about last year, a team from the University of Illinois Urbana-Champaign showed that LLM-based agents working in teams could exploit real-world zero-day web vulnerabilities with a 53% success rate, at a cost that is rapidly approaching parity with human testers.

Now, researchers at Alias Robotics have released a new open source framework called CAI (Cybersecurity AI) that brings those kinds of agent-based techniques into a reusable toolkit.

The framework is detailed in the paper “CAI: An Open, Bug Bounty-Ready Cybersecurity AI,” submitted to arXiv’s Cryptography and Security category in April 2025. It aims to democratize advanced penetration testing by making it easier to build, compose, and deploy AI agents for real-world security tasks.

A Modular, Human-Centered Approach to AI Security Testing#

CAI treats penetration testing as a collaborative process between specialized AI agents and human operators. The framework breaks down complex security workflows into distinct stages, like reconnaissance, vulnerability discovery, exploitation, and reporting, and assigns each one to a task-specific agent.

These agents aren’t just prompt wrappers. They can:

  • Run terminal commands through real shell environments
  • Interact with GUI-based applications using OCR and mouse emulation
  • Chain tasks across environments to complete full exploit chains
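To make the first capability concrete, here is a minimal sketch of what a task-specific agent wrapping a real shell could look like. The `ShellAgent` class, its method names, and the `history` field are illustrative assumptions, not CAI's actual API:

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class ShellAgent:
    """Hypothetical task-specific agent that runs commands in a real shell.

    Illustrative only; the class name and interface are assumptions,
    not part of the CAI framework's actual API.
    """
    name: str
    history: list = field(default_factory=list)  # (command, output) log for the human operator

    def run(self, command: str, timeout: int = 30) -> str:
        # Execute the command in a real shell environment and capture stdout.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        output = result.stdout.strip()
        self.history.append((command, output))  # keep an auditable trail
        return output

recon = ShellAgent(name="recon")
# A harmless echo stands in for a real scanner invocation such as nmap.
print(recon.run("echo scanning 10.0.0.1"))
```

Keeping a command/output log per agent is what makes the results reviewable by a human operator, which matters in the hybrid workflow described below.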

Rather than attempting full end-to-end automation, CAI is designed for human-in-the-loop workflows. Security professionals remain in control, guiding agents, validating results, and making strategic decisions, while delegating repetitive or time-consuming tasks to AI. This hybrid model lets humans focus on judgment and creativity while machines handle scale and speed.
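The human-in-the-loop pattern can be sketched as an approval gate between what an agent proposes and what actually executes. Everything here is a hypothetical illustration of the pattern, not CAI code; `approve` stands in for an interactive operator prompt:

```python
def human_in_the_loop(proposals, approve):
    """Run only the agent actions a human operator approves.

    `proposals` is a list of (description, action) pairs, where `action`
    is a callable standing in for an agent executing a step. `approve`
    models the operator's decision. Hypothetical sketch, not CAI's API.
    """
    results = []
    for description, action in proposals:
        if approve(description):       # operator validates the planned step
            results.append(action())   # delegate execution to the agent
        else:
            results.append(None)       # step skipped: operator vetoed it
    return results

proposals = [
    ("port scan target", lambda: "open: 22, 80"),
    ("exploit CVE-XXXX", lambda: "shell obtained"),
]
# For this demo, auto-approve only the low-risk reconnaissance step.
print(human_in_the_loop(proposals, approve=lambda d: d.startswith("port")))
# → ['open: 22, 80', None]
```

The key design point is that the gate sits before execution, so the operator vetoes plans rather than cleaning up after actions.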

One example walkthrough in the paper shows CAI agents performing reconnaissance, uploading a web shell, cracking credentials, and escalating privileges in a multi-step exploit, with each phase handled by a specialized agent with access to real tools.
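A chain like that can be modeled as a pipeline where each phase-specific agent reads the findings of the phases before it. The phase names and `run_chain` helper below are assumptions made for illustration, not the paper's implementation:

```python
def run_chain(target, phases):
    """Run phase-specific agents in sequence, sharing findings via a context dict.

    Each phase is a (name, agent_fn) pair; agent_fn receives the full context
    so later phases can build on earlier results. Hypothetical sketch.
    """
    context = {"target": target}
    for name, agent_fn in phases:
        context[name] = agent_fn(context)  # each agent sees all prior findings
    return context

# Toy stand-ins for the specialized agents in the paper's walkthrough.
phases = [
    ("recon",     lambda ctx: f"ports open on {ctx['target']}"),
    ("web_shell", lambda ctx: "shell uploaded after " + ctx["recon"]),
    ("privesc",   lambda ctx: "root via creds, following " + ctx["web_shell"]),
]
result = run_chain("10.0.0.5", phases)
print(result["privesc"])
```

Passing a shared context rather than chaining raw strings is what lets a reporting agent at the end of the pipeline see every intermediate finding.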

CAI in the Wild: Outperforming Humans in Speed and Cost#

In Capture the Flag (CTF) competitions, CAI performed competitively with human participants. At the 2023 HackTheBox CTF, it placed first among AI teams and ranked in the global top 20 overall. The paper reports that in certain subtasks like port scanning and fingerprinting, CAI agents were up to 3,600 times faster than unaided humans. Overall, workflows combining CAI and human supervision were 11 times faster on average.

CAI dramatically outperformed humans in categories like crypto (938×), reverse engineering (774×), forensics (741×), and robotics (741×).

The authors also measured cost efficiency. Based on compute costs and execution time, they found that CAI was not only faster than human participants but also significantly cheaper to operate. Across all CTF categories, CAI achieved an average cost ratio of 156× compared to human labor, reinforcing the potential for AI agents to reduce both the time and financial burden of security testing at scale.

CAI was 799× faster on “very easy” tasks and 11× faster on “medium.” On “hard” and “insane” tasks, it underperformed (0.91× and 0.65×), suggesting limits to its generalization or planning depth.
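The headline ratios reduce to simple arithmetic over wall-clock time and cost. The numbers below are illustrative only (chosen to reproduce the 3,600× figure), not measurements from the paper:

```python
def speedup(human_seconds: float, ai_seconds: float) -> float:
    """Ratio of human time to agent time for the same task."""
    return human_seconds / ai_seconds

def cost_ratio(human_rate_per_hr: float, human_seconds: float,
               ai_compute_cost: float) -> float:
    """Ratio of human labor cost to agent compute cost for the same task."""
    human_cost = human_rate_per_hr * human_seconds / 3600
    return human_cost / ai_compute_cost

# Illustrative: a scan that takes a human 1 hour but an agent 1 second
# yields the headline 3,600x speedup.
print(speedup(3600, 1))  # → 3600.0
# Illustrative: $60/hr labor for 1 hour vs $0.50 of compute → 120x cheaper.
print(cost_ratio(60, 3600, 0.50))  # → 120.0
```

The sub-1× ratios on “hard” and “insane” tasks follow the same formula: when the agent takes longer than the human, the speedup drops below 1.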

Positioned Between Research and Practice#

The researchers describe CAI as a bridge between academic agent-based AI systems and real-world security engineering. While it is not yet general-purpose or production-ready, it introduces a modular structure that could be extended for custom targets, agent chaining, or vertical domain testing.

Unlike many AI security tools that rely on prompt engineering or large monolithic agents, CAI builds on explicit agent composition, task planning, and human feedback. Its architecture closely mirrors the kind of multi-agent setups explored in recent academic research, including the HPTSA technique that enabled zero-day exploitation in controlled web environments. That earlier research showed that coordinated agents can now outperform open source vulnerability scanners and are closing in on human performance at a fraction of the cost.

Democratizing Access to Bug Bounty–Grade Security Testing#

One of the core goals for CAI is to make advanced security testing accessible to organizations and researchers who lack the budget or tooling of large enterprise teams. The authors describe CAI as providing the foundational components for building “bug bounty–ready” AI agents, systems that can assess real-world systems using modular tools, scripted workflows, and human guidance.

By emphasizing open architecture, flexible agent design, and integration with industry-standard tools, CAI avoids the lock-in of commercial platforms. The framework is positioned as an alternative to proprietary security automation services, offering a way for smaller teams to engage in vulnerability discovery and security assessments without requiring exclusive contracts or closed ecosystems.

The framework also stands out by openly evaluating and demonstrating the offensive capabilities of AI agents in cybersecurity tasks. It provides transparent benchmarks and real-world test cases, in contrast to many evaluations from commercial AI labs that deliberately downplay or omit offensive use cases. These labs often restrict agents or redact results to avoid highlighting the true capabilities of large language models in security contexts, whether for safety reasons or to align with corporate interests. CAI confronts that gap directly.

Agent-Powered Security Is Headed for the Mainstream#

CAI breaks penetration testing into modular tasks, such as reconnaissance, exploitation, and privilege escalation, and assigns each one to a purpose-built agent. These agents operate across shell, browser, and GUI environments, automating tasks that would typically require multiple tools and significant manual effort.

In testing, CAI agents completed workflows up to 3,600× faster than unaided humans and ran entire attack chains with limited supervision. That level of speed and repeatability has implications well beyond CTFs. It points to a future where agent-powered tooling becomes standard in bug bounty triage, red teaming, and internal security audits.

Based on current trends and adoption rates, the researchers predict that by 2028, AI-powered security testing tools will outnumber human pentesters in mainstream security operations.
