holmesgpt

PyPI

Version: 0.24.3

Maintainers: 2

HolmesGPT — The CNCF SRE Agent

Installation | Docs

Open-source AI agent for investigating production incidents and finding root causes. Works with any stack — Kubernetes, VMs, cloud providers, databases, and SaaS platforms. We are a Cloud Native Computing Foundation sandbox project. Originally created by Robusta.Dev, with major contributions from Microsoft.

New: Operator Mode — Find Problems 24/7 in the Background

Most AI agents are great at troubleshooting problems, but still need a human to notice something is wrong and trigger an investigation. Operator mode fixes that — HolmesGPT runs in the background 24/7, spots problems before your customers notice, and messages you in Slack with the fix. Connect the GitHub integration and it can even open PRs to fix what it finds.

While the operator itself runs in Kubernetes, health checks can query any data source Holmes is connected to — VMs, cloud services, databases, SaaS platforms, and more.

Deployment verification — Deploy a health check alongside your app to verify the new version is healthy
Scheduled health checks — Continuously monitor services and catch regressions automatically

Features

Petabyte-scale data: Server-side filtering, JSON tree traversal, and tool output transformers keep large payloads out of context windows
Memory-safe execution: Per-tool memory limits, streaming large results to disk, and automatic output budgeting prevent OOM kills when querying large observability datasets
Deep integrations: Prometheus, Grafana, Datadog, Kubernetes, and many more—plus any REST API
Bidirectional alert integrations: Fetch alerts from AlertManager, PagerDuty, OpsGenie, or Jira—and write findings back
Any LLM provider: OpenAI, Anthropic, Azure, Bedrock, Gemini, and more
No Kubernetes required: Works with any infrastructure — VMs, bare metal, cloud services, or containers
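As an illustration of how JSON tree traversal and output budgeting can keep a large payload out of a context window, here is a minimal sketch. It is not HolmesGPT's implementation: the function names `summarize_tree` and `fit_budget`, their parameters, and the sample payload are all hypothetical.

```python
import json
from typing import Any


def summarize_tree(node: Any, max_depth: int = 2, max_items: int = 3) -> Any:
    """Return a shallow, size-bounded view of a parsed JSON value (hypothetical helper)."""
    if isinstance(node, dict):
        if max_depth == 0:
            return f"<object with {len(node)} keys>"
        return {k: summarize_tree(v, max_depth - 1, max_items) for k, v in node.items()}
    if isinstance(node, list):
        if max_depth == 0:
            return f"<array with {len(node)} items>"
        head = [summarize_tree(v, max_depth - 1, max_items) for v in node[:max_items]]
        if len(node) > max_items:
            head.append(f"<{len(node) - max_items} more items>")
        return head
    return node  # scalars pass through unchanged


def fit_budget(text: str, max_chars: int) -> str:
    """Cut text down to at most max_chars, ending with a truncation marker (hypothetical helper)."""
    marker = " ...[truncated]"
    if len(text) <= max_chars:
        return text
    if max_chars <= len(marker):
        return marker[:max_chars]
    return text[: max_chars - len(marker)] + marker


# Made-up payload standing in for a large API response.
payload = {
    "items": [
        {"name": f"pod-{i}", "status": {"phase": "Running"}} for i in range(1000)
    ]
}
summary = summarize_tree(payload)
print(json.dumps(summary))
print(len(fit_budget(json.dumps(payload), 200)))
```

The summary shows the shape of the data first, so a caller can decide which subtree to request in full instead of sending the entire response to the model.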

How it Works

HolmesGPT uses an agentic loop to query live observability data from multiple sources and identify root causes.
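The general shape of an agentic loop can be sketched as follows. This is a simplified illustration under stated assumptions, not HolmesGPT's code: `run_agent`, the `decide` callback (standing in for an LLM call), the tool names, and the canned tool outputs are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# A decision is either ("call", tool_name, argument) or ("answer", text, "").
Decision = Tuple[str, str, str]


def run_agent(
    question: str,
    decide: Callable[[List[str]], Decision],
    tools: Dict[str, Tool],
    max_steps: int = 5,
) -> str:
    """Alternate between asking `decide` for the next step and running a tool."""
    transcript = [f"question: {question}"]
    for _ in range(max_steps):
        kind, name, arg = decide(transcript)
        if kind == "answer":
            return name  # for an answer, the second field carries the text
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool: {name}"
        # Each observation is appended so the next decision can use it.
        transcript.append(f"{name}({arg}) -> {result}")
    return "no conclusion within step limit"


def scripted_decide(transcript: List[str]) -> Decision:
    """Stand-in for a model: checks status, then logs, then answers."""
    if len(transcript) == 1:
        return ("call", "pod_status", "checkout")
    if len(transcript) == 2:
        return ("call", "pod_logs", "checkout")
    last_observation = transcript[-1].split(" -> ")[1]
    return ("answer", f"checkout is crash-looping: {last_observation}", "")


# Canned tool outputs; a real tool would query a live data source.
tools: Dict[str, Tool] = {
    "pod_status": lambda name: "CrashLoopBackOff",
    "pod_logs": lambda name: "OOMKilled",
}

answer = run_agent("Why is checkout failing?", scripted_decide, tools)
print(answer)
```

The step limit is a design choice worth keeping in any such loop: it bounds cost and guarantees termination when the model keeps requesting tools without converging.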

HolmesGPT Investigation Demo

🔗 Data Sources

HolmesGPT integrates with popular observability and cloud platforms. The following data sources ("toolsets") are built in, and you can add your own.

Data Source	Notes
AKS	Azure Kubernetes Service cluster and node health diagnostics
ArgoCD	Status, history, manifests, and more for apps, projects, and clusters
AWS	RDS events, instances, slow query logs, and more (MCP)
Azure	Azure resources and diagnostics (MCP)
Azure SQL	Database health, performance, connections, and slow queries
Confluence	Private runbooks and documentation
Confluence (MCP)	Private runbooks and documentation (MCP)
Coralogix	Retrieve logs for any resource
Datadog	Query logs, metrics, and traces
Docker	Get images, logs, events, history and more
Elasticsearch / OpenSearch	Query logs, cluster health, shard and index diagnostics
GCP	Google Cloud Platform resources (MCP)
GitHub	Repositories, issues, and pull requests (MCP)
Grafana	Query and analyze dashboard configurations and panels
Jenkins (MCP)	Build status, pipeline logs, and job history (MCP)
Helm	Release status, chart metadata, and values
Internet	Public runbooks, community docs, etc.
Kafka	Fetch metadata, list consumers and topics or find lagging consumer groups
Kubernetes	Pod logs, K8s events, and resource status (kubectl describe)
Kubernetes Remediation (MCP)	Apply fixes like scaling, rollbacks, and resource edits (MCP)
Loki	Query logs for Kubernetes resources or any query
MariaDB	MariaDB database queries and diagnostics (MCP)
MongoDB	Query data, diagnose performance, inspect schemas, find slow operations
MongoDB Atlas	Cluster health, slow queries, and performance diagnostics
NewRelic	Investigate alerts, query tracing data
OpenShift	Projects, routes, builds, security context constraints, and deployment configs
Prefect (MCP)	Workflow orchestration monitoring, flow runs, and worker health (MCP)
Prometheus	Investigate alerts, query metrics and generate PromQL queries
RabbitMQ	Partitions, memory/disk alerts, troubleshoot split-brain scenarios and more
Robusta	Multi-cluster monitoring, historical change data, runbooks, PromQL graphs and more
Sentry	Error tracking, issues, and performance monitoring (MCP)
ServiceNow	Query tables and incident records
Slab	Team knowledge base and runbooks on demand
Splunk	Log search and analysis (MCP)
SQL Databases	PostgreSQL, MySQL, ClickHouse, MariaDB, SQL Server, SQLite
Tempo	Fetch trace info and debug issues such as high application latency

See the full list of built-in toolsets for additional integrations including Cilium, KubeVela, Notion, and more.

🚀 End-to-End Automation

HolmesGPT can fetch alerts/tickets to investigate from external systems, then write the analysis back to the source or Slack.

Integration	Status	Notes
Slack	✅	Demo. Available via Robusta
Microsoft Teams	✅	Available via Robusta
Prometheus/AlertManager	✅	Robusta or HolmesGPT CLI
PagerDuty	✅	HolmesGPT CLI only
OpsGenie	✅	HolmesGPT CLI only
Jira	✅	HolmesGPT CLI only
GitHub	✅	HolmesGPT CLI only
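The fetch, investigate, and write-back flow described in this section can be sketched as a small loop. This is an illustration only: `Alert`, `AlertSource`, `InMemorySource`, and `process_alerts` are hypothetical names, and the in-memory source stands in for a real system such as a ticket tracker or alerting service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class Alert:
    id: str
    summary: str


class AlertSource(Protocol):
    """Minimal interface a bidirectional integration would need (assumed)."""

    def fetch(self) -> List[Alert]: ...

    def write_back(self, alert_id: str, analysis: str) -> None: ...


@dataclass
class InMemorySource:
    """Stand-in source that stores written-back analyses in a dict."""

    alerts: List[Alert]
    comments: Dict[str, str] = field(default_factory=dict)

    def fetch(self) -> List[Alert]:
        return list(self.alerts)

    def write_back(self, alert_id: str, analysis: str) -> None:
        self.comments[alert_id] = analysis


def process_alerts(source: AlertSource, investigate: Callable[[Alert], str]) -> int:
    """Investigate every fetched alert and write the analysis back to its source."""
    handled = 0
    for alert in source.fetch():
        source.write_back(alert.id, investigate(alert))
        handled += 1
    return handled


# Made-up alerts and a trivial investigator in place of a real investigation.
source = InMemorySource([Alert("A-1", "High latency"), Alert("A-2", "Pod restarts")])
count = process_alerts(source, lambda a: f"Investigated: {a.summary}")
print(count, source.comments)
```

Keeping the source behind a two-method interface means the same loop can serve any system that can list open alerts and accept a comment.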

Installation

Read the installation documentation to learn how to install HolmesGPT.

Supported LLM Providers

Read the LLM Providers documentation to learn how to set up your LLM API key.

Using HolmesGPT

See the walkthrough documentation for usage guides, including:

Interactive mode for asking questions and follow-ups
Investigating Prometheus alerts
CI/CD troubleshooting

🔐 Data Privacy

By design, HolmesGPT has read-only access and respects RBAC permissions. It is safe to run in production environments.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Community

Join our community to discuss the HolmesGPT roadmap and share feedback:

Community Meetups

Support

If you have any questions, feel free to message us on the HolmesGPT Slack channel.

How to Contribute

Please read our CONTRIBUTING.md for guidelines and instructions.

For help, contact us on Slack or ask DeepWiki AI your questions.

Please make sure to follow the CNCF Code of Conduct.
