Package: holmesgpt (pip, PyPI)
Version: 0.24.3
Maintainers: 2

HolmesGPT — The CNCF SRE Agent

Installation | Docs | Ask DeepWiki

Open-source AI agent for investigating production incidents and finding root causes. Works with any stack — Kubernetes, VMs, cloud providers, databases, and SaaS platforms. We are a Cloud Native Computing Foundation sandbox project. Originally created by Robusta.Dev, with major contributions from Microsoft.

New: Operator Mode — Find Problems 24/7 in the Background

Most AI agents are great at troubleshooting problems, but still need a human to notice something is wrong and trigger an investigation. Operator mode fixes that — HolmesGPT runs in the background 24/7, spots problems before your customers notice, and messages you in Slack with the fix. Connect the GitHub integration and it can even open PRs to fix what it finds.

While the operator itself runs in Kubernetes, health checks can query any data source Holmes is connected to — VMs, cloud services, databases, SaaS platforms, and more.

  • Deployment verification — Deploy a health check alongside your app to verify the new version is healthy
  • Scheduled health checks — Continuously monitor services and catch regressions automatically

Features

  • Petabyte-scale data: Server-side filtering, JSON tree traversal, and tool output transformers keep large payloads out of context windows
  • Memory-safe execution: Per-tool memory limits, streaming large results to disk, and automatic output budgeting prevent OOM kills when querying large observability datasets
  • Deep integrations: Prometheus, Grafana, Datadog, Kubernetes, and many more—plus any REST API
  • Bidirectional alert integrations: Fetch alerts from AlertManager, PagerDuty, OpsGenie, or Jira—and write findings back
  • Any LLM provider: OpenAI, Anthropic, Azure, Bedrock, Gemini, and more
  • No Kubernetes required: Works with any infrastructure — VMs, bare metal, cloud services, or containers
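The idea behind JSON tree traversal and output budgeting can be illustrated with a small sketch. This is not HolmesGPT's implementation: the function names `summarize_tree` and `fit_to_budget`, the depth and item limits, and the character budget are all assumptions chosen for illustration. It shows one way a large JSON payload might be reduced to a shallow view before it is placed in an LLM context window.

```python
import json
from typing import Any


def summarize_tree(node: Any, max_depth: int = 2, max_items: int = 3) -> Any:
    """Return a shallow view of a JSON-like value (hypothetical helper).

    Containers reached when max_depth hits 0 are replaced by a placeholder,
    and containers longer than max_items are cut, with a note on what was dropped.
    """
    if isinstance(node, dict):
        if max_depth == 0:
            return f"<dict with {len(node)} keys>"
        items = list(node.items())
        out = {k: summarize_tree(v, max_depth - 1, max_items) for k, v in items[:max_items]}
        if len(items) > max_items:
            out["..."] = f"{len(items) - max_items} more keys"
        return out
    if isinstance(node, list):
        if max_depth == 0:
            return f"<list with {len(node)} items>"
        out = [summarize_tree(v, max_depth - 1, max_items) for v in node[:max_items]]
        if len(node) > max_items:
            out.append(f"<{len(node) - max_items} more items>")
        return out
    # Scalars (str, int, float, bool, None) are returned unchanged.
    return node


def fit_to_budget(payload: Any, max_chars: int = 2000) -> str:
    """Serialize payload, lowering the traversal depth until the text fits max_chars."""
    text = ""
    for depth in range(6, -1, -1):
        text = json.dumps(summarize_tree(payload, max_depth=depth), sort_keys=True)
        if len(text) <= max_chars:
            return text
    # Last resort: hard-truncate the shallowest summary.
    return text[:max_chars]


if __name__ == "__main__":
    # Made-up payload standing in for a large API response.
    pods = {"items": [{"metadata": {"name": f"pod-{i}"}, "status": "Running"} for i in range(1000)]}
    print(fit_to_budget(pods, max_chars=200))
```

A production system would more likely filter on the server side first and only fall back to this kind of client-side trimming; the sketch covers the trimming step alone.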

How it Works

HolmesGPT uses an agentic loop to query live observability data from multiple sources and identify root causes.

[Figure: HolmesGPT architecture diagram]

[Figure: HolmesGPT investigation demo]
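As a rough illustration of what an agentic loop looks like in general, the sketch below alternates between asking a model for the next step and running the tool it requests, until the model returns an answer. Everything here is hypothetical: the `Step` type, the two stub tools, and `scripted_model` (a stand-in for a real LLM call) are assumptions for illustration and do not reflect HolmesGPT's internal API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One decision from the model: call a tool, or give a final answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


# Hypothetical tools with canned output; real toolsets would query live systems.
def get_pod_status(name: str) -> str:
    return f"pod {name}: CrashLoopBackOff, restarts=12"


def get_pod_logs(name: str) -> str:
    return f"{name}: OOMKilled, container exceeded memory limit 256Mi"


TOOLS: Dict[str, Callable[[str], str]] = {
    "get_pod_status": get_pod_status,
    "get_pod_logs": get_pod_logs,
}


def scripted_model(question: str, observations: List[str]) -> Step:
    """Stand-in for an LLM call: picks the next step from what has been seen so far."""
    if not observations:
        return Step(tool="get_pod_status", argument="checkout")
    if len(observations) == 1:
        return Step(tool="get_pod_logs", argument="checkout")
    return Step(answer="Root cause: checkout is OOMKilled; raise its memory limit.")


def investigate(
    question: str,
    model: Callable[[str, List[str]], Step] = scripted_model,
    max_steps: int = 5,
) -> str:
    """Run the loop: ask the model, execute the requested tool, feed the result back."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = model(question, observations)
        if step.answer is not None:
            return step.answer
        tool = TOOLS.get(step.tool or "")
        if tool is None:
            # Report the mistake back to the model instead of crashing.
            observations.append(f"error: unknown tool {step.tool!r}")
            continue
        observations.append(tool(step.argument))
    return "No conclusion within the step limit."


if __name__ == "__main__":
    print(investigate("Why is the checkout service failing?"))
```

The step limit is a deliberate design choice in this sketch: it bounds cost and prevents a confused model from looping forever.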

🔗 Data Sources

HolmesGPT integrates with popular observability and cloud platforms. The following data sources ("toolsets") are built in; you can also add your own.

Data Source | Notes
AKS | Azure Kubernetes Service cluster and node health diagnostics
ArgoCD | Get status, history and manifests and more of apps, projects and clusters
AWS | RDS events, instances, slow query logs, and more (MCP)
Azure | Azure resources and diagnostics (MCP)
Azure SQL | Database health, performance, connections, and slow queries
Confluence | Private runbooks and documentation
Confluence (MCP) | Private runbooks and documentation (MCP)
Coralogix | Retrieve logs for any resource
Datadog | Query logs, metrics, and traces
Docker | Get images, logs, events, history and more
Elasticsearch / OpenSearch | Query logs, cluster health, shard and index diagnostics
GCP | Google Cloud Platform resources (MCP)
GitHub | Repositories, issues, and pull requests (MCP)
Jenkins (MCP) | Build status, pipeline logs, and job history (MCP)
Grafana | Query and analyze dashboard configurations and panels
Helm | Release status, chart metadata, and values
Internet | Public runbooks, community docs etc
Kafka | Fetch metadata, list consumers and topics or find lagging consumer groups
Kubernetes | Pod logs, K8s events, and resource status (kubectl describe)
Kubernetes Remediation (MCP) | Apply fixes like scaling, rollbacks, and resource edits (MCP)
Loki | Query logs for Kubernetes resources or any query
MariaDB | MariaDB database queries and diagnostics (MCP)
MongoDB | Query data, diagnose performance, inspect schemas, find slow operations
MongoDB Atlas | Cluster health, slow queries, and performance diagnostics
NewRelic | Investigate alerts, query tracing data
OpenShift | Projects, routes, builds, security context constraints, and deployment configs
Prefect (MCP) | Workflow orchestration monitoring, flow runs, and worker health (MCP)
Prometheus | Investigate alerts, query metrics and generate PromQL queries
RabbitMQ | Partitions, memory/disk alerts, troubleshoot split-brain scenarios and more
Robusta | Multi-cluster monitoring, historical change data, runbooks, PromQL graphs and more
ServiceNow | Query tables and incident records
Sentry | Error tracking, issues, and performance monitoring (MCP)
Slab | Team knowledge base and runbooks on demand
Splunk | Log search and analysis (MCP)
SQL Databases | PostgreSQL, MySQL, ClickHouse, MariaDB, SQL Server, SQLite
Tempo | Fetch trace info, debug issues like high latency in application

See the full list of built-in toolsets for additional integrations including Cilium, KubeVela, Notion, and more.

🚀 End-to-End Automation

HolmesGPT can fetch alerts/tickets to investigate from external systems, then write the analysis back to the source or Slack.

Integration | Notes
Slack | Demo. Available via Robusta
Microsoft Teams | Available via Robusta
Prometheus/AlertManager | Robusta or HolmesGPT CLI
PagerDuty | HolmesGPT CLI only
OpsGenie | HolmesGPT CLI only
Jira | HolmesGPT CLI only
GitHub | HolmesGPT CLI only
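The fetch, investigate, write-back flow described in this section can be sketched in a few lines. The names below (`Alert`, `InMemoryAlertSource`, `process_alerts`) are hypothetical and exist only for this example; a real integration would call the API of the external system instead of an in-memory store, and the `investigate` callable stands in for whatever produces the analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    id: str
    summary: str


@dataclass
class InMemoryAlertSource:
    """Hypothetical stand-in for an external alert or ticket system."""
    alerts: List[Alert]
    comments: Dict[str, List[str]] = field(default_factory=dict)

    def fetch_open_alerts(self) -> List[Alert]:
        # Treat an alert as open until some analysis has been written back to it.
        return [a for a in self.alerts if a.id not in self.comments]

    def write_back(self, alert_id: str, text: str) -> None:
        self.comments.setdefault(alert_id, []).append(text)


def process_alerts(source: InMemoryAlertSource, investigate: Callable[[Alert], str]) -> int:
    """Investigate every open alert and attach the analysis to it. Returns the count handled."""
    handled = 0
    for alert in source.fetch_open_alerts():
        source.write_back(alert.id, investigate(alert))
        handled += 1
    return handled


if __name__ == "__main__":
    source = InMemoryAlertSource(alerts=[Alert("1", "High latency"), Alert("2", "Pod crash")])
    count = process_alerts(source, lambda a: f"Analysis of: {a.summary}")
    print(count, source.comments)
```

Skipping alerts that already carry a comment makes repeated runs idempotent in this sketch, which matters if the loop is run on a schedule.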

Installation

Read the installation documentation to learn how to install HolmesGPT.

Supported LLM Providers

Read the LLM Providers documentation to learn how to set up your LLM API key.

Using HolmesGPT

See the walkthrough documentation for usage guides.

🔐 Data Privacy

By design, HolmesGPT has read-only access and respects RBAC permissions. It is safe to run in production environments.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Community

Join our community to discuss the HolmesGPT roadmap and share feedback.

Support

If you have any questions, feel free to message us on the HolmesGPT Slack channel.

How to Contribute

Please read our CONTRIBUTING.md for guidelines and instructions.

For help, contact us on Slack or ask DeepWiki AI your questions.

Please make sure to follow the CNCF code of conduct.
