Yuan Alerting Specification
Alerts in Yuan follow the Prometheus Operator runbook specification.
Each alerting rule links to a specific runbook page, which consists of the following sections:
- Meaning
- Impact
- Diagnosis
- Mitigation
The annotations of an alerting rule are defined as follows:
annotations:
  runbook: <runbook page url>
  summary: <alert summary>
  description: <alert description>
Each alerting rule also carries the following labels:
labels:
  severity: <unknown | info | warning | error | critical>
Note the difference between summary and description: summary is a short statement of what the alert is about, while description gives more detail about what is happening right now, usually including the labels that identify which specific time series is firing the alert.
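As an illustration of the annotations and labels above, here is a minimal sketch of a rule in the plain Prometheus rule-file format. The group name, alert name, metric, threshold, and runbook URL are all hypothetical placeholders, not rules that Yuan ships; under the Prometheus Operator the same `groups` content would normally sit inside a PrometheusRule resource.

```yaml
groups:
  - name: example.rules                     # hypothetical group name
    rules:
      - alert: ExampleHighErrorRate         # hypothetical alert name
        # hypothetical metric and threshold, for illustration only
        expr: rate(example_errors_total[5m]) > 0.05
        for: 10m
        labels:
          severity: warning                 # one of: unknown, info, warning, error, critical
        annotations:
          # hypothetical runbook URL; the page should cover Meaning, Impact, Diagnosis, Mitigation
          runbook: https://example.com/runbooks/ExampleHighErrorRate
          # summary: short and static, says what the alert is about
          summary: Error rate is high
          # description: templated with labels so the firing time series is identifiable
          description: "Error rate on {{ $labels.instance }} is {{ $value }} errors per second"
```

Keeping the summary static while putting the label templating in the description means the summary is identical across all alerts of the same name, and the description distinguishes the individual series.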
Alert Notification
Alerts are sent from Prometheus Alertmanager to the alert-receiver app via webhook.
As Alertmanager's documentation states, the messages posted by Alertmanager comply with the following schema:
{
  "version": "4",
  "groupKey": <string>,
  "truncatedAlerts": <int>,
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,
  "alerts": [
    {
      "status": "<resolved|firing>",
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>",
      "generatorURL": <string>,
      "fingerprint": <string>
    },
    ...
  ]
}
Note that the alerts field is an array: Alertmanager can send multiple alerts in one message, grouped by the labels listed under group_by in its route configuration.
We suggest grouping by alertname, so that alerts with the same alertname are delivered together in one message, which avoids message flooding.
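The grouping suggestion above could be expressed in the Alertmanager configuration roughly as follows. This is a sketch under assumptions: the webhook URL and the timing values are hypothetical and should be adapted to the actual deployment; only the receiver name alert-receiver and the alertname grouping come from this document.

```yaml
route:
  receiver: alert-receiver
  # Group alerts that share the same alertname into one notification
  group_by: ['alertname']
  # Timing values below are illustrative assumptions, not recommendations from this spec
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before notifying about new alerts added to a group
  repeat_interval: 4h    # wait before re-sending a notification that is still firing

receivers:
  - name: alert-receiver
    webhook_configs:
      # hypothetical URL of the alert-receiver app
      - url: http://alert-receiver.example.svc:8080/alerts
        send_resolved: true   # also post a message when the group resolves
```

With this routing, groupLabels in each webhook message would contain only alertname, and the alerts array would hold every firing or resolved alert that shares that name.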