How do you write effective Prometheus alerting rules?
Effective Prometheus alerts follow these principles:
groups:
- name: api-alerts
rules:
- alert: HighErrorRate
expr: |
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m]) > 0.05
for: 5m # Must be true for 5 minutes before firing
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error rate is {{ $value | humanizePercentage }}"
runbook: "https://wiki.internal/runbooks/high-error-rate"
Key practices: Use for to avoid alerting on momentary spikes. Always include a runbook link. Use human-readable messages with $labels and $value.