What is an error budget and how do SRE teams use it?
An error budget is the allowable amount of unreliability in a service, calculated as the complement of the SLO (1 − SLO). If your SLO is 99.9% uptime, your error budget is 0.1%, which works out to about 43 minutes of downtime over a 30-day month.
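The arithmetic behind the 43-minute figure can be sketched as follows. This is a minimal illustration assuming a time-based availability SLO measured over a 30-day window; the function name `error_budget_minutes` is hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for a time-based availability SLO.

    `slo` is a fraction such as 0.999 (99.9%). The budget is (1 - slo)
    of all the minutes in the measurement window.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return (1.0 - slo) * total_minutes


# Each extra "nine" shrinks the budget by a factor of ten.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} SLO -> {error_budget_minutes(target):.1f} min per 30 days")
```

A 99.9% SLO yields 43.2 minutes. Going from 99.9% to 99.99% cuts the allowance to roughly 4.3 minutes, which is why each additional nine is much harder to meet.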
How teams use it:
- When the error budget is healthy → deploy freely, take risks, ship features.
- When the error budget is low → slow down deployments, prioritize reliability work.
- When the budget is exhausted → freeze all non-critical deployments until reliability improves.
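The three-tier policy above can be expressed as a small decision function. This is a sketch under stated assumptions: the 25% cutoff for "low" is a hypothetical value (teams set their own thresholds), the SLO is time-based over a 30-day window, and the names `remaining_budget_fraction`, `release_policy`, and `ReleasePolicy` are illustrative, not part of any standard tool.

```python
from enum import Enum


class ReleasePolicy(Enum):
    SHIP_FREELY = "deploy freely, take risks, ship features"
    SLOW_DOWN = "slow down deployments, prioritize reliability work"
    FREEZE = "freeze all non-critical deployments"


# Hypothetical threshold: less than 25% of the budget left counts as "low".
LOW_BUDGET_FRACTION = 0.25


def remaining_budget_fraction(slo: float, downtime_minutes: float,
                              window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    budget = (1.0 - slo) * window_days * 24 * 60
    return 1.0 - downtime_minutes / budget


def release_policy(remaining: float) -> ReleasePolicy:
    """Map the remaining budget fraction to one of the three policy tiers."""
    if remaining <= 0.0:
        return ReleasePolicy.FREEZE
    if remaining < LOW_BUDGET_FRACTION:
        return ReleasePolicy.SLOW_DOWN
    return ReleasePolicy.SHIP_FREELY


# Example: 20 minutes of downtime so far against a 99.9% SLO.
remaining = remaining_budget_fraction(0.999, downtime_minutes=20.0)
print(f"{remaining:.0%} of budget left -> {release_policy(remaining).name}")
```

Encoding the thresholds in code (or in alerting rules) keeps the decision mechanical: nobody has to argue about whether the service is "reliable enough" once the numbers are agreed.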
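Error budgets create a shared language between product teams (who want to ship) and SRE (who wants reliability). Because the budget is computed from measured data against an agreed target, decisions about when to slow down are objective, not political.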