A collection of templates ported from the SRE Workbook
Updated Aug 24, 2018
A simple service level calculator
Calculator to view detection time using error budget consumption rates, based on lessons from the Site Reliability Engineering Workbook
Stop burning your error budget. SRE config as code — SLOs, error budgets, runbooks, on-call, and dashboards in one sre.yaml file. Auto-remediates before alerts fire.
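End-to-end SLO (service level objective) governance platform that integrates MeterSphere synthetic-monitoring data, automatically calculates error budgets, and produces structured diagnostic reports with AI assistance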
Reliability is not the absence of failures. It is the ability of a system, a team, and a person to get back up together after a fall, to rethink, rebuild, and move on under new rules of the game, where human vulnerability is not a threat but part of the equation
Azure-native SLO/SLI engine with error budget tracking, burn-rate alerts, and CLI for Azure Monitor, Application Insights & Log Analytics
Identifies underutilized service-level objectives by detecting SLOs that consistently maintain error budget thresholds, generating actionable Excel reports with historical performance trends and optimization recommendations.
Azure SLO Dashboard with AI-powered error budget explainer — Application Insights + Container Apps + Claude
Error-budget-driven release gating for Prometheus: one SLO spec generates burn-rate alerts, a Grafana dashboard, and a CI/CD gate that freezes deploys when you're out of budget. Go.
SRE on AWS: SLOs/SLAs/error budgets, runbooks, incident response playbooks, chaos engineering, observability stack
Audit API SLO and error-budget docs for targets, SLIs, burn-rate alerts, incidents, exclusions, status reporting, telemetry, owners, and rollout safety.
SLO-Sentinel: an open-source, production-ready, high-performance, and OpenSLO-native calculation engine for Service Level Objectives (SLOs) with pluggable data sources powered by generative AI.
SLO compliance, error budgets, and multi-window burn rate from Prometheus — markdown reports + CI-friendly check command
Hands-on SRE practice lab for learning golden signals, observability, SLI/SLOs, error budgets, tracing, resilience, and capacity planning.
Data SRE toolkit: SLOs, error budgets, and burn-rate alerts for data pipelines (dbt, Airflow, Snowflake, BigQuery, OpenLineage)
SRE error-budget and burn-rate calculator: SLI, budget consumed, time to exhaustion, and multi-window burn-rate alerting.
Service Level Objective Calculator
SLO tracking, error budget calculation & burn-rate alerting for Kubernetes | Google SRE model | Prometheus | Slack | PagerDuty
API platform info for reliability