We Had 100 Dead Alerts Firing for Services That No Longer Existed. So I Built a Kubernetes Operator.
go
dev.to
TL;DR: I built and open-sourced a Kubernetes operator that manages Grafana Cloud dashboards, alert rules, and SLOs as code, with automatic cleanup when services are decommissioned. It solves the "100 orphaned alerts" problem by coupling the lifecycle of Grafana resources to the lifecycle of Kubernetes resources.

It was a Tuesday afternoon when someone on the team noticed that Grafana was still sending alerts for a service we'd decommissioned four months ago. Not one alert. Not five. We found over 100 aler