- Follow-up to: GROOT: One archive for cluster diagnostics
- Repository: github.com/hrodrig/groot
- Release: v1.0.0
I have not published much here lately. Most of my time went into shipping and hardening open-source tools I want operators to trust in production — GROOT, kzero, and the selfhosted/operator repos that wrap them — rather than long-form posts. This article is the catch-up: what changed at v1.0.0, why it matters if you run Kubernetes on call, and how it fits next to tools you may already use. I hope it is useful.
This is my first v1.0.0 on GROOT — a read-only Go CLI that turns a Kubernetes incident into one reproducible .tar.gz: pod logs, events, API snapshots, RCA TSVs, and a manifest you can attach to a ticket or pipe into automation.
If you read the earlier walkthrough, you already know the core idea: collect evidence, do not pretend to diagnose. Version 1.0.0 is the line where I stop treating that promise as “best effort” and document it as a stable contract.
The 1.0 incident loop
Why 1.0.0 is different from “just another tag”
0.9.x was about operator wins: preflight (groot validate), offline archive review (groot inspect), kubectl groot, --summary, stable exit codes, config profiles, and a pile of collector fixes.
1.0.0 is the compatibility boundary:
| Contract | What it means |
|---|---|
| config_version: 1 | New configs declare a schema version; legacy YAML without it still loads |
| archive_layout_version: 1 | Every new archive's extras/manifest.json carries a layout version for downstream tools |
| internal/ layout | GROOT is a CLI product, not a public Go SDK, so there are no accidental import paths |
| --output json | collect, validate, and inspect emit structured output for CI and runbooks |
| Governance | CODEOWNERS, issue/PR templates, golden inspect fixtures in CI |
Nothing here tries to be a monitoring platform or an AI RCA engine. The scope stays narrow: one command, one bundle, honest metadata.
The incident workflow I actually run
Three commands cover most of my on-call path:
groot validate --config groot.yml
groot collect --config groot.yml --summary
groot inspect ./groot-capture-*.tar.gz
Validate checks config load, API reachability, RBAC (auth can-i for the jobs your config would run), and free disk on output_dir. I run it before the first cron job in a new cluster and after every config change.
Collect still does the parallel read-only work (client-go end-to-end — no kubectl binary at runtime). --summary prints a one-screen footer: job counts, unhealthy pod tallies, archive path, duration.
Inspect reads an existing .tar.gz without cluster access — manifest, file tree, sizes. Post-mortems on a laptop, vendor handoffs, compliance reviews.
The same flow works as kubectl groot if you install the kubectl-groot plugin binary shipped beside groot in every release tarball.
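One practical wrinkle: if the output directory already holds several captures, the glob in groot inspect ./groot-capture-*.tar.gz expands to every match. A small helper that picks the newest file keeps the review step to one archive. This is a sketch that assumes archives keep the groot-capture-*.tar.gz naming shown above and sit directly in the directory you pass; latest_groot_archive is a hypothetical name, not part of GROOT.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROOT): print the newest GROOT archive in a
# directory, assuming the groot-capture-*.tar.gz naming used in this post.
latest_groot_archive() {
  dir=${1:-.}
  # ls -t sorts by modification time, newest first; "no match" errors are hidden.
  latest=$(ls -t "$dir"/groot-capture-*.tar.gz 2>/dev/null | head -n 1)
  # Return non-zero when nothing matched so callers can stop early.
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

Usage: groot inspect "$(latest_groot_archive ./out)". Sorting by modification time rather than by name avoids depending on how the file name encodes the timestamp.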
GROOT vs alternatives (honest picks)
GROOT is not the only way to grab cluster state. Most teams already use something. The question is which job you are hiring a tool for.
Closest cousin: GROOT vs kubectl-gather
kubectl-gather collects similar Kubernetes context. Different output shape, different sweet spot.
| Dimension | GROOT | kubectl-gather |
|---|---|---|
| Output | Single .tar.gz + manifest.json + RCA TSVs | YAML tree per cluster |
| Use case | Ticket-ready bundle (incident, compliance) | Multi-cluster diff, manual YAML inspection |
| Preflight / offline review | groot validate, groot inspect (no cluster) | None |
| Notifications | Slack, Discord, Teams, PagerDuty, Telegram, email, webhooks | None |
| Upload | S3, GCS, SFTP | None |
| Supply chain | GoReleaser, Homebrew, SBOMs, Cosign | Manual build |
| Plugin | kubectl groot (kubectl-groot in tarball) | kubectl-gather |
| Config | Versioned YAML (config_version: 1) + env | CLI flags |
| Redaction | Optional regex scrub before archive | None |
| Scheduling | groot-selfhosted Helm / CronJob | Manual |
Pick GROOT when you need a self-contained archive for tickets, compliance retention, or object storage, and you want validate/inspect/run_id in the loop.
Pick kubectl-gather when you juggle multiple clusters and diff YAML trees, or you already live in kubectl get … -o yaml workflows.
Both are read-only. Running both on a bad day is fine: .tar.gz for the record, YAML tree for ad-hoc diff.
Complementary tools (different jobs)
| Tool | Job | vs GROOT |
|---|---|---|
| k9s / Lens | Live cluster navigation | Explore first; GROOT freezes state when the incident is hot |
| Stern / kubectl logs -f | Tail logs now | GROOT captures historical pod logs into the bundle |
| kubectl cluster-info dump | Built-in dump | Flat layout, no manifest/notify/upload; good for quick dumps |
| OpenShift must-gather | Vendor support bundle | Platform-specific; GROOT is portable K8s + your config |
| Popeye / kube-score | Config lint / best practices | Linter, not forensics; run on a live cluster or against collected YAML |
| Trivy | Image vuln scan | Scan images in CI; GROOT may include image refs in RCA extras |
My stack on call: k9s to poke around → GROOT to attach evidence → grep/jq/groot inspect offline. GROOT is not a replacement for Datadog, Loki, or your APM; it is the step that compresses the first hour of archaeology into one archive.
GROOT + kzero: capture first, reset second
GROOT and kzero solve different problems — and I often run them back to back on the same cluster.
kzero is a Go CLI for declarative Kubernetes maintenance: ordered down, up, and reset pipelines from versioned YAML — scale workloads, Helm release steps, PVC/exec hooks, phase scripts, dry-run by default, optional notify when a pipeline stalls. It turns “start over” into a checked-in playbook, not a one-off shell tree.
GROOT is read-only: it freezes pod logs, events, and API context into a .tar.gz before you mutate anything.
kzero reorganizes cluster state — tear down, bring back, or full reset to a known baseline.
Typical maintenance or incident-recovery sequence:
# 1. Freeze what the cluster looked like *before* you touch it
groot validate --config groot.yml
groot collect --config groot.yml --summary --message pre-reset
# 2. Review the bundle (optional but cheap)
groot inspect ./out/groot-capture-*-pre-reset.tar.gz
# 3. Run the kzero playbook (dry-run first in kzero.yaml)
kzero analyze --config kzero.yaml
kzero reset --config kzero.yaml # down then up; live when run.mode: live
If reset fails halfway, you still have the GROOT archive from step 1 — logs, CrashLoopBackOff counts, and manifest metadata from before the pipeline moved workloads.
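If you script the sequence above, the rule worth encoding is "no capture, no reset". Here is a minimal POSIX sh sketch that uses only the groot and kzero flags shown above; the function name and the stop-on-any-failure policy are my own choices, not part of either tool. Note that any non-zero exit from groot collect stops it, including a notify failure, which you may want to relax.

```shell
#!/bin/sh
# Sketch: "capture first, reset second" as one guarded function.
# Only flags shown in this post are used; the function name and the
# stop-on-any-failure policy are illustrative, not part of groot or kzero.
capture_then_reset() {
  groot_cfg=$1
  kzero_cfg=$2

  # Preflight, then freeze cluster state. Any failure here aborts before
  # the cluster is touched.
  groot validate --config "$groot_cfg" || {
    echo "groot validate failed; not resetting" >&2
    return 1
  }
  groot collect --config "$groot_cfg" --summary --message pre-reset || {
    echo "groot collect failed; not resetting" >&2
    return 1
  }

  # kzero steps as in the sequence above; whether reset is live or a dry run
  # is decided by run.mode in kzero.yaml, not here.
  kzero analyze --config "$kzero_cfg" || return 1
  kzero reset --config "$kzero_cfg"
}
```

Usage: capture_then_reset groot.yml kzero.yaml. Keeping the guard in one function means a cron job or bastion runbook cannot accidentally run the reset when the evidence bundle was never written.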
| Tool | Mutates cluster? | Output |
|---|---|---|
| GROOT | No (read-only) | .tar.gz evidence bundle |
| kzero | Yes (declarative pipelines) | Ordered down/up/reset + logs + notify |
Links:
- kzero product: github.com/hrodrig/kzero · kzero.hermesrodriguez.com · Releases
- Install: curl -fsSL https://get.kzero.hermesrodriguez.com/install.sh | sh
- Operator playbooks: kzero-selfhosted (bastion, cron, full-reset-example)
kzero uses the same product/operator split as GROOT and groot-selfhosted: the CLI and SPEC live in the main repo; scheduling and runbooks live in the selfhosted repo.
What makes the bundle “production grade”
Collector, not analyzer
GROOT deliberately does not emit a verdict like “root cause: OOM”. It captures reproducible state: logs, events, workload context, optional metrics columns in extras/all-pods-rca.tsv, and extras/manifest.json with job counts, paths, run_id, and archive_sha256.
That separation matters. Diagnosis belongs in your observability stack, runbooks, or a human — not in a tool that might guess wrong under pressure.
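groot inspect is the intended way to read an archive offline. For a reviewer who only has standard tools, a sketch like the following prints extras/manifest.json straight out of the bundle. It assumes GNU or BSD tar, and it searches for the manifest rather than hard-coding its position, since I am not asserting whether it sits at the archive root or under a top-level directory. groot_manifest is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROOT): print extras/manifest.json from a
# GROOT archive using only tar and grep.
groot_manifest() {
  archive=$1
  # List members and pick the first path ending in extras/manifest.json,
  # whether it is at the root or under a top-level directory.
  member=$(tar -tzf "$archive" | grep -E '(^|/)extras/manifest\.json$' | head -n 1)
  if [ -z "$member" ]; then
    echo "no extras/manifest.json in $archive" >&2
    return 1
  fi
  # -O writes the extracted member to stdout instead of the filesystem.
  tar -xzOf "$archive" "$member"
}
```

Usage: groot_manifest ./groot-capture-example.tar.gz | jq '.archive_layout_version' (the archive name is a placeholder). Nothing is extracted to disk, which keeps a shared review laptop clean.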
Security by default
- Read-only collection: mutating verbs never enter the hot path; extra_kubectl is allowlisted at config load.
- Optional secret redaction: regex scrub before the archive is written (honest disclaimer: not a cryptographic guarantee).
- Distroless nonroot container image (ghcr.io/hrodrig/groot).
- Supply chain: SPDX + CycloneDX SBOMs, Cosign signatures on checksums and GHCR images since 0.6.x.
- SFTP hardening in 1.0.0: known_hosts_file is required when SFTP upload is enabled, unless you explicitly opt into insecure host keys for lab use only.
Secrets for notify/upload stay in environment variables (GROOT_*), not committed YAML.
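To make the "not a cryptographic guarantee" caveat on redaction concrete, here is a tiny sed-based scrubber in the same spirit. The patterns are hypothetical examples, not GROOT's defaults or its implementation; the point is that a regex only catches the shapes you anticipated.

```shell
#!/bin/sh
# Illustrative regex scrubber (not GROOT's implementation or default patterns).
# Reads stdin, writes stdout with matched secret values replaced.
redact() {
  sed -E \
    -e 's/(password|token)=[^[:space:]&]+/\1=[REDACTED]/g' \
    -e 's/(Authorization: Bearer )[A-Za-z0-9._-]+/\1[REDACTED]/g'
}
```

Running printf 'passwd: hunter2\n' | redact prints the line unchanged, because "passwd:" matches neither pattern. Treat redaction as a filter you tune against your own logs, not as a security boundary.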
Operations beyond the laptop
The CLI lives in groot. Scheduling and in-cluster patterns live in groot-selfhosted — Helm CronJob, flat manifests, bastion Docker runbooks, airgapped SFTP relay playbooks.
That split keeps the product repo testable while operators pin ghcr.io/hrodrig/groot:1.0.0 (or the latest tag from GitHub Releases) in their own GitOps.
Structured output for automation
Scripting was always possible via exit codes (0 success, 1 config, 2 API, 3 abort, 4 notify failure). 1.0.0 adds --output json on groot collect so CI and ticketing systems get a stable Summary object on stdout after a successful run.
validate and inspect already supported JSON from 0.9.x — useful in preflight gates and archive QA pipelines.
Example gate in CI:
groot validate --config groot.yml --output json | jq -e '.ok == true'
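Because the exit codes are part of the contract, a wrapper can branch on them without parsing output. A minimal sketch: the numeric meanings come from the exit-code list above, while the label strings and the "notify failure is non-fatal" policy are my own assumptions, not GROOT behavior.

```shell
#!/bin/sh
# Map GROOT's documented exit codes to labels for CI logs.
# Numeric meanings are from the post; label wording is illustrative.
groot_exit_label() {
  case "$1" in
    0) echo "success" ;;
    1) echo "config error" ;;
    2) echo "API error" ;;
    3) echo "aborted" ;;
    4) echo "notify failure" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Example policy (an assumption): fail the job on everything except success
# and notify failure, so a broken Slack or webhook target does not block the
# pipeline. Returns 0 ("should fail") or 1 ("carry on").
groot_exit_should_fail() {
  case "$1" in
    0|4) return 1 ;;
    *) return 0 ;;
  esac
}
```

Usage: run groot collect --config groot.yml --summary, capture rc=$?, log groot_exit_label "$rc", then exit "$rc" only if groot_exit_should_fail "$rc". If you adopt the exit-4 exception, confirm on your release that the archive is still present in that case before relying on it.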
Starter configs without starting from zero
Ready-made profiles live under examples/profiles/:
| Profile | When |
|---|---|
| incident-quick.yml | Narrow namespaces, short log window, no upload |
| bastion-airgap.yml | SFTP via SSH relay, minimal notify |
| eks-managed.yml | Skip unsupported node logs, metrics on |
| compliance-full.yml | All namespaces, redaction enabled |
Copy, edit cluster names and webhooks, run groot validate, then groot collect.
Migrating to 1.0.0
Existing configs: no forced rewrite. YAML without config_version keeps loading as legacy.
New configs: add at the top:
config_version: 1
New archives include "archive_layout_version": 1 in extras/manifest.json.
SFTP users: if you relied on implicit insecure host keys, set upload.sftp.known_hosts_file (or GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE) before upgrading.
Full notes: CHANGELOG 1.0.0.
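For the SFTP change, a quick preflight before upgrading can confirm the environment-variable route is in place. This sketch only checks GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE; it does not parse groot.yml, so if you set upload.sftp.known_hosts_file in YAML instead, groot validate remains the authoritative check. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical preflight (not part of GROOT): confirm the known_hosts file that
# 1.0.0 requires for SFTP upload is configured via the environment variable.
check_sftp_known_hosts() {
  f=${GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE:-}
  if [ -z "$f" ]; then
    echo "GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE is not set" >&2
    return 1
  fi
  if [ ! -r "$f" ]; then
    echo "known_hosts file is missing or unreadable: $f" >&2
    return 1
  fi
  # An empty file means no host key can ever match.
  if [ ! -s "$f" ]; then
    echo "known_hosts file is empty: $f" >&2
    return 1
  fi
  echo "ok: $f"
}
```

One way to populate the file is ssh-keyscan sftp.example.internal >> known_hosts (the hostname is a placeholder); verify the fingerprint out of band before trusting it, since ssh-keyscan alone is trust-on-first-use.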
Honest gaps (post-1.0 roadmap)
Not in 1.0.0 — deliberately deferred to Band 4:
- Multi-cluster capture in one archive
- groot analyze / smart log hints
- Streaming JSONL export
If you need those, open a roadmap discussion — I prefer honest backlog rows over scope creep in a stability release.
Install in one line
curl -fsSL https://get.groot.hermesrodriguez.com/install.sh | sh
groot -v
groot validate --config groot.yml # after you have a config
Or: GitHub Releases (.deb, .rpm, .tar.gz, .zip), Homebrew (brew install hrodrig/groot/groot), go install github.com/hrodrig/groot/cmd/groot@v1.0.0.
Site: groot.hermesrodriguez.com
Closing
GROOT 1.0.0 is the release where I say: this config shape, this archive layout, and this CLI behavior are the baseline you can build runbooks on.
It is still one tool for one job — compress the first hour of kubectl archaeology into a single attachment — but now with versioned contracts, structured output, and the 0.9.x operator tooling I wished I had on my first production incident.
If it saves you a round of copy-paste on the next page, star the repo or drop an issue with what your team collects that I have not thought of yet.
Clone traffic for this repo (and the rest of the Hermes K8s/DB stack) is published on gghstats under hrodrig/groot: raw GitHub traffic, no vanity smoothing. Full index: gghstats.hermesrodriguez.com.
Thank you for reading — and if you already run GROOT in a cluster somewhere, thank you for trusting it with real incidents. Open source only works when people show up: issues, stars, runbooks, and honest feedback. Let's keep building dependable OSS together — see you in the next post.
Disclosure
Written by Hermes Rodríguez. AI tools helped with drafting and editing; technical claims were checked against SPECIFICATIONS.md, CHANGELOG, and the v1.0.0 tag. The DeepSeek audit notes that sparked this post were a useful checklist — not a substitute for reading the spec.
Always verify behavior on the release you install.