GROOT 1.0: One Stable Archive for Kubernetes Incidents

Follow-up to: GROOT: One archive for cluster diagnostics
Repository: github.com/hrodrig/groot
Release: v1.0.0

I have not published much here lately. Most of my time went into shipping and hardening open-source tools I want operators to trust in production — GROOT, kzero, and the selfhosted/operator repos that wrap them — rather than long-form posts. This article is the catch-up: what changed at v1.0.0, why it matters if you run Kubernetes on call, and how it fits next to tools you may already use. I hope it is useful.

This is the first 1.0.0 release of GROOT, a read-only Go CLI that turns a Kubernetes incident into one reproducible .tar.gz: pod logs, events, API snapshots, RCA TSVs, and a manifest you can attach to a ticket or pipe into automation.

If you read the earlier walkthrough, you already know the core idea: collect evidence, do not pretend to diagnose. Version 1.0.0 is the line where I stop treating that promise as “best effort” and document it as a stable contract.

[Figure: The 1.0 incident loop]

Why 1.0.0 is different from “just another tag”

0.9.x was about operator wins: preflight (groot validate), offline archive review (groot inspect), kubectl groot, --summary, stable exit codes, config profiles, and a pile of collector fixes.

1.0.0 is the compatibility boundary:

Contract	What it means
`config_version: 1`	New configs declare schema version; legacy YAML without it still loads
`archive_layout_version: 1`	Every new archive’s `extras/manifest.json` carries a layout version for downstream tools
`internal/` layout	GROOT is a CLI product, not a public Go SDK — no accidental import paths
`--output json`	`collect`, `validate`, and `inspect` emit structured output for CI and runbooks
Governance	CODEOWNERS, issue/PR templates, golden inspect fixtures in CI

Nothing here tries to be a monitoring platform or an AI RCA engine. The scope stays narrow: one command, one bundle, honest metadata.

The incident workflow I actually run

Three commands cover most of my on-call path:

groot validate --config groot.yml
groot collect --config groot.yml --summary
groot inspect ./groot-capture-*.tar.gz

Validate checks config load, API reachability, RBAC (auth can-i for the jobs your config would run), and free disk on output_dir. I run it before the first cron job in a new cluster and after every config change.

Collect still does the parallel read-only work (client-go end-to-end — no kubectl binary at runtime). --summary prints a one-screen footer: job counts, unhealthy pod tallies, archive path, duration.

Inspect reads an existing .tar.gz without cluster access: manifest, file tree, sizes. That covers post-mortems on a laptop, vendor handoffs, and compliance reviews.

The same flow is available as kubectl groot if you install the plugin binary (kubectl-groot) shipped beside groot in every release tarball.
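
For cron or CI, the first two commands can be chained so that a failed preflight never starts a capture. The sketch below is a minimal POSIX shell wrapper, not something shipped with GROOT: the function name groot_capture is hypothetical, and it uses only the validate, collect, --config, and --summary forms shown above.

```shell
# Hypothetical wrapper: run the preflight, and only capture if it passes.
# Uses only the groot subcommands and flags shown in this article.
groot_capture() {
  config="$1"

  # Preflight: if validate fails, report its exit code and stop here.
  groot validate --config "$config" || {
    rc=$?
    echo "groot validate failed (exit $rc); skipping collect" >&2
    return "$rc"
  }

  # Capture, with the one-screen summary footer.
  groot collect --config "$config" --summary
}
```

Calling groot_capture groot.yml returns the exit code of whichever step ran last, so a scheduler can still branch on it.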

GROOT vs alternatives (honest picks)

GROOT is not the only way to grab cluster state. Most teams already use something. The question is which job you are hiring a tool for.

Closest cousin: GROOT vs kubectl-gather

kubectl-gather collects similar Kubernetes context. Different output shape, different sweet spot.

Dimension	GROOT	kubectl-gather
Output	Single `.tar.gz` + `manifest.json` + RCA TSVs	YAML tree per cluster
Use case	Ticket-ready bundle (incident, compliance)	Multi-cluster diff, manual YAML inspection
Preflight / offline review	`groot validate`, `groot inspect` (no cluster)	—
Notifications	Slack, Discord, Teams, PagerDuty, Telegram, email, webhooks	None
Upload	S3, GCS, SFTP	None
Supply chain	GoReleaser, Homebrew, SBOMs, Cosign	Manual build
Plugin	`kubectl groot` (`kubectl-groot` in tarball)	`kubectl-gather`
Config	Versioned YAML (`config_version: 1`) + env	CLI flags
Redaction	Optional regex scrub before archive	None
Scheduling	groot-selfhosted Helm / CronJob	Manual

Pick GROOT when you need a signed, self-contained archive for tickets, compliance retention, or object storage — and you want validate/inspect/run_id in the loop.

Pick kubectl-gather when you juggle multiple clusters and diff YAML trees, or you already live in kubectl get … -o yaml workflows.

Both are read-only. Running both on a bad day is fine: .tar.gz for the record, YAML tree for ad-hoc diff.

Complementary tools (different jobs)

Tool	Job	vs GROOT
k9s / Lens	Live cluster navigation	Explore first; GROOT freezes state when the incident is hot
Stern / `kubectl logs -f`	Tail logs now	GROOT captures historical pod logs into the bundle
`kubectl cluster-info dump`	Built-in dump	Flat layout, no manifest/notify/upload; good for quick dumps
OpenShift must-gather	Vendor support bundle	Platform-specific; GROOT is portable K8s + your config
Popeye / kube-score	Config lint / best practices	Linter, not forensics — run on live cluster or against collected YAML
Trivy	Image vuln scan	Scan images in CI; GROOT may include image refs in RCA extras

My stack on call: k9s to poke around → GROOT to attach evidence → grep/jq/groot inspect offline. GROOT is not a replacement for Datadog, Loki, or your APM; it is the step that compresses the first hour of archaeology.

GROOT + kzero: capture first, reset second

GROOT and kzero solve different problems — and I often run them back to back on the same cluster.

kzero is a Go CLI for declarative Kubernetes maintenance: ordered down, up, and reset pipelines from versioned YAML — scale workloads, Helm release steps, PVC/exec hooks, phase scripts, dry-run by default, optional notify when a pipeline stalls. It turns “start over” into a checked-in playbook, not a one-off shell tree.

GROOT is read-only: it freezes pod logs, events, and API context into a .tar.gz before you mutate anything.

kzero reorganizes cluster state — tear down, bring back, or full reset to a known baseline.

Typical maintenance or incident-recovery sequence:

# 1. Freeze what the cluster looked like *before* you touch it
groot validate --config groot.yml
groot collect --config groot.yml --summary --message pre-reset

# 2. Review the bundle (optional but cheap)
groot inspect ./out/groot-capture-*-pre-reset.tar.gz

# 3. Run the kzero playbook (dry-run first in kzero.yaml)
kzero analyze --config kzero.yaml
kzero reset --config kzero.yaml   # down then up; live when run.mode: live

If reset fails halfway, you still have the GROOT archive from step 1 — logs, CrashLoopBackOff counts, and manifest metadata from before the pipeline moved workloads.
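
One way to enforce "capture first" in a script is to refuse to continue unless a pre-reset archive is already on disk. This is a minimal sketch under stated assumptions: the function name have_pre_reset_archive is hypothetical, and it assumes archives land in the directory you pass in with the groot-capture-*-pre-reset.tar.gz naming used in the inspect command above. It only checks that a non-empty file exists, not that the archive is valid.

```shell
# Hypothetical guard: succeed only if a non-empty pre-reset archive exists.
# Assumes the groot-capture-*-pre-reset.tar.gz naming shown above.
have_pre_reset_archive() {
  dir="$1"
  for f in "$dir"/groot-capture-*-pre-reset.tar.gz; do
    # If the glob matches nothing, $f is the literal pattern and -s is false.
    if [ -s "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Usage (commented out so the sketch has no side effects):
# have_pre_reset_archive ./out && kzero reset --config kzero.yaml
```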

Tool	Mutates cluster?	Output
GROOT	No (read-only)	`.tar.gz` evidence bundle
kzero	Yes (declarative pipelines)	Ordered down/up/reset + logs + notify

Links:

kzero product: github.com/hrodrig/kzero · kzero.hermesrodriguez.com · Releases
Install: curl -fsSL https://get.kzero.hermesrodriguez.com/install.sh | sh
Operator playbooks: kzero-selfhosted (bastion, cron, full-reset-example)

Same product/operator split as GROOT + groot-selfhosted: CLI and SPEC in the main repo; scheduling and runbooks in selfhosted.

What makes the bundle “production grade”

Collector, not analyzer

GROOT deliberately does not emit a verdict like “root cause: OOM”. It captures reproducible state: logs, events, workload context, optional metrics columns in extras/all-pods-rca.tsv, and extras/manifest.json with job counts, paths, run_id, and archive_sha256.
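
groot inspect is the supported way to read a bundle offline. On a machine that has no groot binary, the manifest can still be pulled out with plain tar. The sketch below assumes the member path inside the tarball is exactly extras/manifest.json; if your archives wrap everything in a top-level directory, adjust the path. The function name groot_manifest is hypothetical.

```shell
# Hypothetical helper: print the manifest from an archive to stdout
# without unpacking anything to disk.
# Assumption: the member path is exactly extras/manifest.json.
groot_manifest() {
  tar -xzOf "$1" extras/manifest.json
}

# Usage (commented out): pick the fields named above with jq.
# groot_manifest ./capture.tar.gz | jq '{run_id, archive_sha256}'
```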

That separation matters. Diagnosis belongs in your observability stack, runbooks, or a human — not in a tool that might guess wrong under pressure.

Security by default

Read-only collection — mutating verbs never enter the hot path; extra_kubectl is allowlisted at config load.
Optional secret redaction — regex scrub before the archive is written (honest disclaimer: not a cryptographic guarantee).
Distroless nonroot container image (ghcr.io/hrodrig/groot).
Supply chain — SPDX + CycloneDX SBOMs, Cosign signatures on checksums and GHCR images since 0.6.x.
SFTP hardening in 1.0.0 — known_hosts_file is required when SFTP upload is enabled unless you explicitly opt into insecure host keys for lab use only.

Secrets for notify/upload stay in environment variables (GROOT_*), not committed YAML.
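
To make the redaction disclaimer concrete, here is what a regex scrub does and does not catch. This is an illustration with sed, not GROOT's implementation or its config syntax; the pattern and the function name redact_stream are assumptions chosen for the example.

```shell
# Illustration only: mask the value after password/token/secret keys.
# The pattern is an example, not GROOT's built-in rule set.
redact_stream() {
  sed -E 's/((password|token|secret)[[:space:]]*[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g'
}

# Caught:  printf 'token=abc123 ok\n' | redact_stream
# Missed:  printf 'Authorization: Bearer xyz\n' | redact_stream
```

The second line passes through unchanged because nothing in it matches the pattern. That is the practical meaning of "not a cryptographic guarantee": a scrub only removes what its patterns anticipate.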

Operations beyond the laptop

The CLI lives in groot. Scheduling and in-cluster patterns live in groot-selfhosted — Helm CronJob, flat manifests, bastion Docker runbooks, airgapped SFTP relay playbooks.

That split keeps the product repo testable while operators pin ghcr.io/hrodrig/groot:1.0.0 (or @latest from GitHub Releases) in their own GitOps.

Structured output for automation

Scripting was always possible via exit codes (0 success, 1 config, 2 API, 3 abort, 4 notify failure). 1.0.0 adds --output json on groot collect so CI and ticketing systems get a stable Summary object on stdout after a successful run.

validate and inspect already supported JSON from 0.9.x — useful in preflight gates and archive QA pipelines.

Example gate in CI:

groot validate --config groot.yml --output json | jq -e '.ok == true'
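
For wrappers that branch on the exit codes listed above, a small lookup keeps log lines and ticket comments readable. The mapping below copies the five codes from this article; the function name groot_exit_label is hypothetical, and any other code is reported as unknown rather than guessed.

```shell
# Hypothetical helper: translate a groot exit code into a short label.
# Codes 0-4 are the ones documented in this article.
groot_exit_label() {
  case "$1" in
    0) echo "success" ;;
    1) echo "config" ;;
    2) echo "API" ;;
    3) echo "abort" ;;
    4) echo "notify failure" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Usage (commented out):
# rc=0; groot collect --config groot.yml --summary || rc=$?
# echo "groot collect: $(groot_exit_label "$rc")"
```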

Starter configs without starting from zero

Ready-made profiles live under examples/profiles/:

Profile	When
incident-quick.yml	Narrow namespaces, short log window, no upload
bastion-airgap.yml	SFTP via SSH relay, minimal notify
eks-managed.yml	Skip unsupported node logs, metrics on
compliance-full.yml	All namespaces, redaction enabled

Copy, edit cluster names and webhooks, run groot validate, then groot collect.

Migrating to 1.0.0

Existing configs: no forced rewrite. YAML without config_version keeps loading as legacy.

New configs: add at the top:

config_version: 1

New archives include "archive_layout_version": 1 in extras/manifest.json.
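
A downstream tool can use that field to refuse archives whose layout it does not understand. The sketch below reads manifest JSON on stdin and extracts the number with sed so it runs without extra dependencies; it assumes the key and its integer value sit on the same line. Both function names are hypothetical. Where jq is available, jq -e '.archive_layout_version == 1' is the sturdier check.

```shell
# Hypothetical helper: print archive_layout_version from manifest JSON on stdin.
# Prints nothing when the field is absent.
layout_version() {
  sed -n 's/.*"archive_layout_version"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only for layout version 1, the value introduced in 1.0.0.
supports_layout_1() {
  [ "$(layout_version)" = "1" ]
}
```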

SFTP users: if you relied on implicit insecure host keys, set upload.sftp.known_hosts_file (or GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE) before upgrading.
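
A quick pre-upgrade check for the environment-variable route might look like the sketch below. It covers only GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE; if you set upload.sftp.known_hosts_file in YAML instead, this check does not see it and groot validate remains the real test. The function name sftp_known_hosts_ready is hypothetical.

```shell
# Hypothetical pre-upgrade check: the env var is set and points at a
# readable, non-empty file. It does not inspect the YAML config.
sftp_known_hosts_ready() {
  f="${GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE:-}"
  [ -n "$f" ] && [ -r "$f" ] && [ -s "$f" ]
}

# Usage (commented out):
# sftp_known_hosts_ready || echo "set GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE before upgrading" >&2
```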

Full notes: CHANGELOG 1.0.0.

Honest gaps (post-1.0 roadmap)

Not in 1.0.0 — deliberately deferred to Band 4:

Multi-cluster capture in one archive
groot analyze / smart log hints
Streaming JSONL export
In-cluster progress UI

If you need those, open a roadmap discussion — I prefer honest backlog rows over scope creep in a stability release.

Install in one line

curl -fsSL https://get.groot.hermesrodriguez.com/install.sh | sh
groot -v
groot validate --config groot.yml   # after you have a config

Or: GitHub Releases (.deb, .rpm, .tar.gz, .zip), Homebrew (brew install hrodrig/groot/groot), go install github.com/hrodrig/groot/cmd/groot@v1.0.0.

Site: groot.hermesrodriguez.com

Closing

GROOT 1.0.0 is the release where I say: this config shape, this archive layout, and this CLI behavior are the baseline you can build runbooks on.

It is still one tool for one job — compress the first hour of kubectl archaeology into a single attachment — but now with versioned contracts, structured output, and the 0.9.x operator tooling I wished I had on my first production incident.

If it saves you a round of copy-paste on the next page, star the repo or drop an issue with what your team collects that I have not thought of yet.

Clone traffic for this repo (and the rest of the Hermes K8s/DB stack) lives on gghstats — hrodrig/groot — raw GitHub traffic, no vanity smoothing. Full index: gghstats.hermesrodriguez.com.

Thank you for reading — and if you already run GROOT in a cluster somewhere, thank you for trusting it with real incidents. Open source only works when people show up: issues, stars, runbooks, and honest feedback. Let's keep building dependable OSS together — see you in the next post.

Disclosure

Written by Hermes Rodríguez. AI tools helped with drafting and editing; technical claims were checked against SPECIFICATIONS.md, CHANGELOG, and the v1.0.0 tag. The DeepSeek audit notes that sparked this post were a useful checklist — not a substitute for reading the spec.

Always verify behavior on the release you install.