GROOT 1.0: One Stable Archive for Kubernetes Incidents


I have not published much here lately. Most of my time went into shipping and hardening open-source tools I want operators to trust in production — GROOT, kzero, and the selfhosted/operator repos that wrap them — rather than long-form posts. This article is the catch-up: what changed at v1.0.0, why it matters if you run Kubernetes on call, and how it fits next to tools you may already use. I hope it is useful.

This is GROOT's first v1.0.0 release. GROOT is a read-only Go CLI that turns a Kubernetes incident into one reproducible .tar.gz: pod logs, events, API snapshots, RCA TSVs, and a manifest you can attach to a ticket or pipe into automation.

If you read the earlier walkthrough, you already know the core idea: collect evidence, do not pretend to diagnose. Version 1.0.0 is the line where I stop treating that promise as “best effort” and document it as a stable contract.

[Figure: The 1.0 incident loop]


Why 1.0.0 is different from “just another tag”

0.9.x was about operator wins: preflight (groot validate), offline archive review (groot inspect), kubectl groot, --summary, stable exit codes, config profiles, and a pile of collector fixes.

1.0.0 is the compatibility boundary:

| Contract | What it means |
|---|---|
| config_version: 1 | New configs declare a schema version; legacy YAML without it still loads |
| archive_layout_version: 1 | Every new archive's extras/manifest.json carries a layout semver for downstream tools |
| internal/ layout | GROOT is a CLI product, not a public Go SDK, so there are no accidental import paths |
| --output json | collect, validate, and inspect emit structured output for CI and runbooks |
| Governance | CODEOWNERS, issue/PR templates, golden inspect fixtures in CI |

Nothing here tries to be a monitoring platform or an AI RCA engine. The scope stays narrow: one command, one bundle, honest metadata.


The incident workflow I actually run

Three commands cover most of my on-call path:

groot validate --config groot.yml
groot collect --config groot.yml --summary
groot inspect ./groot-capture-*.tar.gz

groot validate checks that the config loads, that the API server is reachable, that RBAC allows the jobs your config would run (via auth can-i checks), and that output_dir has enough free disk. I run it before the first cron job in a new cluster and after every config change.

groot collect still does the parallel, read-only work (client-go end to end, so no kubectl binary is needed at runtime). --summary prints a one-screen footer: job counts, unhealthy pod tallies, archive path, and duration.

groot inspect reads an existing .tar.gz without cluster access and shows the manifest, file tree, and sizes. That covers post-mortems on a laptop, vendor handoffs, and compliance reviews.

The same flow works as kubectl groot if you install the kubectl-groot plugin binary shipped beside groot in every release tarball.


GROOT vs alternatives (honest picks)

GROOT is not the only way to grab cluster state. Most teams already use something. The question is which job you are hiring a tool for.

Closest cousin: GROOT vs kubectl-gather

kubectl-gather collects similar Kubernetes context. Different output shape, different sweet spot.

| Dimension | GROOT | kubectl-gather |
|---|---|---|
| Output | Single .tar.gz + manifest.json + RCA TSVs | YAML tree per cluster |
| Use case | Ticket-ready bundle (incident, compliance) | Multi-cluster diff, manual YAML inspection |
| Preflight / offline review | groot validate, groot inspect (no cluster) | |
| Notifications | Slack, Discord, Teams, PagerDuty, Telegram, email, webhooks | None |
| Upload | S3, GCS, SFTP | None |
| Supply chain | GoReleaser, Homebrew, SBOMs, Cosign | Manual build |
| Plugin | kubectl groot (kubectl-groot in tarball) | kubectl-gather |
| Config | Versioned YAML (config_version: 1) + env | CLI flags |
| Redaction | Optional regex scrub before archive | None |
| Scheduling | groot-selfhosted Helm / CronJob | Manual |

Pick GROOT when you need a signed, self-contained archive for tickets, compliance retention, or object storage — and you want validate/inspect/run_id in the loop.

Pick kubectl-gather when you juggle multiple clusters and diff YAML trees, or you already live in kubectl get … -o yaml workflows.

Both are read-only. Running both on a bad day is fine: .tar.gz for the record, YAML tree for ad-hoc diff.

Complementary tools (different jobs)

| Tool | Job | vs GROOT |
|---|---|---|
| k9s / Lens | Live cluster navigation | Explore first; GROOT freezes state when the incident is hot |
| Stern / kubectl logs -f | Tail logs now | GROOT captures historical pod logs into the bundle |
| kubectl cluster-info dump | Built-in dump | Flat layout, no manifest/notify/upload; good for quick dumps |
| OpenShift must-gather | Vendor support bundle | Platform-specific; GROOT is portable K8s + your config |
| Popeye / kube-score | Config lint / best practices | Linter, not forensics: run on a live cluster or against collected YAML |
| Trivy | Image vuln scan | Scan images in CI; GROOT may include image refs in RCA extras |

My on-call stack: k9s to poke around → GROOT to attach evidence → grep/jq/groot inspect offline. GROOT is not a replacement for Datadog, Loki, or your APM; it is the step that compresses first-hour archaeology into one archive.

GROOT + kzero: capture first, reset second

GROOT and kzero solve different problems — and I often run them back to back on the same cluster.

kzero is a Go CLI for declarative Kubernetes maintenance: ordered down, up, and reset pipelines from versioned YAML — scale workloads, Helm release steps, PVC/exec hooks, phase scripts, dry-run by default, optional notify when a pipeline stalls. It turns “start over” into a checked-in playbook, not a one-off shell tree.

GROOT is read-only: it freezes pod logs, events, and API context into a .tar.gz before you mutate anything.

kzero reorganizes cluster state — tear down, bring back, or full reset to a known baseline.

Typical maintenance or incident-recovery sequence:

# 1. Freeze what the cluster looked like *before* you touch it
groot validate --config groot.yml
groot collect --config groot.yml --summary --message pre-reset

# 2. Review the bundle (optional but cheap)
groot inspect ./out/groot-capture-*-pre-reset.tar.gz

# 3. Run the kzero playbook (dry-run first in kzero.yaml)
kzero analyze --config kzero.yaml
kzero reset --config kzero.yaml   # down then up; live when run.mode: live

If reset fails halfway, you still have the GROOT archive from step 1 — logs, CrashLoopBackOff counts, and manifest metadata from before the pipeline moved workloads.

| Tool | Mutates cluster? | Output |
|---|---|---|
| GROOT | No (read-only) | .tar.gz evidence bundle |
| kzero | Yes (declarative pipelines) | Ordered down/up/reset + logs + notify |


kzero follows the same product/operator split as GROOT and groot-selfhosted: the CLI and SPEC live in the main repo, and scheduling and runbooks live in the selfhosted repo.


What makes the bundle “production grade”

Collector, not analyzer

GROOT deliberately does not emit a verdict like “root cause: OOM”. It captures reproducible state: logs, events, workload context, optional metrics columns in extras/all-pods-rca.tsv, and extras/manifest.json with job counts, paths, run_id, and archive_sha256.

That separation matters. Diagnosis belongs in your observability stack, runbooks, or a human — not in a tool that might guess wrong under pressure.

Security by default

  • Read-only collection — mutating verbs never enter the hot path; extra_kubectl is allowlisted at config load.
  • Optional secret redaction — regex scrub before the archive is written (honest disclaimer: not a cryptographic guarantee).
  • Distroless nonroot container image (ghcr.io/hrodrig/groot).
  • Supply chain — SPDX + CycloneDX SBOMs, Cosign signatures on checksums and GHCR images since 0.6.x.
  • SFTP hardening in 1.0.0: known_hosts_file is required when SFTP upload is enabled, unless you explicitly opt into insecure host keys (lab use only).
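Regex redaction of that kind can be sketched in a few lines of Go. The two patterns (bearer tokens and password/token/secret key-value pairs) are hypothetical examples, not GROOT's shipped defaults, and the same caveat applies: pattern matching is best effort and will miss secrets it has no rule for.

```go
// Minimal sketch of regex-based redaction before writing an archive.
// The patterns are illustrative assumptions, not GROOT's defaults; like the
// real feature, this is best-effort scrubbing, not a cryptographic guarantee.
package main

import (
	"fmt"
	"regexp"
)

type rule struct {
	re   *regexp.Regexp
	repl string
}

// Hypothetical patterns: bearer tokens and key=value secrets (case-insensitive).
var rules = []rule{
	{regexp.MustCompile(`(?i)(authorization:\s*bearer\s+)\S+`), "${1}[REDACTED]"},
	{regexp.MustCompile(`(?i)\b(password|token|secret)=\S+`), "${1}=[REDACTED]"},
}

// redact applies every rule in order and returns the scrubbed text.
func redact(s string) string {
	for _, r := range rules {
		s = r.re.ReplaceAllString(s, r.repl)
	}
	return s
}

func main() {
	line := "GET /api Authorization: Bearer abc.def password=hunter2 user=ops"
	fmt.Println(redact(line))
	// prints: GET /api Authorization: Bearer [REDACTED] password=[REDACTED] user=ops
}
```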

Secrets for notify/upload stay in environment variables (GROOT_*), not committed YAML.

Operations beyond the laptop

The CLI lives in groot. Scheduling and in-cluster patterns live in groot-selfhosted — Helm CronJob, flat manifests, bastion Docker runbooks, airgapped SFTP relay playbooks.

That split keeps the product repo testable while operators pin ghcr.io/hrodrig/groot:1.0.0 (or @latest from GitHub Releases) in their own GitOps.


Structured output for automation

Scripting was always possible via exit codes (0 success, 1 config, 2 API, 3 abort, 4 notify failure). 1.0.0 adds --output json on groot collect so CI and ticketing systems get a stable Summary object on stdout after a successful run.

validate and inspect already supported JSON from 0.9.x — useful in preflight gates and archive QA pipelines.

Example gate in CI:

groot validate --config groot.yml --output json | jq -e '.ok == true'
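The same gate can be written as a small Go wrapper when a pipeline needs more than jq's pass/fail, for example to report which documented exit code came back. This is a hypothetical sketch: it assumes groot is on PATH and that validate's JSON carries a boolean ok field, as the jq filter above implies; it does not rely on any other field.

```go
// Hypothetical CI wrapper around `groot validate --output json`.
// Exit codes follow this post (0 success, 1 config, 2 API, 3 abort,
// 4 notify failure); the JSON is assumed to carry a boolean "ok" field.
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// exitMeaning maps the documented exit codes to short labels.
func exitMeaning(code int) string {
	switch code {
	case 0:
		return "success"
	case 1:
		return "config error"
	case 2:
		return "API error"
	case 3:
		return "aborted"
	case 4:
		return "notify failure"
	default:
		return fmt.Sprintf("unknown exit code %d", code)
	}
}

// validateOK reports the "ok" field of validate's JSON output.
func validateOK(out []byte) (bool, error) {
	var v struct {
		OK *bool `json:"ok"`
	}
	if err := json.Unmarshal(out, &v); err != nil {
		return false, err
	}
	if v.OK == nil {
		return false, errors.New(`no "ok" field in output`)
	}
	return *v.OK, nil
}

func main() {
	cmd := exec.Command("groot", "validate", "--config", "groot.yml", "--output", "json")
	cmd.Stderr = os.Stderr // let groot's diagnostics reach the CI log
	out, err := cmd.Output()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		fmt.Fprintln(os.Stderr, "groot validate:", exitMeaning(exitErr.ExitCode()))
		os.Exit(exitErr.ExitCode())
	} else if err != nil {
		fmt.Fprintln(os.Stderr, "could not run groot:", err)
		os.Exit(1)
	}
	ok, err := validateOK(out)
	if err != nil {
		fmt.Fprintln(os.Stderr, "unreadable validate output:", err)
		os.Exit(1)
	}
	if !ok {
		fmt.Fprintln(os.Stderr, "preflight gate failed: ok is false")
		os.Exit(1)
	}
	fmt.Println("preflight gate passed")
}
```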

Starter configs without starting from zero

Ready-made profiles live under examples/profiles/:

| Profile | Use when |
|---|---|
| incident-quick.yml | Narrow namespaces, short log window, no upload |
| bastion-airgap.yml | SFTP via SSH relay, minimal notify |
| eks-managed.yml | Skip unsupported node logs, metrics on |
| compliance-full.yml | All namespaces, redaction enabled |

Copy, edit cluster names and webhooks, run groot validate, then groot collect.


Migrating to 1.0.0

Existing configs: no forced rewrite. YAML without config_version keeps loading as legacy.

New configs: add at the top:

config_version: 1

New archives include "archive_layout_version": 1 in extras/manifest.json.
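A downstream tool can refuse archives whose layout it does not understand. The sketch below parses extras/manifest.json and accepts only layout version 1. run_id, archive_sha256, and archive_layout_version are named in this post; their JSON types (string, string, integer, following the example above) and the rule that a missing version means a pre-1.0.0 archive are assumptions.

```go
// Sketch of a downstream consumer that checks archive_layout_version before
// trusting the layout. Field types and the "missing means pre-1.0.0" rule
// are assumptions; only the field names come from the article.
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

type manifest struct {
	LayoutVersion *int   `json:"archive_layout_version"`
	RunID         string `json:"run_id"`
	ArchiveSHA256 string `json:"archive_sha256"`
}

const supportedLayout = 1

// checkManifest parses the manifest and rejects unknown layout versions.
func checkManifest(data []byte) (manifest, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return m, fmt.Errorf("parse manifest: %w", err)
	}
	switch {
	case m.LayoutVersion == nil:
		return m, errors.New("no archive_layout_version: archive predates 1.0.0")
	case *m.LayoutVersion != supportedLayout:
		return m, fmt.Errorf("unsupported archive_layout_version %d", *m.LayoutVersion)
	}
	return m, nil
}

func main() {
	// Assumes the archive was extracted here; adjust for its top-level directory.
	data, err := os.ReadFile("extras/manifest.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	m, err := checkManifest(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("run %s, layout v%d, sha256 %s\n", m.RunID, *m.LayoutVersion, m.ArchiveSHA256)
}
```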

SFTP users: if you relied on implicit insecure host keys, set upload.sftp.known_hosts_file (or GROOT_UPLOAD_SFTP_KNOWN_HOSTS_FILE) before upgrading.

Full notes: CHANGELOG 1.0.0.


Honest gaps (post-1.0 roadmap)

Not in 1.0.0 — deliberately deferred to Band 4:

  • Multi-cluster capture in one archive
  • groot analyze / smart log hints
  • Streaming JSONL export
  • In-cluster progress UI

If you need those, open a roadmap discussion — I prefer honest backlog rows over scope creep in a stability release.


Install in one line

curl -fsSL https://get.groot.hermesrodriguez.com/install.sh | sh
groot -v
groot validate --config groot.yml   # after you have a config

Or: GitHub Releases (.deb, .rpm, .tar.gz, .zip), Homebrew (brew install hrodrig/groot/groot), go install github.com/hrodrig/groot/cmd/groot@v1.0.0.

Site: groot.hermesrodriguez.com


Closing

GROOT 1.0.0 is the release where I say: this config shape, this archive layout, and this CLI behavior are the baseline you can build runbooks on.

It is still one tool for one job — compress the first hour of kubectl archaeology into a single attachment — but now with versioned contracts, structured output, and the 0.9.x operator tooling I wished I had on my first production incident.

If it saves you a round of copy-paste on the next page, star the repo or drop an issue with what your team collects that I have not thought of yet.

Clone traffic for this repo (and the rest of the Hermes K8s/DB stack) is published on gghstats at hrodrig/groot: raw GitHub traffic, no vanity smoothing. Full index: gghstats.hermesrodriguez.com.

Thank you for reading — and if you already run GROOT in a cluster somewhere, thank you for trusting it with real incidents. Open source only works when people show up: issues, stars, runbooks, and honest feedback. Let's keep building dependable OSS together — see you in the next post.


Disclosure

Written by Hermes Rodríguez. AI tools helped with drafting and editing; technical claims were checked against SPECIFICATIONS.md, CHANGELOG, and the v1.0.0 tag. The DeepSeek audit notes that sparked this post were a useful checklist — not a substitute for reading the spec.

Always verify behavior on the release you install.

Source: dev.to
