The Production-Ready AI Agent Checklist (Updated For 2026)
The most useful HN thread this week wasn't a product launch. It was a question:
"Ask HN: What makes an AI agent framework production-ready vs. a toy?"
The answers were more practical than I expected. Not "uses Kubernetes" or "has enterprise support." The community pointed at specific, buildable behaviors. I went through the thread and turned it into a checklist you can run against your OpenClaw setup today — with the specific OpenClaw primitives that implement each item.
The Checklist
1. Observability — You Can See What The Agent Did
Toy agents: You ask "what happened?" and the agent tells you a story.
Production agents: You open a log and see exactly what ran, in what order, with what inputs, and what came back.
In OpenClaw, this means:
# Check your gateway logs
openclaw logs --tail 100
# Check a specific session
openclaw session history <session-key> --limit 50
# Check the configured log level in your config
openclaw config get logging.level # should be debug or trace
The specific things you should be able to answer from logs alone:
- What model was used for each tool call
- What the tool input was
- What the tool output was
- How long each call took
- What the fallback chain looked like when a model failed
If you can't answer those five questions from your logs, you're running a toy.
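One way to make those five questions answerable is to emit one structured JSON line per tool call. The sketch below is a minimal Python illustration under stated assumptions: the `ToolCallRecord` fields and the `timed_call` helper are hypothetical names, not OpenClaw's log schema, and `fn` stands in for whatever actually executes the tool.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class ToolCallRecord:
    """One log line per tool call. Field names are hypothetical, not OpenClaw's schema."""
    session: str
    tool: str
    model: str
    tool_input: dict
    tool_output: str = ""
    duration_ms: float = 0.0
    fallback_chain: list = field(default_factory=list)  # models tried, in order


def log_tool_call(record: ToolCallRecord) -> str:
    """Serialize the record as a single JSON line so it can be grepped or parsed."""
    return json.dumps(asdict(record), sort_keys=True)


def timed_call(session, tool, model, tool_input, fn, fallback_chain=None):
    """Run fn(tool_input), time it, and return (result, json_line)."""
    start = time.perf_counter()
    result = fn(tool_input)
    elapsed_ms = (time.perf_counter() - start) * 1000
    record = ToolCallRecord(
        session=session,
        tool=tool,
        model=model,
        tool_input=tool_input,
        tool_output=str(result),
        duration_ms=round(elapsed_ms, 2),
        fallback_chain=list(fallback_chain or [model]),
    )
    return result, log_tool_call(record)
```

Because each line is self-contained JSON, grep or jq can pull out every call for one session, and the `fallback_chain` field answers the fifth question directly.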
2. Graceful Degradation — The Agent Fails Without Destroying Things
Toy agents: One model failure cascades into everything failing.
Production agents: Each failure is contained, logged, and recovered from without losing work.
In OpenClaw, this is the fallback chain:
{"payload":{"fallbacks":["nvidia/qwen3.5-122b-a10b","ollama/qwen3.5:27b-q4_K_M","nvidia/nemotron-nano-12b-v2-vl","ollama/qwen3.5:9b","minimax-portal/MiniMax-M2.7","minimax-portal/MiniMax-M3"]}}
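Six fallbacks across three providers, tried in order after your primary model fails. If your primary is a MiniMax model and MiniMax is overloaded, the agent doesn't die: it tries Nvidia's endpoint, then Ollama, then Nvidia and Ollama again, and only then another MiniMax model. The work continues.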
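The fallback behavior can be sketched as a plain loop. This illustrates the pattern and is not OpenClaw's implementation: `call_with_fallbacks`, `AllModelsFailed`, and the `call_model(model, prompt)` callable are hypothetical, and the primary model name is a placeholder, since the config above only lists the fallbacks.

```python
from typing import Callable, Sequence

# Fallback order copied from the config above.
FALLBACKS = [
    "nvidia/qwen3.5-122b-a10b",
    "ollama/qwen3.5:27b-q4_K_M",
    "nvidia/nemotron-nano-12b-v2-vl",
    "ollama/qwen3.5:9b",
    "minimax-portal/MiniMax-M2.7",
    "minimax-portal/MiniMax-M3",
]


class AllModelsFailed(Exception):
    """Raised when the primary and every fallback fail."""


def call_with_fallbacks(
    prompt: str,
    primary: str,
    fallbacks: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, list[str]]:
    """Try the primary model, then each fallback in order.

    Returns (reply, tried), where `tried` lists every model attempted so the
    fallback chain can be logged. `call_model(model, prompt)` is a hypothetical
    stand-in for whatever actually calls the provider.
    """
    tried: list[str] = []
    errors: list[str] = []
    for model in [primary, *fallbacks]:
        tried.append(model)
        try:
            return call_model(model, prompt), tried
        except Exception as exc:  # contain the failure, record it, move on
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the list of models tried keeps the fallback chain visible in the logs, which ties this item back to the observability check.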
The circuit breaker pattern: if a tool fails 3 times in a row, stop trying it and tell the user. Add this to your cron job payloads:
{"payload":{"timeoutSeconds":120,"lightContext":true}}
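A timeout on its own isn't a circuit breaker, but it is what makes one possible: if a call hasn't returned in 120 seconds, it counts as a failure and the agent moves to the next fallback, instead of hanging where no breaker can ever see it fail.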
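A minimal sketch of the three-strikes breaker, assuming timeouts surface as exceptions (for example `subprocess.TimeoutExpired` from `subprocess.run(..., timeout=120)`). `CircuitBreaker` and `CircuitOpen` are hypothetical names, and the half-open retry after `reset_after` seconds is a common addition to the pattern, not something the config above specifies.

```python
import time
from typing import Callable, Optional


class CircuitOpen(Exception):
    """Raised when a tool is skipped because its breaker is open."""


class CircuitBreaker:
    """Stop calling a tool after `max_failures` consecutive failures.

    Any exception counts as a failure, including a timeout raised by the
    wrapped call. After `reset_after` seconds one trial call is let through
    (half-open): success closes the breaker, failure re-opens it.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Breaker is open: refuse instead of hammering a broken tool.
                raise CircuitOpen(f"tool disabled after {self.failures} consecutive failures")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the count
        return result
```

The `CircuitOpen` exception is the point where the agent should stop retrying and tell the user.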
3. Security Surface — Least Privilege On Every Tool
Toy agents: The agent can do anything, including things you didn't intend.
Production agents: Each tool has an explicit permission boundary that the agent cannot exceed.
In OpenClaw, this is the tool_policy in skills. The deny list is the whole point:
name: safe-exec
description: Exec tool with hard limits (no rm -rf, no curl|bash, no cred exfil)
system_prompt_addendum: |
  You have exec access. You may not:
  - Run any command containing 'rm -rf' without explicit user approval
  - Run any command containing 'curl | sh' or 'wget | bash'
  - Access environment variables containing secrets (OPENAI_KEY, ANTHROPIC_KEY, etc.)
  - Write to any path outside /home/themachine/.openclaw/workspace/
  If a request matches any of these patterns, refuse and explain why.
tool_policy:
  allow: [exec, read_file]
  deny: [write_file, http_request, browser]
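The agent can read files and run commands, but it cannot call write_file, http_request, or browser. The deny list is the security surface. Keep in mind that exec can still write files or reach the network through the shell (curl, wget, redirects), so for those paths the prompt addendum is the only guard, and a prompt is a request, not a boundary.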
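To make the deny-list semantics concrete, here is a hypothetical pre-execution guard in Python. It is not how OpenClaw enforces tool_policy; `ToolPolicy`, `check_call`, and the regex patterns are assumptions that mirror the skill above. Deny wins over allow, anything not explicitly allowed is refused, and exec commands are screened against the addendum's patterns. The "with explicit user approval" escape hatch for rm -rf is left out.

```python
import re
from dataclasses import dataclass, field

# Patterns mirroring the safe-exec addendum. Hypothetical guard, not OpenClaw code.
BLOCKED_EXEC_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sh|bash)\b"),
]
SECRET_ENV_NAMES = ("OPENAI_KEY", "ANTHROPIC_KEY")


@dataclass
class ToolPolicy:
    allow: set = field(default_factory=set)
    deny: set = field(default_factory=set)

    def permits(self, tool: str) -> bool:
        # Deny wins over allow; anything not explicitly allowed is refused.
        return tool not in self.deny and tool in self.allow


def check_call(policy: ToolPolicy, tool: str, command: str = "") -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    if not policy.permits(tool):
        return False, f"tool '{tool}' is not permitted by tool_policy"
    if tool == "exec":
        for pattern in BLOCKED_EXEC_PATTERNS:
            if pattern.search(command):
                return False, f"command matches blocked pattern {pattern.pattern!r}"
        for name in SECRET_ENV_NAMES:
            if name in command:
                return False, f"command references secret {name}"
    return True, "ok"
```

A guard like this runs outside the model, so it holds even when the model ignores the prompt.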
4. State Management — Memory Survives Restarts
Toy agents: Every session starts from scratch. The agent has no memory.
Production agents: State persists across sessions, survives restarts, and has explicit recovery logic.
In OpenClaw, this is the 3-level memory system:
memory/YYYY-MM-DD.md → Daily log (raw events, what happened)
MEMORY.md → Curated knowledge (decisions, context, patterns)
~/self-improving/ → Execution memory (what worked, what didn't)
The daily log is the source of truth. MEMORY.md is what survives compaction. The self-improving directory is where patterns compound.
For state that must survive a restart (cron job counters, pending tasks, error states):
{"name":"cron-health-check","payload":{"kind":"agentTurn","message":"Check all cron jobs. If any are in error state for >2 hours, run openclaw cron run --id <jobId>. Write results to logs/cron-health-$(date +%Y%m%d).json"}}
The health state is written to a file, not stored in memory. When the agent restarts, it reads the file and knows where it left off.
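Writing state to a file only helps if a crash mid-write can't corrupt it. A minimal sketch, assuming the state is a JSON-serializable dict and using hypothetical `save_state`/`load_state` helpers: write to a temp file in the same directory, fsync it, then `os.replace` it over the old file, which is an atomic rename on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: a crash mid-write leaves the previous file intact."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file, keep the old state
        raise


def load_state(path: Path, default: dict) -> dict:
    """On restart, resume from the last saved state, or a default on first run."""
    try:
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return dict(default)
```

On startup, call `load_state` with a sensible default: the first run gets the default, and every later run picks up where the last write left off.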
5. Operational Tooling — The Agent Can Be Monitored Without A Human Watching
Toy agents: You have to watch them to know they're working.
Production agents: They send you a message when something goes wrong.
In OpenClaw, this is the failureAlert on every cron job:
{"failureAlert":{"after":1,"channel":"telegram","to":"749348Tracker","cooldownMs":3600000,"mode":"announce"}}
After 1 failure, Telegram alert. 1-hour cooldown so you're not spammed if the job is retrying. You don't have to watch the agent — it watches itself and tells you when something breaks.
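The alert settings can be modeled as a small throttle. This is a hypothetical sketch of how `after` and `cooldownMs` interact as described above, not OpenClaw's alerting code; `FailureAlerter` and its `record` method are illustrative names, and timestamps are passed in as milliseconds to keep it testable.

```python
from typing import Optional


class FailureAlerter:
    """Hypothetical model of failureAlert: alert once `after` consecutive
    failures are reached, then stay quiet for `cooldown_ms`."""

    def __init__(self, after: int = 1, cooldown_ms: int = 3_600_000):
        self.after = after
        self.cooldown_ms = cooldown_ms
        self.consecutive_failures = 0
        self.last_alert_ms: Optional[int] = None

    def record(self, ok: bool, now_ms: int) -> bool:
        """Record one run result; return True if an alert should be sent now."""
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.after:
            return False
        if self.last_alert_ms is not None and now_ms - self.last_alert_ms < self.cooldown_ms:
            return False  # still inside the cooldown window
        self.last_alert_ms = now_ms
        return True
```

With `after=1` and a one-hour cooldown, a job that keeps failing every minute produces one alert per hour instead of sixty.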
The health check cron runs every 30 minutes:
openclaw cron list --json | python3 -c "
import sys, json
jobs = json.load(sys.stdin)
for job in jobs:
    if job.get('consecutiveErrors', 0) >= 2:
        print(f'Job {job[\"id\"]} has {job[\"consecutiveErrors\"]} consecutive errors')
"
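That script only reports. Wire the same check into the health-check job so that any job with 2+ consecutive errors gets re-triggered with openclaw cron run --id <jobId>. You don't find out about failures at 9am; you find out within an hour, and the job tries to recover automatically.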
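A sketch that closes the loop from report to re-trigger. It assumes `openclaw cron list --json` returns a list of objects with `id` and `consecutiveErrors`, the same shape the one-liner above assumes; if the real output nests jobs differently, adjust `jobs_to_retrigger`. The only CLI calls used are the two that already appear in this post, and `retrigger` defaults to a dry run so you can inspect the commands before letting it execute them.

```python
import json
import subprocess
from typing import Iterable


def jobs_to_retrigger(jobs: Iterable[dict], threshold: int = 2) -> list[str]:
    """Return ids of jobs with `threshold` or more consecutive errors.

    Assumes each job is a dict with 'id' and 'consecutiveErrors' keys.
    """
    return [job["id"] for job in jobs if job.get("consecutiveErrors", 0) >= threshold]


def retrigger(job_ids: Iterable[str], dry_run: bool = True) -> list[list[str]]:
    """Build (and, unless dry_run, run) `openclaw cron run --id <jobId>` per job."""
    commands = [["openclaw", "cron", "run", "--id", job_id] for job_id in job_ids]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=False, timeout=120)
    return commands


def main() -> None:
    """List jobs, pick the failing ones, and re-trigger them for real."""
    listing = subprocess.run(
        ["openclaw", "cron", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    ids = jobs_to_retrigger(json.loads(listing.stdout))
    for cmd in retrigger(ids, dry_run=False):
        print("re-triggered:", " ".join(cmd))
```

Call `main()` from the health-check job once the dry-run output looks right.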
Running The Checklist
Go through each item:
- Observability: Run openclaw logs --tail 20. Can you follow a single request through the log?
- Graceful degradation: Kill your primary model provider. Does the agent recover?
- Security surface: Read your most-used skill's tool_policy. Does it have a deny list?
- State management: Restart OpenClaw. Does the agent remember what it was doing?
- Operational tooling: Trigger a failure. Do you get a Telegram alert within an hour?
If you answered no to any of these, that's your next hour of work.
The thread's conclusion was: production-ready agents aren't defined by their models or their benchmarks. They're defined by what happens when something goes wrong. The checklist above is a map of "what goes wrong" for OpenClaw operators — and the specific primitives that handle each case.
Ship the one that's broken first. Then the next. Then you have a production agent.