A minimal agent — call the model, run the tool it asks for, feed the result back, repeat — is genuinely complete for a demo. I wrote one in ~150 readable lines: https://github.com/mnifzied-create/agentloop.
But the moment real users hit it, eight things break. None of them need a framework — each is a small, readable layer on top of the loop.
1. The model asks for three tools at once — and you run them one at a time. Wrap the tool calls in Promise.all. Parallel by default.
2. One flaky API call kills the whole turn. Wrap each tool in a retry with backoff, and return the error as a string to the model instead of throwing — it can recover on the next step.
3. It forgets everything between requests. Persist threads. Node's built-in node:sqlite (available from Node 22.5) is enough: no service, no native build.
4. One user (or a runaway loop) runs up your bill. A token-bucket rate limiter, per user / IP.
5. The agent deletes a record / sends an email / charges a card — with no confirmation. Wrap irreversible tools in a human-in-the-loop approval gate.
6. You tweak the prompt and three behaviors silently regress. A tiny eval harness with pass/fail cases you run in CI.
7. One agent juggling twelve tools gets confused. Expose a whole agent as a single tool — a sub-agent — and let a parent delegate.
8. You're regex-parsing the model's prose for data. Force a tool call whose input_schema is your output type. Typed JSON, no parsing.
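Items 1 and 2 fit naturally into one small layer. The following is a minimal sketch, not the repo's actual code: the ToolCall, ToolResult, and ToolFn shapes and the names withRetry and runTools are assumptions made for illustration, and a real SDK's tool-call types will differ. It runs every requested tool concurrently, retries each with exponential backoff, and turns a final failure into a string for the model instead of throwing.

```typescript
// Hypothetical shapes for illustration; real SDK types will differ.
type ToolCall = { id: string; name: string; input: unknown };
type ToolResult = { id: string; content: string; isError: boolean };
type ToolFn = (input: unknown) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff; rethrows the last error once attempts run out.
async function withRetry(
  fn: () => Promise<string>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait baseDelayMs, then 2x, 4x, ... but not after the final attempt.
      if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Run all requested tools concurrently. Each failure is converted to a string
// result, so Promise.all itself never rejects and one bad tool cannot kill the turn.
async function runTools(
  calls: ToolCall[],
  tools: Record<string, ToolFn>,
  retry = { attempts: 3, baseDelayMs: 100 },
): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const tool = tools[call.name];
      if (!tool) {
        return { id: call.id, content: `Error: unknown tool "${call.name}"`, isError: true };
      }
      try {
        const content = await withRetry(() => tool(call.input), retry.attempts, retry.baseDelayMs);
        return { id: call.id, content, isError: false };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { id: call.id, content: `Error: ${message}`, isError: true };
      }
    }),
  );
}
```

Results come back in the same order as the calls, so each one can be matched to its tool-call id when you feed it to the model on the next step.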
That's most of the gap between "works in the demo" and "works in production", and every item is a small composable piece you can read top to bottom, not magic hidden in a dependency.
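To give a sense of how small one of these pieces can be, here is a sketch of the token bucket from item 4. It is an illustration under assumptions, not the repo's implementation: the class name TokenBucketLimiter, the method tryConsume, and the injectable clock are all hypothetical, and state lives in memory in a single process.

```typescript
// Illustrative in-memory token bucket, keyed per user ID or IP.
// Each key starts with `capacity` tokens and refills at `refillPerSec`.
class TokenBucketLimiter {
  private readonly buckets = new Map<string, { tokens: number; last: number }>();
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  // `now` returns milliseconds; it is injectable so tests can control time.
  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
  }

  // Spends one token and returns true if the key has budget; returns false otherwise.
  tryConsume(key: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, last: t };
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = (t - bucket.last) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.last = t;
    this.buckets.set(key, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Calling tryConsume(userId) before each model request and rejecting when it returns false is enough to cap both a single heavy user and a runaway loop. A multi-process deployment would need shared state instead of this in-memory Map.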
The free core (the loop) and these eight patterns are all in the repo — read every line: https://github.com/mnifzied-create/agentloop
The point isn't the code. It's that you can own an agent instead of importing one.
What breaks for you in production that isn't on this list?