Your 150-line AI agent works in the demo. Here's what breaks in production.

A minimal agent (call the model, run the tool it asks for, feed the result back, repeat) is genuinely complete for a demo. I wrote one in ~150 readable lines: https://github.com/mnifzied-create/agentloop
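
For reference, here is roughly what that loop looks like. This is a minimal sketch, not the repo's actual code: the `Model` function, the `Message` shape, and the `maxSteps` cap are assumptions made for illustration, and a real provider SDK would carry the tool calls on the assistant message in its own format.

```typescript
// Hypothetical shapes for illustration; a real SDK defines its own.
type ToolCall = { id: string; name: string; input: unknown };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string };
type Model = (messages: Message[]) => Promise<ModelReply>;
type Tool = (input: unknown) => Promise<string>;

async function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  userInput: string,
  maxSteps = 10, // hard cap so a confused model cannot loop forever
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    // 1. Call the model with everything said so far.
    const reply = await model(messages);
    messages.push({ role: "assistant", content: reply.text });
    // 2. No tool requested means the model has its final answer.
    if (reply.toolCalls.length === 0) return reply.text;
    // 3. Run each requested tool (one at a time here) and feed the result back.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? await tool(call.input) : `Unknown tool: ${call.name}`;
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
  return "Stopped: step limit reached.";
}
```

Everything below is a layer around one of those three numbered steps.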

But the moment real users hit it, eight things break. None of them need a framework — each is a small, readable layer on top of the loop.

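1. The model asks for three tools at once, and you run them one at a time. Run the calls concurrently with Promise.all, and have each call catch its own errors (see item 2), because Promise.all rejects as soon as any one promise rejects. Parallel by default.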

2. One flaky API call kills the whole turn. Wrap each tool in a retry with backoff, and return the error as a string to the model instead of throwing — it can recover on the next step.

3. It forgets everything between requests. Persist threads. Node's built-in node:sqlite (available since Node 22.5) is enough: no service, no native build.

4. One user (or a runaway loop) runs up your bill. Add a token-bucket rate limiter keyed per user or per IP, and cap the number of steps a single request may take.

5. The agent deletes a record / sends an email / charges a card — with no confirmation. Wrap irreversible tools in a human-in-the-loop approval gate.

6. You tweak the prompt and three behaviors silently regress. A tiny eval harness with pass/fail cases you run in CI.

7. One agent juggling twelve tools gets confused. Expose a whole agent as a single tool — a sub-agent — and let a parent delegate.

8. You're regex-parsing the model's prose for data. Force a tool call whose input_schema is your output type. Typed JSON, no prose parsing; validate it against the schema before you trust it.
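
Items 1 and 2 fit together naturally, so here is a sketch of both in one place. The names (`withRetry`, `runToolCalls`, `ToolResult`) and the default of three attempts with a 200 ms base delay are my assumptions for illustration, not the repo's API.

```typescript
type ToolCall = { id: string; name: string; input: unknown };
type ToolResult = { id: string; content: string; isError: boolean };
type Tool = (input: unknown) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff: waits baseMs, 2*baseMs, 4*baseMs, ... between attempts.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 200): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseMs * 2 ** i);
    }
  }
  throw lastError;
}

// Runs every requested tool concurrently. Each call catches its own failure and
// turns it into a string result, so Promise.all never rejects and one bad tool
// cannot sink the whole turn.
async function runToolCalls(
  calls: ToolCall[],
  tools: Record<string, Tool>,
  attempts = 3,
  baseMs = 200,
): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const tool = tools[call.name];
      if (!tool) return { id: call.id, content: `Unknown tool: ${call.name}`, isError: true };
      try {
        const content = await withRetry(() => tool(call.input), attempts, baseMs);
        return { id: call.id, content, isError: false };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { id: call.id, content: `Tool ${call.name} failed: ${message}`, isError: true };
      }
    }),
  );
}
```

The results come back in the same order as the calls, and the model sees a failed tool as plain text it can react to on the next step.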
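
For item 4, a token bucket is only a few lines. This is a sketch under my own assumptions: the class name, the in-memory `Map`, and the injectable clock are illustrative, and a multi-process deployment would need shared storage instead.

```typescript
type Bucket = { tokens: number; updatedAt: number };

// Token bucket: each key (a user id or an IP) holds up to `capacity` tokens and
// refills at `refillPerSec`. A request spends one token or is rejected.
class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now; // injectable clock (milliseconds), handy for tests
  }

  tryTake(key: string): boolean {
    const t = this.now();
    // A key seen for the first time starts with a full bucket.
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: t };
    // Refill for the time elapsed since the last request, up to capacity.
    const elapsedSec = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.updatedAt = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

Call `tryTake(userId)` before each agent turn and return an HTTP 429 when it says no. The map in this sketch never evicts idle keys, so a long-running server would want to prune it.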
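
The approval gate in item 5 can be a plain wrapper around a tool. In this sketch the `Approver` callback is hypothetical: in a real app it might post to a chat channel or render a confirm dialog, then resolve once a human answers.

```typescript
type Tool = (input: unknown) => Promise<string>;

// Hypothetical approval hook: resolves true if a human approved the action.
type Approver = (toolName: string, input: unknown) => Promise<boolean>;

// Wraps an irreversible tool so it only runs after explicit approval.
// A refusal is returned as text, so the model can explain or pick another path.
function requireApproval(name: string, tool: Tool, approve: Approver): Tool {
  return async (input) => {
    const ok = await approve(name, input);
    if (!ok) return `Action "${name}" was declined by a human reviewer; nothing was changed.`;
    return tool(input);
  };
}
```

Because the wrapper has the same type as the tool it wraps, the loop does not need to know which tools are gated.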
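
For item 8, here is a sketch that assumes an Anthropic-style Messages API request shape, where each tool carries an `input_schema` and `tool_choice` can name the tool the model must call. The `record_invoice` tool, the `Invoice` type, and both helper functions are made up for illustration, and no network call is made here.

```typescript
// The output type we want back from the model.
type Invoice = { vendor: string; totalCents: number };

// A tool whose only job is to carry the structured answer.
const recordInvoiceTool = {
  name: "record_invoice",
  description: "Record the vendor and total extracted from the invoice text.",
  input_schema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      totalCents: { type: "integer" },
    },
    required: ["vendor", "totalCents"],
  },
};

// Builds the request body. `model` is whatever model id you already use.
function buildRequest(model: string, invoiceText: string) {
  return {
    model,
    max_tokens: 1024,
    tools: [recordInvoiceTool],
    tool_choice: { type: "tool", name: recordInvoiceTool.name }, // force this tool
    messages: [{ role: "user", content: invoiceText }],
  };
}

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Pulls the tool input out of the response content and checks it at runtime,
// since a schema in the request is not a guarantee about the response.
function extractInvoice(content: ContentBlock[]): Invoice {
  const block = content.find((b) => b.type === "tool_use" && b.name === recordInvoiceTool.name);
  if (!block || block.type !== "tool_use") throw new Error("Model did not call record_invoice");
  const input = block.input as Partial<Invoice>;
  if (typeof input.vendor !== "string" || !Number.isInteger(input.totalCents)) {
    throw new Error("record_invoice input does not match the Invoice shape");
  }
  return { vendor: input.vendor, totalCents: input.totalCents as number };
}
```

The hand-written check at the end is deliberately small; a schema validator library would do the same job for larger types.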

That's the entire gap between "works in the demo" and "works in production" — and every item is a small composable piece you can read top to bottom, not magic hidden in a dependency.

The free core (the loop) and these eight patterns are all in the repo — read every line: https://github.com/mnifzied-create/agentloop

The point isn't the code. It's that you can own an agent instead of importing one.

What breaks for you in production that isn't on this list?