We've all seen the screenshot: someone leaves an agent running overnight, a retry loop goes sideways, and they wake up to a bill that's 8x what they expected: "$40 from a $5 task." The scary part isn't the money; it's that nothing stopped it. You find out when the invoice lands.
I wanted the dumbest possible fix: a hard cap that just blocks the next call when you're over budget. Not a dashboard I have to remember to check, not a gateway I have to deploy. So I built budget-guard — a tiny drop-in wrapper for the OpenAI/Anthropic clients.
The whole idea in 3 lines
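// Assumes `openai` is an already-initialized OpenAI client and `messages` is your chat messages array.
import { guard } from 'budget-guard';
const ai = guard(openai.chat.completions, { project: 'my-app', dailyCapUSD: 50 });
await ai.create({ model: 'gpt-4o', messages }, { feature: 'summarize' });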
Once today's spend for my-app crosses $50, the next create() throws BudgetExceededError before the request is sent, so that call is never billed. A runaway loop dies at the breaker instead of draining your account. There's no proxy or gateway in the path: your calls still go straight to the provider, and budget-guard just meters the usage and trips when you're over.
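If it helps to see the mechanism, here's a rough sketch of the check-then-meter pattern described above. This is not budget-guard's actual source: `SpendBreaker`, `guardSketch`, and the `costOf` callback are hypothetical names I'm using for illustration, and the local `BudgetExceededError` class only mirrors the name of the error the library throws.

```typescript
// Illustrative sketch only: NOT budget-guard's real implementation.
// SpendBreaker, guardSketch and costOf are hypothetical names.

class BudgetExceededError extends Error {
  readonly spentUSD: number;
  readonly capUSD: number;
  constructor(spentUSD: number, capUSD: number) {
    super(`Budget exceeded: spent $${spentUSD.toFixed(2)} of $${capUSD.toFixed(2)}`);
    this.name = 'BudgetExceededError';
    this.spentUSD = spentUSD;
    this.capUSD = capUSD;
  }
}

class SpendBreaker {
  private spentUSD = 0;
  private readonly capUSD: number;

  constructor(capUSD: number) {
    this.capUSD = capUSD;
  }

  // Called before each request: refuse if spend is already at or over the cap.
  check(): void {
    if (this.spentUSD >= this.capUSD) {
      throw new BudgetExceededError(this.spentUSD, this.capUSD);
    }
  }

  // Called after each response with the cost of that one call.
  record(costUSD: number): void {
    this.spentUSD += costUSD;
  }

  get spent(): number {
    return this.spentUSD;
  }
}

// Wraps any async call: check first, run the call, then meter the result.
function guardSketch<Req, Res>(
  call: (req: Req) => Promise<Res>,
  costOf: (res: Res) => number,
  breaker: SpendBreaker,
): (req: Req) => Promise<Res> {
  return async (req: Req): Promise<Res> => {
    breaker.check();             // throws before the wrapped call is made
    const res = await call(req); // the request itself is untouched
    breaker.record(costOf(res)); // metering happens only after the response
    return res;
  };
}
```

In this sketch a retry loop that keeps calling the wrapped function stops at the first `check()` after spend reaches the cap. Because the cost is only known after a response comes back, the call that crosses the cap still completes.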
Bonus: where did the money go?
It tags spend per feature, so you can finally answer "what cost that?":
import { spendReport } from 'budget-guard';
spendReport('my-app'); // → { summarize: 2.41, chat: 0.88 } (today, USD)
It auto-detects OpenAI (prompt_tokens/completion_tokens) and Anthropic (input_tokens/output_tokens) usage shapes; for anything else you pass a one-line extractor.
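As an illustration of what that detection could look like (a sketch under my own assumptions, not the library's code), the function below normalizes the two usage shapes into one. The field names `prompt_tokens`/`completion_tokens` and `input_tokens`/`output_tokens` are the providers' own; `TokenUsage`, `Extractor`, and `extractUsage` are hypothetical names.

```typescript
// Illustrative sketch only: TokenUsage, Extractor and extractUsage are hypothetical names.

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// A custom extractor for responses that match neither known shape.
type Extractor = (response: unknown) => TokenUsage | undefined;

function extractUsage(response: unknown, custom?: Extractor): TokenUsage | undefined {
  const usage = (response as { usage?: Record<string, unknown> } | null)?.usage;

  if (usage && typeof usage === 'object') {
    // OpenAI-style usage: prompt_tokens / completion_tokens
    const prompt = usage.prompt_tokens;
    const completion = usage.completion_tokens;
    if (typeof prompt === 'number' && typeof completion === 'number') {
      return { inputTokens: prompt, outputTokens: completion };
    }

    // Anthropic-style usage: input_tokens / output_tokens
    const input = usage.input_tokens;
    const output = usage.output_tokens;
    if (typeof input === 'number' && typeof output === 'number') {
      return { inputTokens: input, outputTokens: output };
    }
  }

  // Unknown shape: fall back to the caller-supplied extractor, if any.
  return custom ? custom(response) : undefined;
}
```

For any other provider, the "one-line extractor" would play the role of `custom` here, for example `(r) => ({ inputTokens: ..., outputTokens: ... })` reading whatever fields that response carries.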
Being honest about v0.1
I'd rather tell you the limits than have you find them on an invoice:
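- In-memory, per-process. Great for a single script, agent, or worker. Spend lives in memory, so the counter resets on restart, and if you run multiple instances each one enforces its own cap (N instances can spend up to N times the cap between them). A shared (Redis) store is the obvious next step.
- Enforced on the next call. No pre-call token estimation yet, so the call that crosses the cap still goes through and gets billed; the breaker trips on the call after it.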
- Prices are a hand-maintained table. PRs welcome to keep them current.
So today it's best for solo devs, side projects, and single-process agents — exactly the people most likely to get surprised by a bill.
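The last two limits are easier to reason about with numbers, so here's a small sketch of a table-driven cost calculation and what a one-call overshoot looks like. The prices are made-up placeholders, not real provider rates, and `PRICES`, `costUSD`, and `example-model` are hypothetical names rather than anything from budget-guard.

```typescript
// Illustrative sketch only. Prices are PLACEHOLDERS in USD per 1M tokens,
// not real provider rates; PRICES, costUSD and 'example-model' are hypothetical.

const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  'example-model': { inputPerM: 2, outputPerM: 8 },
};

function costUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) {
    // A missing table entry means the call cannot be metered at all.
    throw new Error(`No price entry for model "${model}"`);
  }
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
}

// One-call overshoot with next-call enforcement:
const capUSD = 0.05;
let spentUSD = 0.04; // still under the cap, so the next call is allowed

// That call uses 10,000 input and 5,000 output tokens and costs 0.06 at the placeholder rates.
spentUSD += costUSD('example-model', 10000, 5000);

// Spend is now about 0.10, roughly double the cap; only the following call is blocked.
const nextCallBlocked = spentUSD >= capUSD;
```

The overshoot is bounded by the cost of a single call, which is why it matters most for large-context requests and small caps.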
Try it / break it
npm i budget-guard
MIT, no runtime deps, TypeScript. Repo: https://github.com/kimbeomgyu/budget-guard
I'd love feedback on two things: should it block by default, or warn + callback? And if you run a fleet, would a Redis-backed shared cap actually get used, or do you already handle this at a gateway layer? Issues and PRs very welcome.