I built a free circuit breaker for your LLM API bill


We've all seen the screenshot: someone leaves an agent running overnight, a retry loop goes sideways, and they wake up to a bill many times what they expected ("$40 from a $5 task"). The scary part isn't the money, it's that nothing stopped it. You find out when the invoice lands.

I wanted the dumbest possible fix: a hard cap that just blocks the next call when you're over budget. Not a dashboard I have to remember to check, not a gateway I have to deploy. So I built budget-guard — a tiny drop-in wrapper for the OpenAI/Anthropic clients.

The whole idea in 3 lines

import { guard } from 'budget-guard';

const ai = guard(openai.chat.completions, { project: 'my-app', dailyCapUSD: 50 });

await ai.create({ model: 'gpt-4o', messages }, { feature: 'summarize' });

Once today's spend for my-app crosses $50, the next create() throws BudgetExceededError before the request goes out, so that call is never billed. A runaway loop dies at the breaker instead of draining your account. budget-guard also stays out of your critical path: your calls still go straight to the provider, and it only meters the usage and trips when you're over.
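
For a feel of the mechanism, here is a minimal sketch of a spend breaker around any async call. It is not budget-guard's actual source: withDailyCap, SketchBudgetExceededError, and the costOf callback are made-up names, and resetting on the UTC date is an assumption.

```typescript
// Hypothetical sketch of a daily spend breaker; not budget-guard's real code.
class SketchBudgetExceededError extends Error {
  readonly spentUSD: number;
  readonly capUSD: number;

  constructor(spentUSD: number, capUSD: number) {
    super(`Budget exceeded: spent $${spentUSD.toFixed(2)} of $${capUSD.toFixed(2)}`);
    this.name = 'SketchBudgetExceededError';
    this.spentUSD = spentUSD;
    this.capUSD = capUSD;
  }
}

// Wraps `call` so it is refused once the recorded spend reaches the cap.
// `costOf` turns a response into USD; `today` is injectable for testing.
function withDailyCap<Req, Res>(
  call: (req: Req) => Promise<Res>,
  costOf: (res: Res) => number,
  dailyCapUSD: number,
  today: () => string = () => new Date().toISOString().slice(0, 10),
): (req: Req) => Promise<Res> {
  let day = today();
  let spentUSD = 0; // in-memory, so it is lost on restart

  return async (req: Req): Promise<Res> => {
    const now = today();
    if (now !== day) {
      // Assumption: the budget resets when the UTC date changes.
      day = now;
      spentUSD = 0;
    }
    // Check BEFORE calling the provider: a tripped breaker costs nothing.
    if (spentUSD >= dailyCapUSD) {
      throw new SketchBudgetExceededError(spentUSD, dailyCapUSD);
    }
    const res = await call(req);
    // Spend is only known after the response, so one call can overshoot.
    spentUSD += costOf(res);
    return res;
  };
}
```

The check runs before the provider call and the metering runs after it, which is why a tripped breaker never bills you but the last allowed call can still push the total past the cap.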

Bonus: where did the money go?

It tags spend per feature, so you can finally answer "what cost that?":

import { spendReport } from 'budget-guard';
spendReport('my-app'); // → { summarize: 2.41, chat: 0.88 }  (today, USD)
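
A per-feature tally like this can be kept with very little state. The sketch below is one possible shape, with a hypothetical SpendLedger class that is not part of budget-guard's API:

```typescript
// Hypothetical in-memory ledger keyed by feature name; illustration only.
class SpendLedger {
  private readonly byFeature = new Map<string, number>();

  // Add the cost of one call to the feature's running total.
  record(feature: string, usd: number): void {
    this.byFeature.set(feature, (this.byFeature.get(feature) ?? 0) + usd);
  }

  // Return a plain object snapshot, e.g. for logging.
  report(): Record<string, number> {
    const out: Record<string, number> = {};
    this.byFeature.forEach((usd, feature) => {
      out[feature] = usd;
    });
    return out;
  }
}
```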

It auto-detects OpenAI (prompt_tokens/completion_tokens) and Anthropic (input_tokens/output_tokens) usage shapes; for anything else you pass a one-line extractor.
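
Detection of that kind can be done by checking which field names are present on the response's usage object. Here is a rough sketch; extractUsage and TokenUsage are illustrative names, and the library's real detection logic may differ:

```typescript
// Hypothetical normalizer for the two usage shapes; not budget-guard's code.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

function extractUsage(response: unknown): TokenUsage | null {
  if (typeof response !== 'object' || response === null) return null;
  const usage = (response as { usage?: unknown }).usage;
  if (typeof usage !== 'object' || usage === null) return null;
  const u = usage as Record<string, unknown>;

  // OpenAI chat completions report prompt_tokens / completion_tokens.
  const prompt = u['prompt_tokens'];
  const completion = u['completion_tokens'];
  if (typeof prompt === 'number' && typeof completion === 'number') {
    return { inputTokens: prompt, outputTokens: completion };
  }

  // Anthropic messages report input_tokens / output_tokens.
  const input = u['input_tokens'];
  const output = u['output_tokens'];
  if (typeof input === 'number' && typeof output === 'number') {
    return { inputTokens: input, outputTokens: output };
  }

  // Unknown shape: the caller would need to supply its own extractor.
  return null;
}
```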

Being honest about v0.1

I'd rather tell you the limits than have you find them on an invoice:

  • In-memory, per-process. Great for a single script, agent, or worker. The running total resets on restart, and if you run multiple instances, each one enforces its own cap. A shared (Redis) store is the obvious next step.
  • Enforced on the next call. No pre-call token estimation yet, so one call can overshoot before the breaker trips.
  • Prices are a hand-maintained table. PRs welcome to keep them current.
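
To make the price-table point concrete, this is roughly how token counts become dollars. The model name and per-million-token prices below are placeholders, not real provider prices, and PRICES and costUSD are illustrative names rather than the library's:

```typescript
// Hypothetical price table; the numbers are placeholders, not real prices.
interface ModelPrice {
  inputPerMTok: number; // USD per 1,000,000 input tokens
  outputPerMTok: number; // USD per 1,000,000 output tokens
}

const PRICES: Record<string, ModelPrice> = {
  'example-model': { inputPerMTok: 2.0, outputPerMTok: 8.0 },
};

function costUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (price === undefined) {
    // A stale table fails loudly instead of silently metering $0.
    throw new Error(`No price entry for model "${model}"`);
  }
  return (
    (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000
  );
}
```

Throwing on an unknown model is a design choice in this sketch: treating a missing entry as free would let spend go unmetered, which defeats the point of a cap.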

So today it's best for solo devs, side projects, and single-process agents — exactly the people most likely to get surprised by a bill.

Try it / break it

npm i budget-guard

MIT, no runtime deps, TypeScript. Repo: https://github.com/kimbeomgyu/budget-guard

I'd love feedback on two things: should it block by default, or warn + callback? And if you run a fleet, would a Redis-backed shared cap actually get used, or do you already handle this at a gateway layer? Issues and PRs very welcome.

Source: dev.to
