Observability told me exactly how much money my agents wasted. I wanted something that says no.


Most AI cost tooling is an autopsy. It tells you, in detail, what you already spent — token counts, per-call traces, a
dashboard that turns red after the bill is locked in. None of it does the one thing I kept wanting: refuse the call before
it goes out.

I ran into this building agent tooling. Once I had more than a couple of agents hitting paid APIs on a schedule, two
problems showed up that nothing off the shelf solved cleanly.

Problem 1: observability is not control

Watching spend and stopping spend are different systems, and every tool I tried lived on the watching side. I could
reconstruct, after the fact, that agent 4 had a bad night. What I couldn't do was tell agent 4 "you're done for today"
without a hard limit that fires before the request leaves.

The closest thing providers offer is per-key budgeting. That sounds right until you run more than one agent. Keys get
shared, and the moment three agents share an API key a per-key cap can't tell them apart — you've lost the unit that
actually matters, which is the agent.

So the cap I wanted was specific:

  • per agent, not per key
  • enforced in the request path — over budget means the call is refused before it goes out, not logged after it returns
  • two dimensions: calls/day and a max per single call
  • a kill-switch on call-rate spikes, because the runaway-loop case is the one that hurts at 3am
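
To make the first three bullets concrete, here is a minimal sketch of a per-agent check that runs before a request is forwarded. It is not the Gatewards implementation: `AgentBudget`, `Cap`, and `check` are hypothetical names, the day boundary is assumed to be UTC, and `amount` stands in for whatever quantity the per-call maximum measures.

```typescript
// Hypothetical sketch of a per-agent, pre-request cap check.
type Cap = { callsPerDay: number; maxPerCall: number };
type Decision = { allowed: true } | { allowed: false; reason: string };

class AgentBudget {
  // Calls already counted per agent, tagged with the UTC day they belong to.
  private readonly used = new Map<string, { day: string; calls: number }>();
  private readonly caps: Map<string, Cap>;

  constructor(caps: Map<string, Cap>) {
    this.caps = caps;
  }

  // Called BEFORE the upstream request is sent. A refusal here is what
  // a proxy would surface to the caller (for example as an HTTP 429).
  check(agentId: string, amount: number, now: Date = new Date()): Decision {
    const cap = this.caps.get(agentId);
    if (cap === undefined) return { allowed: false, reason: "no cap configured" };
    if (amount > cap.maxPerCall) return { allowed: false, reason: "over max per call" };

    const day = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const entry = this.used.get(agentId);
    // A counter from a previous day is treated as zero.
    const calls = entry !== undefined && entry.day === day ? entry.calls : 0;
    if (calls >= cap.callsPerDay) return { allowed: false, reason: "daily call cap reached" };

    // Only allowed calls consume the daily budget.
    this.used.set(agentId, { day, calls: calls + 1 });
    return { allowed: true };
  }
}
```

The counter is keyed by agent identity rather than by API key, so three agents sharing one upstream key still get three separate budgets.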

Problem 2: I didn't want to hand over my keys

Plenty of "AI gateway" products will do governance for you — by becoming the thing that holds your API keys and signs
requests on your behalf. For a fleet that touches real money, handing custody of credentials to a third party is a hard no.
I wanted enforcement without custody: keep my own keys, let something in front of the fleet enforce the rules.

What I ended up building

Couldn't find a drop-in that did per-agent, request-path enforcement without taking custody, so I built one. It's a proxy
you point agents at. They keep their own keys. No rewrite, no framework lock-in — LangChain, CrewAI, or a raw script all
talk to the same proxy.

The integration is boring on purpose:

import { createPaymentClient } from "@gatewards/agent-sdk";

const client = createPaymentClient({
  apiKey: process.env.GATEWARDS_AGENT_KEY, // identifies THIS agent
  proxy: true,
});

// your agent's calls go through the proxy unchanged
const res = await client.get("https://api.example.com/data");

You set the cap per agent (calls/day + max per call). When an agent goes over, the proxy returns a refusal in the request
path — your call gets a 429, not a silent overage you discover tomorrow. When an agent's rate spikes into loop territory,
the pipeline auto-pauses instead of grinding through your budget.
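
The auto-pause behaviour can be pictured as a sliding-window counter with a latch. The sketch below is an illustration under assumptions, not the proxy's code: `SpikeBreaker`, its thresholds, and the manual `resume` step are all hypothetical, and it pauses per agent for simplicity.

```typescript
// Hypothetical sketch: pause an agent when its call rate spikes.
class SpikeBreaker {
  private readonly recent = new Map<string, number[]>(); // call timestamps per agent
  private readonly paused = new Set<string>();
  private readonly maxCalls: number;
  private readonly windowMs: number;

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Returns true if the call may proceed, false if the agent is paused.
  allow(agentId: string, nowMs: number = Date.now()): boolean {
    if (this.paused.has(agentId)) return false;

    // Keep only timestamps still inside the window, then record this call.
    const cutoff = nowMs - this.windowMs;
    const times = (this.recent.get(agentId) ?? []).filter((t) => t > cutoff);
    times.push(nowMs);
    this.recent.set(agentId, times);

    // More than maxCalls inside one window counts as a spike: latch the pause.
    if (times.length > this.maxCalls) {
      this.paused.add(agentId);
      return false;
    }
    return true;
  }

  // The pause stays in place until someone explicitly lifts it.
  resume(agentId: string): void {
    this.paused.delete(agentId);
    this.recent.delete(agentId);
  }
}
```

The latch is the point of the design: a plain rate limiter would let a looping agent resume as soon as the window slides, while a pause that needs an explicit resume stops the loop until someone looks at it.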

Because every call is already tagged by agent identity, attribution stops being a grep session. You get "which agent spent
what" for free, as a side effect of the thing that enforces the caps.

The one that surprised me: cross-agent dedup

This one I didn't plan for. Several agents poll the same endpoints — same GET, same params, different agents. The proxy
caches identical GET responses across the whole fleet, so five agents making the same call pay for one. On a polling-heavy
fleet that turned out to be a bigger line-item win than the caps.
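
A rough sketch of the idea, assuming a short time-to-live and a cache keyed on the exact URL string. `GetDedupCache`, `Fetcher`, and the TTL are hypothetical; the post does not say how long responses are kept or how keys are normalized.

```typescript
// Hypothetical sketch: one shared GET cache for every agent in the fleet.
type Fetcher = (url: string) => Promise<string>;
type Entry = { expiresAt: number; body: Promise<string> };

class GetDedupCache {
  private readonly entries = new Map<string, Entry>();
  private readonly fetcher: Fetcher;
  private readonly ttlMs: number;

  constructor(fetcher: Fetcher, ttlMs: number) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
  }

  // The key deliberately ignores which agent is asking.
  get(url: string, nowMs: number = Date.now()): Promise<string> {
    const hit = this.entries.get(url);
    if (hit !== undefined && hit.expiresAt > nowMs) return hit.body;

    // Store the promise itself so concurrent callers share one upstream call.
    const body = this.fetcher(url);
    this.entries.set(url, { expiresAt: nowMs + this.ttlMs, body });

    // Do not keep failed responses around.
    body.catch(() => {
      const current = this.entries.get(url);
      if (current !== undefined && current.body === body) this.entries.delete(url);
    });
    return body;
  }
}
```

With this shape, five agents requesting the same URL inside the TTL trigger one call to `fetcher` and receive the same promise. A real cache would probably also normalize query-parameter order before building the key.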

What it deliberately doesn't do

Honesty matters more than a clean pitch, so the limits up front:

  • It doesn't do dollar caps. Caps are calls/day and max-per-call, not "$5/day". Estimating real-time per-call cost across arbitrary upstream APIs is a guess, and I'd rather give you a primitive that's exact than a dollar figure that's wrong. If you genuinely need a $ cap, I want to hear it; that's an open design question for me.
  • Dedup is GET-only by default. POST caching is opt-in per pipeline, because deduping a non-idempotent call is how you ship a bug.
  • It's a proxy in your request path. That's a dependency. It's built to fail open on its own errors rather than take your fleet down, but you should know it's there.
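
The last two bullets boil down to two small guards, sketched here with hypothetical names (`isDedupable`, `decideFailOpen`, and the `cachePostOptIn` flag are illustrations, not part of the SDK).

```typescript
// Hypothetical sketch of the "GET-only by default" and "fail open" rules.
type Method = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";
type Verdict = "allow" | "refuse";

// GET is always eligible for dedup; POST only when the pipeline opted in.
function isDedupable(method: Method, cachePostOptIn: boolean): boolean {
  if (method === "GET") return true;
  if (method === "POST") return cachePostOptIn;
  return false;
}

// Run a policy check. A verdict from the check is respected as-is, but if
// the check itself throws, the call is let through and the error reported.
function decideFailOpen(
  check: () => Verdict,
  onError: (err: unknown) => void = () => {},
): Verdict {
  try {
    return check();
  } catch (err) {
    onError(err);
    return "allow";
  }
}
```

Failing open is a trade-off: an enforcement bug cannot take the fleet down, but while the check is erroring the caps are not being applied.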

Where it is

It's live at gatewards.com, and the SDK is open source (Apache-2.0): npm i @gatewards/agent-sdk

If you're running a fleet and fighting the same thing, I'd genuinely like to compare notes — especially on the cap-primitive
question. Is calls/day + max-per-call enough, or does the lack of a dollar cap break it for you? Tell me where this falls
short.

Source: dev.to
