I Built an AI Gateway from Scratch (So You Don't Have To)

dev.to

TL;DR

I built a local AI gateway using Envoy, Rust, and Kubernetes to understand how AI traffic actually works.

It broke multiple times. I fixed it. I learned a lot.


Why I Did This

I wanted to understand how AI gateways actually work.

Not the diagrams.

Not the marketing slides.

The real system — the code, the flow, the failures.

So I built one.

Three weeks later, I had something working.

But getting there meant debugging cryptic errors, chasing version mismatches, and nearly giving up a few times.

Here's what I learned.


What I Built

A local AI Gateway that looks like this:

curl → agentgateway proxy → Rust module → httpbun (mock LLM) → response

Everything runs locally using kind (Kubernetes in Docker):

  • No cloud costs
  • No API keys
  • Fully reproducible

Components:

  • Envoy → handles traffic
  • kgateway + agentgateway → control plane
  • Rust module → request/response transformation
  • httpbun → fake OpenAI-compatible LLM

This isn't production-ready.

It's a learning lab — and it taught me more than any tutorial ever could.


Why Even Build This?

AI traffic isn't like regular API traffic.

When calling an LLM, you often need to:

  • Inject system prompts
  • Mask sensitive data
  • Route requests to different models
  • Track tokens and cost usage
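The list above is exactly what the Rust module exists for. Here's a minimal, stdlib-only sketch of what those transformations can look like — the names and shapes are illustrative, not the actual module API, and the real code operates on OpenAI-style JSON bodies inside an Envoy filter:

```rust
/// A chat message as (role, content). Illustrative shape only.
type Message = (String, String);

/// Prepend a system prompt unless the client already supplied one.
fn inject_system_prompt(mut messages: Vec<Message>, prompt: &str) -> Vec<Message> {
    if !messages.iter().any(|(role, _)| role == "system") {
        messages.insert(0, ("system".to_string(), prompt.to_string()));
    }
    messages
}

/// Naive masking: redact any whitespace-separated token containing '@'.
/// (Real masking would use proper PII detection, not this heuristic.)
fn mask_emails(content: &str) -> String {
    content
        .split_whitespace()
        .map(|tok| if tok.contains('@') { "[REDACTED]" } else { tok })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Route by model name: pick a backend cluster for the proxy to send to.
/// (Cluster names here are made up for the example.)
fn route_for_model(model: &str) -> &'static str {
    match model {
        m if m.starts_with("gpt-4") => "openai-backend",
        m if m.starts_with("llama") => "local-ollama-backend",
        _ => "default-backend",
    }
}
```

In the actual gateway this kind of logic runs inside the Envoy dynamic module, rewriting the request body before it ever reaches the backend.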

Traditional API gateways don't handle this well.

That's where kgateway comes in — it lets you extend Envoy with custom logic using Rust.

That's what I wanted to explore.


The Stack

Tool                     Role
kind                     Local Kubernetes cluster
kgateway + agentgateway  Gateway control plane
Envoy                    Data plane proxy
Rust                     Custom transformation logic
httpbun                  Mock LLM backend

Everything is open source. Everything runs locally.


Architecture

Request flow through the AI Gateway. Numbers show the sequence from client request to mock LLM response. (Source: draw.io)

This diagram looks simple — but getting each step to work correctly took hours of debugging.


The Problems That Almost Broke Me

1. Rust Versions Move Fast

One day everything worked. The next day:

error: feature edition2024 is required

A dependency (getrandom) needed a newer Rust version than I had.

Fix: Upgraded Rust in my Dockerfile (1.75 → 1.85)

Lesson: Pin versions — or be ready to chase updates.
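One concrete way to act on that lesson: commit a `rust-toolchain.toml` next to your `Cargo.toml`, so every build — including the Docker image — uses the same compiler. The version below is the one my fix landed on:

```toml
[toolchain]
channel = "1.85.0"
```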


2. The "Undefined Symbol" Nightmare

Envoy crashed with:
undefined symbol: envoy_dynamic_module_callback_http_add_response_header

Everything looked correct.

Root cause: My SDK didn't match the Envoy version.

Fix: Used the official SDK directly from Envoy source.

Lesson: Version mismatches in Envoy dynamic modules will break everything. No shortcuts.


3. The filter_config Mystery

Envoy kept throwing:
error parsing filter config: EOF while parsing a value

Tried everything:

  • {}
  • "{}"
  • YAML tricks…

Nothing worked.

Fix:

filter_config:
  "@type": type.googleapis.com/google.protobuf.StringValue
  value: "{}"

Lesson: Sometimes the docs do have the answer — you just haven't found it yet.
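The "EOF while parsing a value" error is exactly what a JSON parser reports when handed an empty string — the unwrapped config was arriving as nothing at all. A defensive pattern (illustrative, not the actual module code) is to default an empty config to `{}` before parsing:

```rust
/// Treat a missing or blank filter_config as the empty JSON object,
/// so the parser never sees an empty string.
fn effective_filter_config(raw: &str) -> &str {
    let trimmed = raw.trim();
    if trimmed.is_empty() { "{}" } else { trimmed }
}
```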

The Moment It Worked

Then I ran:

curl -X POST http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

And got:

{"choices":[{"message":{"content":"This is a mock chat response from httpbun."}}]}

That moment hits differently.

Everything connected:

  • Rust module

  • Gateway routing

  • Mock LLM response

Why I Used a Mock LLM

Real LLMs:

  • Cost money

  • Require API keys

  • Add latency

So I used httpbun, which mimics OpenAI APIs locally.

This made the project:

  • Fully local

  • Reproducible

  • Beginner-friendly

What I Learned

For Platform Engineers

  • Envoy dynamic modules are powerful — but strict

  • Version alignment is critical

  • Gateway API is worth learning deeply

For Documentation Engineers

  • Broken systems reveal real documentation gaps

  • Every error is a learning opportunity

  • Keeping a debug log is invaluable

For Everyone

  • Read the docs

  • Match versions exactly

  • Start with mocks before real integrations

The Code:

👉 link

Includes:

  • Kubernetes manifests

  • Rust source code

  • Docker setup

  • Quick start guide

You can run everything locally in ~10 minutes.

What's Next

To make this production-ready:

  • Replace httpbun with a real LLM (Ollama / OpenAI)
  • Add auth + rate limiting
  • Build more advanced Rust transformations

Final Thoughts

Building from scratch forces understanding.

You don't just "use" tools — you see how they break, how they connect, and why they exist.

That's where real learning happens.

If you're curious about AI infrastructure:

Build something. Break it. Fix it. Write about it.

Questions? Reach out on GitHub or LinkedIn.
