I Built an AI Gateway from Scratch (So You Don't Have To)

dev.to

TL;DR

I built a local AI gateway using Envoy, Rust, and Kubernetes to understand how AI traffic actually works.

It broke multiple times. I fixed it. I learned a lot.


Why I Did This

I wanted to understand how AI gateways actually work.

Not the diagrams.

Not the marketing slides.

The real system — the code, the flow, the failures.

So I built one.

Three weeks later, I had something working.

But getting there meant debugging cryptic errors, chasing version mismatches, and nearly giving up a few times.

Here's what I learned.


What I Built

A local AI Gateway that looks like this:

curl → agentgateway proxy → Rust module → httpbun (mock LLM) → response

Everything runs locally using kind (Kubernetes in Docker):

  • No cloud costs
  • No API keys
  • Fully reproducible

Components:

  • Envoy → handles traffic
  • kgateway + agentgateway → control plane
  • Rust module → request/response transformation
  • httpbun → fake OpenAI-compatible LLM

This isn't production-ready.

It's a learning lab — and it taught me more than any tutorial ever could.


Why Even Build This?

AI traffic isn't like regular API traffic.

When calling an LLM, you often need to:

  • Inject system prompts
  • Mask sensitive data
  • Route requests to different models
  • Track tokens and cost usage
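The list above is exactly what the Rust module exists for. Here's a minimal, stdlib-only sketch of what those transformations can look like — the names and shapes are illustrative, not the actual module API, and the real code operates on OpenAI-style JSON bodies inside an Envoy filter:

```rust
/// A chat message as (role, content). Illustrative shape only.
type Message = (String, String);

/// Prepend a system prompt unless the client already supplied one.
fn inject_system_prompt(mut messages: Vec<Message>, prompt: &str) -> Vec<Message> {
    if !messages.iter().any(|(role, _)| role == "system") {
        messages.insert(0, ("system".to_string(), prompt.to_string()));
    }
    messages
}

/// Naive masking: redact any whitespace-separated token containing '@'.
/// (Real masking would use proper PII detection, not this heuristic.)
fn mask_emails(content: &str) -> String {
    content
        .split_whitespace()
        .map(|tok| if tok.contains('@') { "[REDACTED]" } else { tok })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Route by model name: pick a backend cluster for the proxy to send to.
/// (Cluster names here are made up for the example.)
fn route_for_model(model: &str) -> &'static str {
    match model {
        m if m.starts_with("gpt-4") => "openai-backend",
        m if m.starts_with("llama") => "local-ollama-backend",
        _ => "default-backend",
    }
}
```

In the actual gateway this kind of logic runs inside the Envoy dynamic module, rewriting the request body before it ever reaches the backend.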

Traditional API gateways don't handle this well.

That's where kgateway comes in — it lets you extend Envoy with custom logic using Rust.

That's what I wanted to explore.


The Stack

Tool                     Role
kind                     Local Kubernetes cluster
kgateway + agentgateway  Gateway control plane
Envoy                    Data plane proxy
Rust                     Custom transformation logic
httpbun                  Mock LLM backend

Everything is open source. Everything runs locally.


Architecture

Request flow through the AI Gateway. Numbers show the sequence from client request to mock LLM response. (Source: draw.io)

This diagram looks simple — but getting each step to work correctly took hours of debugging.


The Problems That Almost Broke Me

1. Rust Versions Move Fast

One day everything worked. The next day:

error: feature edition2024 is required

A dependency (getrandom) needed a newer Rust version than I had.

Fix: Upgraded Rust in my Dockerfile (1.75 → 1.85)

Lesson: Pin versions — or be ready to chase updates.
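One concrete way to act on that lesson: commit a `rust-toolchain.toml` next to your `Cargo.toml`, so every build — including the Docker image — uses the same compiler. The version below is the one my fix landed on:

```toml
[toolchain]
channel = "1.85.0"
```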


2. The "Undefined Symbol" Nightmare

Envoy crashed with:
undefined symbol: envoy_dynamic_module_callback_http_add_response_header

Everything looked correct.

Root cause: My SDK didn't match the Envoy version.

Fix: Used the official SDK directly from Envoy source.

Lesson: Version mismatches in Envoy dynamic modules will break everything. No shortcuts.


3. The filter_config Mystery

Envoy kept throwing:
error parsing filter config: EOF while parsing a value

Tried everything:

  • {}
  • "{}"
  • YAML tricks…

Nothing worked.

Fix:

filter_config:
  "@type": type.googleapis.com/google.protobuf.StringValue
  value: "{}"

Lesson: Sometimes the docs do have the answer — you just haven't found it yet.
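The "EOF while parsing a value" error is exactly what a JSON parser reports when handed an empty string — the unwrapped config was arriving as nothing at all. A defensive pattern (illustrative, not the actual module code) is to default an empty config to `{}` before parsing:

```rust
/// Treat a missing or blank filter_config as the empty JSON object,
/// so the parser never sees an empty string.
fn effective_filter_config(raw: &str) -> &str {
    let trimmed = raw.trim();
    if trimmed.is_empty() { "{}" } else { trimmed }
}
```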

The Moment It Worked

Then I ran:

curl -X POST http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

And got:

{"choices":[{"message":{"content":"This is a mock chat response from httpbun."}}]}

That moment hits differently.

Everything connected:

  • Rust module

  • Gateway routing

  • Mock LLM response

Why I Used a Mock LLM

Real LLMs:

  • Cost money

  • Require API keys

  • Add latency

So I used httpbun, which mimics OpenAI APIs locally.

This made the project:

  • Fully local

  • Reproducible

  • Beginner-friendly

What I Learned

For Platform Engineers

  • Envoy dynamic modules are powerful — but strict

  • Version alignment is critical

  • Gateway API is worth learning deeply

For Documentation Engineers

  • Broken systems reveal real documentation gaps

  • Every error is a learning opportunity

  • Keeping a debug log is invaluable

For Everyone

  • Read the docs

  • Match versions exactly

  • Start with mocks before real integrations

The Code:

👉 link

Includes:

  • Kubernetes manifests

  • Rust source code

  • Docker setup

  • Quick start guide

You can run everything locally in ~10 minutes.

What's Next

To make this production-ready:

  • Replace httpbun with a real LLM (Ollama / OpenAI)
  • Add auth + rate limiting
  • Build more advanced Rust transformations

Final Thoughts

Building from scratch forces understanding.

You don't just "use" tools — you see how they break, how they connect, and why they exist.

That's where real learning happens.

If you're curious about AI infrastructure:

Build something. Break it. Fix it. Write about it.

Questions? Reach out on GitHub or LinkedIn.
