Rate limiting is one of those things every backend engineer knows they need but few actually build from scratch. Most reach for a library. I built mine — two algorithms, Redis-backed, with Lua scripting for atomicity. Here's what the tradeoffs actually look like when you're writing the implementation instead of just configuring it.
Why build it instead of using a library?
Mostly to understand what the library is doing. Rate limiting looks simple until you think about concurrent requests hitting the same counter at the same millisecond. That's where the interesting problems live — and a library abstracts all of that away from you.
I implemented two algorithms: fixed window and token bucket. They solve the same problem differently, and the difference matters depending on your traffic pattern.
Fixed window
The simplest mental model: you get N requests per time window. Window resets, counter resets.
func (fw *FixedWindow) Allow(key string) (bool, error) {
	count, err := fw.store.Increment(key, fw.windowSize)
	if err != nil {
		return false, err
	}
	return count <= fw.limit, nil
}
The problem with naive fixed window is the boundary attack. If your window resets every minute, a client can send 100 requests at 11:59 and another 100 at 12:00 — 200 requests in two seconds, double your intended limit. This is a well-known flaw, and it's why sliding window exists. Fixed window is fast and simple, but you need to know what you're trading off.
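The boundary attack is easy to demonstrate with a toy in-memory version — a sketch for illustration, not the Redis-backed store from this post (the `fixedWindow` type and injected timestamps are mine):

```go
package main

import "fmt"

// fixedWindow is a minimal in-memory sketch: requests are bucketed by
// integer window index, and the counter implicitly "resets" when the
// window index changes.
type fixedWindow struct {
	limit      int
	windowSecs int64
	counts     map[int64]int
}

func (fw *fixedWindow) allow(nowUnix int64) bool {
	window := nowUnix / fw.windowSecs
	fw.counts[window]++
	return fw.counts[window] <= fw.limit
}

func main() {
	fw := &fixedWindow{limit: 100, windowSecs: 60, counts: map[int64]int{}}
	allowed := 0
	// 100 requests in the last second of one window...
	for i := 0; i < 100; i++ {
		if fw.allow(59) {
			allowed++
		}
	}
	// ...and 100 more in the first second of the next.
	for i := 0; i < 100; i++ {
		if fw.allow(60) {
			allowed++
		}
	}
	fmt.Println(allowed) // all 200 get through in two seconds
}
```

Every one of the 200 requests is admitted, even though the nominal limit is 100 per minute.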
Token bucket
Token bucket is more nuanced. You have a bucket with a maximum capacity. Tokens refill at a constant rate. Each request consumes a token. If the bucket is empty, the request is rejected.
func (tb *TokenBucket) Allow(key string) (bool, error) {
	now := time.Now().Unix()
	tokens, err := tb.store.GetTokens(key, tb.refillRate, tb.capacity, now)
	if err != nil {
		return false, err
	}
	return tokens > 0, nil
}
This handles bursts gracefully. A client that's been idle accumulates tokens up to the bucket capacity, then can burst at full speed until the bucket empties. It's a better model for real API traffic — real usage arrives in bursts, not at a perfectly uniform rate.
The complexity cost: you need to track both token count and last refill timestamp, and compute the refill delta on every request. That's two reads and a write per check, which is where atomicity becomes critical.
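The refill delta itself is a one-liner. A sketch of the computation (the `refill` function name and parameters are illustrative, not from the repo):

```go
package main

import (
	"fmt"
	"math"
)

// refill computes the token count after `elapsedSecs` seconds of refilling
// at ratePerSec tokens per second, capped at the bucket's capacity.
func refill(tokens, capacity, ratePerSec, elapsedSecs float64) float64 {
	return math.Min(capacity, tokens+elapsedSecs*ratePerSec)
}

func main() {
	// Bucket of 10, refilling 2 tokens/sec, currently holding 3 tokens.
	fmt.Println(refill(3, 10, 2, 2)) // 3 + 2*2 = 7
	// After a long idle period the capacity cap kicks in.
	fmt.Println(refill(3, 10, 2, 60)) // capped at 10
}
```

The subtlety isn't the math — it's that the read, this computation, and the write-back must happen atomically.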
The Redis + Lua atomicity problem
Here's the race condition that bites you if you're not careful. With token bucket:
1. Read the current token count
2. Compute the new count based on elapsed time
3. Write the new count back
If two requests hit simultaneously, both read the same token count, both compute independently, and both write — one of the writes gets lost. You've now allowed more requests than you should have.
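You can replay the losing interleaving deterministically. This sketch simulates two requests racing on a bucket with one token left — it's a sequential re-enactment of the bad schedule, not real concurrency:

```go
package main

import "fmt"

func main() {
	// One token left; two "concurrent" requests, replayed in the
	// worst-case order: both read before either writes back.
	tokens := 1.0

	readA := tokens // request A reads 1
	readB := tokens // request B reads the same stale 1

	admitted := 0
	if readA > 0 {
		admitted++ // A is allowed
	}
	if readB > 0 {
		admitted++ // B is allowed too
	}

	tokens = readA - 1 // A writes back 0
	tokens = readB - 1 // B overwrites with 0 — A's decrement is lost

	fmt.Println(admitted, tokens) // two requests admitted on a single token
}
```

The final counter looks correct (zero), which is what makes this bug nasty: nothing in the stored state tells you the limit was breached.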
The fix is making the read-compute-write a single atomic operation. Redis supports this via Lua scripts, which execute atomically on the Redis server:
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

local elapsed = now - last_refill
local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)

if new_tokens < 1 then
  return 0
end

redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
return 1
No locks. No transactions. The entire check-and-decrement happens in one Redis round trip, atomically. This is the right way to do distributed rate limiting.
Fixed window Lua is simpler
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
INCR is already atomic in Redis, but wrapping it with the EXPIRE logic in Lua ensures the TTL gets set exactly once on the first request — no race between increment and expire.
CI with GitHub Actions
Every push runs the test suite automatically. The workflow is straightforward:
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v4
        with:
          go-version: '1.21'
      - run: go test ./...
The key part is spinning up a real Redis instance in the CI environment as a service container. Testing against a real Redis rather than a mock means your Lua scripts actually get executed and validated — mocks won't catch scripting errors.
Fixed window vs token bucket — when to use which
|                        | Fixed window                   | Token bucket                          |
|------------------------|--------------------------------|---------------------------------------|
| Implementation         | Simple                         | More complex                          |
| Burst handling         | Poor                           | Good                                  |
| Boundary vulnerability | Yes                            | No                                    |
| Redis ops per request  | 1                              | 1 (via Lua)                           |
| Best for               | Internal services, simple APIs | Public APIs, user-facing rate limits  |
For most public APIs, token bucket is the right default. Fixed window is fine for internal service-to-service limits where you control both sides and traffic is predictable.
What I'd add next
Sliding window log — the theoretically correct algorithm that tracks individual request timestamps. More memory-intensive than either of these, but eliminates the boundary problem of fixed window without the refill complexity of token bucket. Also a sliding window counter, which approximates it cheaply using two fixed windows.
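The sliding window counter approximation reduces to a single weighted sum. A sketch of the estimate (function name and parameters are illustrative — this isn't implemented in the repo yet):

```go
package main

import "fmt"

// slidingEstimate approximates the request count over the last full window
// by weighting the previous fixed window's count by how much of it still
// overlaps the sliding window, then adding the current window's count.
func slidingEstimate(prevCount, currCount int, fractionIntoWindow float64) float64 {
	return float64(prevCount)*(1-fractionIntoWindow) + float64(currCount)
}

func main() {
	// 100 requests last minute, 30 so far this minute, 25% into the
	// current window: the trailing 75% of the previous window still counts.
	fmt.Println(slidingEstimate(100, 30, 0.25)) // 100*0.75 + 30 = 105
}
```

It's an approximation — it assumes requests were evenly spread across the previous window — but it needs only two counters per key instead of a timestamp per request.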
The full source
Both algorithms, Redis store, Lua scripts, and CI config are on GitHub. The code is designed to be readable — if you're implementing your own, the Lua scripts are the part worth studying.