Protect your backend infrastructure from resource exhaustion by controlling traffic admission with precision.
What We're Building
This article guides you through implementing robust rate limiting strategies in Go. We explore the core mechanics of Token Bucket, Leaky Bucket, and Sliding Window algorithms, explaining their trade-offs in handling bursts versus sustained load. We also address persistence for distributed systems and selection criteria to ensure your system remains stable under pressure.
Step 1 — Define the Rate Limit Contract
Before selecting an algorithm, establish the data structure that tracks time and tokens. This contract keeps the state consistent and easy to serialize. In Go, an atomic integer suffices for a simple counter, while a struct guarded by a mutex (or a sync.Map keyed by client) handles richer per-client state. A Token Bucket tracks tokens and lastRefill; a Sliding Window tracks the request timestamps (or per-interval counts) that fall inside the window.
type RateLimiter struct {
	mu         sync.Mutex // guards the fields below under concurrent requests
	tokens     int64
	capacity   int64
	lastRefill time.Time
	refillRate int64 // tokens per second
}
The RateLimiter struct encapsulates the state required for a single-node implementation. By centralizing the state here, you ensure that every request goes through a consistent logic path. This also simplifies debugging. If the rate limit is too strict, you can increase capacity or refillRate without changing the structural layout. This approach supports testing by isolating the logic.
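As one possible refinement (the constructor name is illustrative, not part of the original contract), a NewRateLimiter function makes the tuning knobs explicit. The struct is repeated here, minus locking, so the sketch compiles on its own:

```go
package main

import (
	"fmt"
	"time"
)

// Repeated from the article (locking omitted) so this sketch stands alone.
type RateLimiter struct {
	tokens     int64
	capacity   int64
	lastRefill time.Time
	refillRate int64 // tokens per second
}

// NewRateLimiter starts with a full bucket, so the first burst
// after startup is admitted up to capacity.
func NewRateLimiter(capacity, refillRate int64) *RateLimiter {
	return &RateLimiter{
		tokens:     capacity,
		capacity:   capacity,
		lastRefill: time.Now(),
		refillRate: refillRate,
	}
}

func main() {
	rl := NewRateLimiter(100, 50) // 100-token bursts, sustained 50 req/s
	fmt.Println(rl.tokens == rl.capacity)
}
```

Starting full is a design choice: it favors availability at startup, at the cost of admitting one full burst immediately.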
Step 2 — Implement Token Bucket for Burst Tolerance
This algorithm simulates a bucket that fills with tokens at a fixed rate. Each request consumes one token; if the bucket is empty, the request is rejected. This allows bursts up to the bucket capacity, and after an idle period of capacity/refillRate seconds the bucket is completely full again.
func (rl *RateLimiter) Allow() bool {
	rl.mu.Lock() // the struct's mutex guards the refill-and-take sequence
	defer rl.mu.Unlock()

	now := time.Now()
	// Refill: convert elapsed time to whole tokens. Multiplying before
	// dividing avoids the divide-by-zero and truncation problems of
	// computing "microseconds per token" first.
	elapsed := now.Sub(rl.lastRefill)
	refill := elapsed.Nanoseconds() * rl.refillRate / int64(time.Second)
	if refill > 0 {
		rl.tokens += refill
		if rl.tokens > rl.capacity {
			rl.tokens = rl.capacity
		}
		// Only advance lastRefill when tokens were actually added, so
		// fractional refill time is not silently discarded.
		rl.lastRefill = now
	}
	if rl.tokens > 0 {
		rl.tokens--
		return true
	}
	return false
}
The refill calculation converts elapsed time into whole tokens. This handles sudden traffic spikes better than fixed windows, which reset abruptly at the end of each period. For example, with a capacity of 100 tokens refilled at 100 per second, a sudden burst of 50 requests is admitted immediately from the stored tokens; once the bucket runs dry, further requests must wait for tokens to trickle back in. This reduces latency for legitimate high-demand periods.
Tokens
  ^
  |            [ bucket fills ]
  |           /
  |          /
  |         /
  |        /    [ burst allowed ]
  |       /
  |      /
  +-----/--------------------> Time
Step 3 — Implement Leaky Bucket for Traffic Smoothing
Unlike the Token Bucket, the Leaky Bucket processes requests at a constant rate regardless of arrival rate. Requests enter a queue. If the bucket is full, new requests are dropped. The "leak" rate represents the backend processing speed. This prevents downstream overload from upstream bursts.
type LeakyBucket struct {
	mu         sync.Mutex
	queueDepth int
	maxDepth   int
	leakRate   int // requests leaked per second
}

func (lb *LeakyBucket) Enqueue(req Request) bool {
	lb.mu.Lock()
	if lb.queueDepth >= lb.maxDepth {
		lb.mu.Unlock()
		return false // bucket full: drop the request
	}
	lb.queueDepth++
	lb.mu.Unlock()
	// Simulate the leak (demo only; production should use a single worker)
	go func() {
		time.Sleep(time.Second / time.Duration(lb.leakRate))
		lb.mu.Lock()
		if lb.queueDepth > 0 {
			lb.queueDepth--
		}
		lb.mu.Unlock()
	}()
	return true
}
The goroutine simulates the leak. In production, the leak should be driven by a single dedicated worker (or a worker pool) rather than one goroutine per request, and the depth counter must be guarded against concurrent access. The core logic is the comparison of queueDepth against maxDepth: when the queue is full, the request is rejected, typically with a 429 status. This algorithm is ideal for protecting a specific downstream service from high variance in incoming traffic.
Step 4 — Implement Sliding Window for Precision
Where the Token Bucket and Leaky Bucket approximate a rate, the Sliding Window counts requests directly. A sliding window log records the timestamp of each admitted request; on every check it counts how many timestamps fall within the last interval (e.g., one second) and rejects the request if that count has reached the limit. Because the window boundary moves with the clock instead of resetting at fixed ticks, the count is always exact.
type SlidingWindow struct {
	mu     sync.Mutex
	times  []time.Time   // timestamps of admitted requests
	window time.Duration // e.g. time.Second
	limit  int
}

func (sw *SlidingWindow) Check(now time.Time) bool {
	sw.mu.Lock()
	defer sw.mu.Unlock()
	cutoff := now.Add(-sw.window)
	for len(sw.times) > 0 && sw.times[0].Before(cutoff) {
		sw.times = sw.times[1:] // evict entries that left the window
	}
	if len(sw.times) >= sw.limit {
		return false
	}
	sw.times = append(sw.times, now)
	return true
}
This approach is accurate because it counts exactly how many requests arrived in the current window. However, it is expensive at high throughput: a naive implementation stores one timestamp per request in a slice or map. A common compromise is the sliding window counter, which blends the counts of two adjacent fixed windows in proportion to their overlap with the sliding window. The reset becomes gradual rather than abrupt, avoiding the "sawtooth" pattern where clients exhaust the limit just before each reset.
Step 5 — Persist State for Distributed Systems
Single-node implementations break down once traffic is spread across multiple instances, because each node sees only its own share of the requests. The usual fix is to keep the limiter state in a shared store such as Redis: when a request arrives, the application atomically checks and updates the Redis key, so every instance enforces the same global limit. (Redis itself then becomes a dependency you must make highly available.)
# Example Redis Lua script (run via EVAL for atomicity)
local tokens = tonumber(redis.call('GET', KEYS[1]) or '100')
if tokens <= 0 then
  return 0 -- over the limit
end
redis.call('SET', KEYS[1], tokens - 1, 'PX', 1000)
return 1 -- request allowed
The Lua script ensures atomicity. If you used separate GET and SET commands, a concurrent request might read a stale value and succeed, bypassing the limit. The PX parameter sets an expiration on the key, so idle keys expire instead of accumulating in memory. This setup is necessary in microservice deployments where no single instance sees all of the traffic.
Step 6 — Select the Right Algorithm
Your choice depends on traffic patterns and downstream constraints.
| Pattern | Algorithm | Reason |
|---|---|---|
| Burst Tolerance | Token Bucket | Allows spikes up to capacity. |
| Constant Rate | Leaky Bucket | Smooths output for fragile APIs. |
| Precision | Sliding Window | Accurate counting without abrupt resets. |
| Global Limits | Redis + Lua | Shared state across distributed instances. |
If you need to protect a specific user or IP, use the Sliding Window. If you want to smooth out traffic spikes for a shared resource, use the Leaky Bucket. For general API protection, Token Bucket is the default choice. The decision matrix ensures you match the algorithm to the specific problem.
Takeaways
- State is Key: You must maintain state to track limits.
- Atomicity: Use Lua scripts for Redis to prevent race conditions.
- Algorithm Choice: Pick based on whether you want burst tolerance (Token) or smoothing (Leaky).
- Complexity vs. Accuracy: Sliding Window is most accurate but computationally heavier.
- Persistence: Use Redis for distributed setups to share state across instances.
- Retry Logic: Clients should retry with exponential backoff if rate limited.
What's Next
Extend this by implementing adaptive rate limiting, where the limit adjusts based on observed backend health. Monitor tail latency and error rate: raise the limit while both stay healthy, and lower it when the backend shows stress. This utilizes available capacity while still protecting against overload. You can also add response headers (such as X-RateLimit-Remaining) to inform clients of their current quota.
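One possible shape for that adaptive loop is an AIMD rule (a policy choice for illustration, not prescribed by this article; adaptiveLimit and the thresholds are assumptions):

```go
package main

import "fmt"

// adaptiveLimit nudges the rate limit AIMD-style: grow additively
// while the backend is healthy, back off multiplicatively when the
// observed p99 latency exceeds the target.
func adaptiveLimit(limit, p99, targetP99 float64) float64 {
	if p99 <= targetP99 {
		return limit + 1 // additive increase: probe for spare capacity
	}
	return limit * 0.5 // multiplicative decrease: shed load quickly
}

func main() {
	limit := 100.0
	for _, p99 := range []float64{80, 90, 250, 95} { // observed p99 latencies (ms)
		limit = adaptiveLimit(limit, p99, 200)
		fmt.Println(limit) // climbs slowly, halves on the latency spike
	}
}
```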
Further Reading
- Designing Data-Intensive Applications (Kleppmann) — Explains the complexity of distributed state management.
- A Philosophy of Software Design (Ousterhout) — Discusses how to handle trade-offs between flexibility and performance.