Building a Distributed Rate Limiter That Handles 18,769 req/s with Redis Lua Scripts

go dev.to

Building a Distributed Rate Limiter That Handles 18,769 req/s with Redis Lua Scripts

How I implemented 4 atomic rate-limiting algorithms, consistent hashing across 3 Redis shards, and hit 18,769 req/s at p95 16ms — from scratch.


Why Rate Limiting is Hard

Rate limiting sounds simple: "allow X requests per Y seconds." But in a distributed system, it gets complicated fast.

The challenges:

  • Multiple server instances — how do you share state?
  • Race conditions — two requests arrive simultaneously, both check the counter, both see "limit not reached", both proceed. One too many.
  • Hotspots — all traffic to one Redis node? That's your bottleneck.
  • Algorithm choice — Token Bucket? Sliding Window? Fixed Window? Each has tradeoffs.

I built a distributed rate limiter that solves all of these.


The Four Algorithms

I implemented all four major rate limiting algorithms. Here's when to use each:

1. Fixed Window Counter

|----window----|----window----|
  100 requests    100 requests
Enter fullscreen mode Exit fullscreen mode

Problem: A burst at the boundary (50 at end of window 1 + 50 at start of window 2 = 100 in 1 second, even though each window allowed 100).

2. Sliding Window Log

Track exact timestamps of each request. Most accurate, but memory-intensive — stores every request timestamp.

3. Sliding Window Counter

Approximate sliding window using two fixed windows. The money formula:

current_count = prev_window_count × (1 - elapsed/window_size) + curr_window_count
Enter fullscreen mode Exit fullscreen mode

Best balance of accuracy and memory efficiency.

4. Token Bucket

Tokens refill at a constant rate. Allows controlled bursting — great for APIs where occasional spikes are acceptable.


The Key Insight: Atomic Operations with Redis Lua

The classic race condition:

Thread 1: GET counter → 99
Thread 2: GET counter → 99
Thread 1: INCR counter → 100 (allowed)
Thread 2: INCR counter → 101 (should be denied, wasn't)
Enter fullscreen mode Exit fullscreen mode

The fix: do everything in a single atomic Redis operation.

Redis Lua scripts execute atomically — no other command can run between your script's operations.

Here's my sliding window counter in Lua:

local key = KEYS[1]
local prev_key = KEYS[2]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Get counts from both windows
local curr_count = tonumber(redis.call('GET', key) or 0)
local prev_count = tonumber(redis.call('GET', prev_key) or 0)

-- Calculate elapsed time in current window
local elapsed = now % window
local weight = 1 - (elapsed / window)

-- Approximate sliding window count
local sliding_count = math.floor(prev_count * weight) + curr_count

if sliding_count >= limit then
    -- Rate limited
    return {0, sliding_count, limit}
end

-- Increment and set expiry
redis.call('INCR', key)
redis.call('EXPIRE', key, window * 2)

return {1, sliding_count + 1, limit}
Enter fullscreen mode Exit fullscreen mode

Calling it from Go:

var slidingWindowScript = redis.NewScript(`
    -- (lua script above)
`)

func (r *RateLimiter) IsAllowed(ctx context.Context, key string) (bool, error) {
    now := time.Now().Unix()
    windowStart := now - (now % r.windowSize)

    currKey := fmt.Sprintf("%s:%d", key, windowStart)
    prevKey := fmt.Sprintf("%s:%d", key, windowStart-r.windowSize)

    result, err := slidingWindowScript.Run(ctx, r.client,
        []string{currKey, prevKey},
        r.limit, r.windowSize, now,
    ).Int64Slice()

    return result[0] == 1, err
}
Enter fullscreen mode Exit fullscreen mode

Atomic. No race conditions. Ever.


Consistent Hashing Across 3 Redis Shards

One Redis node is a single point of failure and a throughput bottleneck. I shard across 3 Redis nodes using consistent hashing.

type ConsistentHashRing struct {
    nodes       []string
    ring        map[uint32]string
    sortedKeys  []uint32
    virtualNodes int
    mu          sync.RWMutex
}

func (r *ConsistentHashRing) AddNode(node string) {
    r.mu.Lock()
    defer r.mu.Unlock()

    // Add 150 virtual nodes per physical node
    for i := 0; i < r.virtualNodes; i++ {
        key := r.hash(fmt.Sprintf("%s-%d", node, i))
        r.ring[key] = node
        r.sortedKeys = append(r.sortedKeys, key)
    }
    sort.Slice(r.sortedKeys, func(i, j int) bool {
        return r.sortedKeys[i] < r.sortedKeys[j]
    })
    r.nodes = append(r.nodes, node)
}

func (r *ConsistentHashRing) GetNode(key string) string {
    r.mu.RLock()
    defer r.mu.RUnlock()

    hash := r.hash(key)

    // Binary search for the first node >= hash
    idx := sort.Search(len(r.sortedKeys), func(i int) bool {
        return r.sortedKeys[i] >= hash
    })

    if idx == len(r.sortedKeys) {
        idx = 0
    }

    return r.ring[r.sortedKeys[idx]]
}
Enter fullscreen mode Exit fullscreen mode

150 virtual nodes per physical node is the sweet spot — minimizes shard remapping when nodes are added/removed (~20% of keys remapped instead of ~100%).


Connection Pool Tuning

This single change boosted throughput by 61% and cut latency by 43%.

Default Go Redis pool: 10 connections per shard.

// BEFORE — default
client := redis.NewClient(&redis.Options{
    Addr: addr,
})
// PoolSize: 10 (default)
// Result: goroutines blocking waiting for connections
Enter fullscreen mode Exit fullscreen mode
// AFTER — tuned
client := redis.NewClient(&redis.Options{
    Addr:         addr,
    PoolSize:     50,              // 50 connections per shard
    MinIdleConns: 10,              // Keep 10 warm
    PoolTimeout:  2 * time.Second,
    ReadTimeout:  500 * time.Millisecond,
    WriteTimeout: 500 * time.Millisecond,
})
Enter fullscreen mode Exit fullscreen mode

Why does this help? Under high concurrency, goroutines block waiting for an available connection from the pool. Increasing pool size reduces this contention dramatically.

But don't set it too high — each connection costs memory on the Redis server, and you can overwhelm it.


LRU Cache Layer

Before hitting Redis at all, I check an in-memory LRU cache:

type DecisionCache struct {
    cache    *lru.Cache
    ttl      time.Duration
    hits     prometheus.Counter
    misses   prometheus.Counter
}

func (c *DecisionCache) Get(key string) (Decision, bool) {
    val, ok := c.cache.Get(key)
    if !ok {
        c.misses.Inc()
        return Decision{}, false
    }

    entry := val.(*CacheEntry)
    if time.Since(entry.CreatedAt) > c.ttl {
        c.cache.Remove(key)
        c.misses.Inc()
        return Decision{}, false
    }

    c.hits.Inc()
    return entry.Decision, true
}
Enter fullscreen mode Exit fullscreen mode

Result: 93.5% cache hit rate, saving ~200μs per decision.

The math: 93.5% of 18,769 req/s = 17,549 requests served from memory, 1,220 hitting Redis.


Benchmark Results

Load tested with k6:

scenarios: 200 VUs, 60 seconds

✓ http_req_duration: avg=9.2ms p(95)=16ms
✓ http_req_failed:  0.00%
✓ iterations:       1,126,140

Throughput: 18,769 req/s
Enter fullscreen mode Exit fullscreen mode

After connection pool tuning (10 → 50):

  • Throughput: +61% (11,658 → 18,769 req/s)
  • p95 latency: -43% (28ms → 16ms)

Key Lessons

1. Lua scripts are the only safe way to do atomic operations in Redis.
WATCH/MULTI/EXEC optimistic locking works but adds complexity. Lua is simpler and guaranteed atomic.

2. Consistent hashing is worth the complexity.
Simple modulo hashing (key % N) remaps nearly all keys when you add/remove a node. Consistent hashing remaps only K/N keys.

3. Connection pool size matters more than you think.
Profile your goroutine blocking before optimizing algorithm. The bottleneck is often connection contention, not computation.

4. Cache the result, not just the data.
I cache the rate limiting decision, not the counter. Much simpler invalidation logic.


Source Code

github.com/sameer-sde/ratelimit


I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.

Connect: LinkedIn · GitHub · Portfolio

Source: dev.to

arrow_back Back to Tutorials