Building a Distributed Rate Limiter That Handles 18,769 req/s with Redis Lua Scripts
How I implemented 4 atomic rate-limiting algorithms, consistent hashing across 3 Redis shards, and hit 18,769 req/s at p95 16ms — from scratch.
Why Rate Limiting is Hard
Rate limiting sounds simple: "allow X requests per Y seconds." But in a distributed system, it gets complicated fast.
The challenges:
- Multiple server instances — how do you share state?
- Race conditions — two requests arrive simultaneously, both check the counter, both see "limit not reached", both proceed. One too many.
- Hotspots — all traffic to one Redis node? That's your bottleneck.
- Algorithm choice — Token Bucket? Sliding Window? Fixed Window? Each has tradeoffs.
I built a distributed rate limiter that solves all of these.
The Four Algorithms
I implemented all four major rate limiting algorithms. Here's when to use each:
1. Fixed Window Counter
|----window----|----window----|
100 requests 100 requests
Problem: A burst at the boundary (50 at end of window 1 + 50 at start of window 2 = 100 in 1 second, even though each window allowed 100).
2. Sliding Window Log
Track exact timestamps of each request. Most accurate, but memory-intensive — stores every request timestamp.
3. Sliding Window Counter
Approximate sliding window using two fixed windows. The money formula:
current_count = prev_window_count × (1 - elapsed/window_size) + curr_window_count
Best balance of accuracy and memory efficiency.
4. Token Bucket
Tokens refill at a constant rate. Allows controlled bursting — great for APIs where occasional spikes are acceptable.
The Key Insight: Atomic Operations with Redis Lua
The classic race condition:
Thread 1: GET counter → 99
Thread 2: GET counter → 99
Thread 1: INCR counter → 100 (allowed)
Thread 2: INCR counter → 101 (should be denied, wasn't)
The fix: do everything in a single atomic Redis operation.
Redis Lua scripts execute atomically — no other command can run between your script's operations.
Here's my sliding window counter in Lua:
local key = KEYS[1]
local prev_key = KEYS[2]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Get counts from both windows
local curr_count = tonumber(redis.call('GET', key) or 0)
local prev_count = tonumber(redis.call('GET', prev_key) or 0)
-- Calculate elapsed time in current window
local elapsed = now % window
local weight = 1 - (elapsed / window)
-- Approximate sliding window count
local sliding_count = math.floor(prev_count * weight) + curr_count
if sliding_count >= limit then
-- Rate limited
return {0, sliding_count, limit}
end
-- Increment and set expiry
redis.call('INCR', key)
redis.call('EXPIRE', key, window * 2)
return {1, sliding_count + 1, limit}
Calling it from Go:
var slidingWindowScript = redis.NewScript(`
-- (lua script above)
`)
func (r *RateLimiter) IsAllowed(ctx context.Context, key string) (bool, error) {
now := time.Now().Unix()
windowStart := now - (now % r.windowSize)
currKey := fmt.Sprintf("%s:%d", key, windowStart)
prevKey := fmt.Sprintf("%s:%d", key, windowStart-r.windowSize)
result, err := slidingWindowScript.Run(ctx, r.client,
[]string{currKey, prevKey},
r.limit, r.windowSize, now,
).Int64Slice()
return result[0] == 1, err
}
Atomic. No race conditions. Ever.
Consistent Hashing Across 3 Redis Shards
One Redis node is a single point of failure and a throughput bottleneck. I shard across 3 Redis nodes using consistent hashing.
type ConsistentHashRing struct {
nodes []string
ring map[uint32]string
sortedKeys []uint32
virtualNodes int
mu sync.RWMutex
}
func (r *ConsistentHashRing) AddNode(node string) {
r.mu.Lock()
defer r.mu.Unlock()
// Add 150 virtual nodes per physical node
for i := 0; i < r.virtualNodes; i++ {
key := r.hash(fmt.Sprintf("%s-%d", node, i))
r.ring[key] = node
r.sortedKeys = append(r.sortedKeys, key)
}
sort.Slice(r.sortedKeys, func(i, j int) bool {
return r.sortedKeys[i] < r.sortedKeys[j]
})
r.nodes = append(r.nodes, node)
}
func (r *ConsistentHashRing) GetNode(key string) string {
r.mu.RLock()
defer r.mu.RUnlock()
hash := r.hash(key)
// Binary search for the first node >= hash
idx := sort.Search(len(r.sortedKeys), func(i int) bool {
return r.sortedKeys[i] >= hash
})
if idx == len(r.sortedKeys) {
idx = 0
}
return r.ring[r.sortedKeys[idx]]
}
150 virtual nodes per physical node is the sweet spot — minimizes shard remapping when nodes are added/removed (~20% of keys remapped instead of ~100%).
Connection Pool Tuning
This single change boosted throughput by 61% and cut latency by 43%.
Default Go Redis pool: 10 connections per shard.
// BEFORE — default
client := redis.NewClient(&redis.Options{
Addr: addr,
})
// PoolSize: 10 (default)
// Result: goroutines blocking waiting for connections
// AFTER — tuned
client := redis.NewClient(&redis.Options{
Addr: addr,
PoolSize: 50, // 50 connections per shard
MinIdleConns: 10, // Keep 10 warm
PoolTimeout: 2 * time.Second,
ReadTimeout: 500 * time.Millisecond,
WriteTimeout: 500 * time.Millisecond,
})
Why does this help? Under high concurrency, goroutines block waiting for an available connection from the pool. Increasing pool size reduces this contention dramatically.
But don't set it too high — each connection costs memory on the Redis server, and you can overwhelm it.
LRU Cache Layer
Before hitting Redis at all, I check an in-memory LRU cache:
type DecisionCache struct {
cache *lru.Cache
ttl time.Duration
hits prometheus.Counter
misses prometheus.Counter
}
func (c *DecisionCache) Get(key string) (Decision, bool) {
val, ok := c.cache.Get(key)
if !ok {
c.misses.Inc()
return Decision{}, false
}
entry := val.(*CacheEntry)
if time.Since(entry.CreatedAt) > c.ttl {
c.cache.Remove(key)
c.misses.Inc()
return Decision{}, false
}
c.hits.Inc()
return entry.Decision, true
}
Result: 93.5% cache hit rate, saving ~200μs per decision.
The math: 93.5% of 18,769 req/s = 17,549 requests served from memory, 1,220 hitting Redis.
Benchmark Results
Load tested with k6:
scenarios: 200 VUs, 60 seconds
✓ http_req_duration: avg=9.2ms p(95)=16ms
✓ http_req_failed: 0.00%
✓ iterations: 1,126,140
Throughput: 18,769 req/s
After connection pool tuning (10 → 50):
- Throughput: +61% (11,658 → 18,769 req/s)
- p95 latency: -43% (28ms → 16ms)
Key Lessons
1. Lua scripts are the only safe way to do atomic operations in Redis.
WATCH/MULTI/EXEC optimistic locking works but adds complexity. Lua is simpler and guaranteed atomic.
2. Consistent hashing is worth the complexity.
Simple modulo hashing (key % N) remaps nearly all keys when you add/remove a node. Consistent hashing remaps only K/N keys.
3. Connection pool size matters more than you think.
Profile your goroutine blocking before optimizing algorithm. The bottleneck is often connection contention, not computation.
4. Cache the result, not just the data.
I cache the rate limiting decision, not the counter. Much simpler invalidation logic.
Source Code
github.com/sameer-sde/ratelimit
I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.