How I Built a Sliding Window Rate Limiter for Our Video API in Redis

Last quarter our video API started falling over every evening around 8pm UTC. Not because of legitimate traffic, but because a handful of scrapers discovered our /api/v1/videos/trending endpoint and decided to crawl it as fast as their connection pools allowed. Our LiteSpeed front end happily served thousands of requests per second, SQLite FTS5 queries piled up, and real users on DailyWatch saw spinners instead of video grids. We needed rate limiting, and the naive approaches we reached for first made things worse before they made them better. This article walks through how we landed on a sliding window limiter backed by Redis, why the obvious alternatives leak traffic, and the exact code running in production today.

Why Fixed Windows Quietly Betray You

The first thing almost everyone builds is a fixed-window counter. You bucket requests by a time interval, increment a counter, and reject anything over the limit. In PHP it looks innocent enough:

<?php
// The naive version - DO NOT ship this
function fixedWindowAllow(Redis $redis, string $clientId, int $limit, int $windowSeconds): bool
{
    $bucket = (int) floor(time() / $windowSeconds);
    $key = "rl:fixed:{$clientId}:{$bucket}";

    $count = $redis->incr($key);
    if ($count === 1) {
        // First hit in this bucket - set the TTL so it self-cleans
        $redis->expire($key, $windowSeconds);
    }

    return $count <= $limit;
}

This works in the demo and dies in production. The problem is the boundary. Say your limit is 100 requests per minute. A client can send 100 requests at 11:59:59 and another 100 at 12:00:00, because those land in two different buckets. That is 200 requests in about a second while technically never violating "100 per minute." For a scraper, the bucket boundary is a free reset button: it can hit you twice as hard around every rollover.

We saw exactly this. Our graphs showed clean per-minute averages that looked compliant, but the per-second view showed brutal spikes every time a window rolled over. Fixed windows smooth the math, not the actual load on your database.

Boundary bursts: up to 2x the intended rate across a window edge.
Thundering herds: every client's window resets at the same wall-clock moment.
Misleading metrics: averages look fine while p99 latency explodes.
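
The boundary burst is easy to reproduce without Redis. The sketch below is a minimal in-memory stand-in for the fixed-window counter above, assuming a limit of 100 per 60 seconds. The FixedWindow class and the injected timestamps are illustrative only and not part of our codebase.

```python
from collections import defaultdict


class FixedWindow:
    """In-memory stand-in for the Redis fixed-window counter (illustration only)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        # Same bucketing as the PHP version: floor(time / window)
        bucket = int(now // self.window)
        self.counts[(client_id, bucket)] += 1
        return self.counts[(client_id, bucket)] <= self.limit


limiter = FixedWindow(limit=100, window_seconds=60)

# 100 requests in the last second of one bucket, 100 in the first second of the next
late = sum(limiter.allow("scraper", now=119.0 + i / 100) for i in range(100))
early = sum(limiter.allow("scraper", now=120.0 + i / 100) for i in range(100))

print(late + early)  # 200 accepted within about two seconds
```

Every one of the 200 requests is accepted even though the limit is nominally 100 per minute, which is the spike our per-second graphs showed.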

What "Sliding Window" Actually Means

A sliding window counts requests in the last N seconds relative to right now, not relative to a fixed bucket. At 12:00:30 the window covers 11:59:30 through 12:00:30. One second later it covers 11:59:31 through 12:00:31. There is no edge to exploit because the window moves continuously with the clock.

There are two common ways to implement this, and they trade memory for precision:

Sliding window log: store a timestamp for every request, then count how many fall inside the window. Exact, but memory grows with request volume.
Sliding window counter: keep two fixed-bucket counters (current and previous) and weight the previous one by how much of it still overlaps the window. Approximate, but constant memory.

We run the log variant on our authenticated API where precision matters and request volume per key is modest, and the counter variant on anonymous edge traffic where we have millions of keys and cannot afford per-request storage. I will show both.

The Sliding Window Log With Redis Sorted Sets

Redis sorted sets (ZSET) are a natural fit for the log approach. We use the request timestamp as the score and a unique per-request id as the member. Evicting old entries is a single ZREMRANGEBYSCORE, and once that has run, counting the requests still inside the window is a ZCARD.

Here is the logic in plain PHP before we make it atomic:

<?php
function slidingLogAllow(Redis $redis, string $clientId, int $limit, int $windowMs): bool
{
    $key = "rl:log:{$clientId}";
    $nowMs = (int) (microtime(true) * 1000);
    $windowStart = $nowMs - $windowMs;

    // 1. Drop everything older than the window
    $redis->zRemRangeByScore($key, 0, $windowStart);

    // 2. How many requests remain inside the window?
    $current = $redis->zCard($key);
    if ($current >= $limit) {
        return false;
    }

    // 3. Record this request. Unique member avoids collisions on same-ms hits.
    $member = $nowMs . ':' . bin2hex(random_bytes(4));
    $redis->zAdd($key, $nowMs, $member);

    // 4. Keep the key from living forever if the client goes quiet
    $redis->expire($key, (int) ceil($windowMs / 1000) + 1);

    return true;
}

This is correct in spirit but has a fatal flaw under concurrency: steps 2 and 3 are separate round trips. Two requests can both read $current as 99, both decide they are allowed, and both write, pushing you to 101. Under the scraper load that started this whole project, that race fires constantly. We need the read-decide-write to be a single atomic operation.
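
The interleaving can be replayed deterministically. What follows is a hypothetical in-memory model rather than our production code: a Python list stands in for the sorted set, and the two steps are run by hand in the order a pair of concurrent workers could hit them.

```python
limit = 100
log = list(range(99))  # 99 requests already recorded in the window


def read_count() -> int:
    # Step 2 in the PHP version: ZCARD
    return len(log)


def record(member: int) -> None:
    # Step 3 in the PHP version: ZADD
    log.append(member)


# Worker A and worker B both run step 2 before either runs step 3
seen_by_a = read_count()
seen_by_b = read_count()

allowed_a = seen_by_a < limit
allowed_b = seen_by_b < limit

if allowed_a:
    record(1001)
if allowed_b:
    record(1002)

print(seen_by_a, seen_by_b, len(log))  # 99 99 101
```

Both workers see 99, both pass the check, and the log ends at 101 entries against a limit of 100.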

Making It Atomic With a Lua Script

Redis executes Lua scripts atomically. No other command runs in the middle. By moving the entire check into a script we eliminate the race and cut four round trips down to one, which also matters for latency since our Redis sits one network hop away from the PHP workers.

-- sliding_window.lua
-- KEYS[1] = the rate limit key
-- ARGV[1] = now in milliseconds
-- ARGV[2] = window size in milliseconds
-- ARGV[3] = max requests allowed in the window
-- ARGV[4] = a unique member id for this request
local key       = KEYS[1]
local now       = tonumber(ARGV[1])
local window    = tonumber(ARGV[2])
local limit     = tonumber(ARGV[3])
local member    = ARGV[4]
local clearBefore = now - window

-- Evict expired entries first
redis.call('ZREMRANGEBYSCORE', key, 0, clearBefore)

local current = redis.call('ZCARD', key)
if current < limit then
    redis.call('ZADD', key, now, member)
    -- Refresh TTL so idle keys disappear
    redis.call('PEXPIRE', key, window + 1000)
    -- Allowed: return remaining quota
    return limit - current - 1
end

-- Rejected: tell the caller how long until the oldest entry expires
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retryAfter = window
if oldest[2] then
    retryAfter = (tonumber(oldest[2]) + window) - now
end
redis.call('PEXPIRE', key, window + 1000)
return -retryAfter

The return value does double duty: a non-negative number is the remaining quota when the request is allowed, and a negative number is the negated count of milliseconds until the caller may retry when it is rejected. That lets the caller populate a correct Retry-After header instead of guessing. Loading and calling it from PHP looks like this:

<?php
final class SlidingWindowLimiter
{
    private string $sha;

    public function __construct(
        private readonly Redis $redis,
        private readonly int $limit = 100,
        private readonly int $windowMs = 60_000,
    ) {
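        // SCRIPT LOAD returns the SHA we call by later. This runs once per instance,
        // so with a per-request PHP lifecycle it costs one extra round trip per request.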
        $this->sha = $this->redis->script('load', file_get_contents(__DIR__ . '/sliding_window.lua'));
    }

    /** @return array{allowed: bool, remaining: int, retryAfterMs: int} */
    public function check(string $clientId): array
    {
        $nowMs  = (int) (microtime(true) * 1000);
        $member = $nowMs . ':' . bin2hex(random_bytes(5));
        $key    = "rl:log:{$clientId}";

        // EVALSHA avoids resending the script body every call
        $result = $this->redis->evalSha(
            $this->sha,
            [$key, $nowMs, $this->windowMs, $this->limit, $member],
            1 // number of KEYS
        );

        if ($result >= 0) {
            return ['allowed' => true, 'remaining' => (int) $result, 'retryAfterMs' => 0];
        }

        return ['allowed' => false, 'remaining' => 0, 'retryAfterMs' => (int) abs($result)];
    }
}
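
The return convention of the script is independent of the client language. As a rough sketch, here is the same decoding in Python. The Verdict type and the decode function are names made up for this illustration.

```python
import math
from typing import NamedTuple


class Verdict(NamedTuple):
    allowed: bool
    remaining: int
    retry_after_seconds: int


def decode(result: int) -> Verdict:
    """Interpret the integer returned by sliding_window.lua."""
    if result >= 0:
        return Verdict(True, result, 0)
    # Negative: the magnitude is milliseconds until the oldest entry leaves the window.
    # Round up so a client never retries a fraction of a second too early.
    return Verdict(False, 0, math.ceil(-result / 1000))


print(decode(19))     # allowed, 19 requests left
print(decode(0))      # allowed, this request used the last slot
print(decode(-1500))  # rejected, retry after 2 seconds
```

Note that zero is an allowed result: it means the request went through and the quota is now exhausted.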

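One production gotcha: after a Redis restart or failover the script cache is empty and EVALSHA fails with a NOSCRIPT error. Wrap the call and fall back to EVAL with the full script body, which also re-caches the script. Check for failure explicitly, too: depending on how your client surfaces errors, a failed call can come back as false rather than an exception, and in PHP false >= 0 evaluates to true, so the check above would read it as an allowed request. We learned the NOSCRIPT lesson during a Redis maintenance window when every API request started 500ing at once.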
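
The shape of that fallback is roughly the following. This is a sketch under stated assumptions, not our PHP code: FakeRedis and NoScriptError are stand-ins so the example runs without a server (with redis-py the real error is redis.exceptions.NoScriptError), and it recovers by reloading with SCRIPT LOAD and retrying once rather than by calling EVAL.

```python
import hashlib


class NoScriptError(Exception):
    """Stand-in for the client's NOSCRIPT error."""


class FakeRedis:
    """Tiny fake exposing only the script-cache behavior needed for the demo."""

    def __init__(self) -> None:
        self.cache: set[str] = set()

    def script_load(self, script: str) -> str:
        sha = hashlib.sha1(script.encode()).hexdigest()
        self.cache.add(sha)
        return sha

    def evalsha(self, sha: str) -> str:
        if sha not in self.cache:
            raise NoScriptError("NOSCRIPT No matching script")
        return "ok"

    def script_flush(self) -> None:
        # What a restart or failover effectively does to the cache
        self.cache.clear()


def run_script(client: FakeRedis, script: str, sha: str) -> str:
    try:
        return client.evalsha(sha)
    except NoScriptError:
        # Cache was wiped: reload the body, then retry exactly once
        client.script_load(script)
        return client.evalsha(sha)


SCRIPT = "return 1"
client = FakeRedis()
sha = client.script_load(SCRIPT)

client.script_flush()                   # simulate the maintenance window
print(run_script(client, SCRIPT, sha))  # ok
```

Retrying only once matters: if the reload itself fails, the error should propagate to whatever fail-open handling wraps the limiter instead of looping.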

Wiring It Into the Request Path

The limiter is only useful if it runs before the expensive work. In our stack the request flows through Cloudflare, then LiteSpeed, then PHP. Cloudflare handles the truly abusive volumetric stuff at the edge, but application-aware limits per API key have to live in PHP because only we know what an API key is allowed to do. The limiter runs as the first thing in the controller, before we ever touch SQLite:

<?php
// Front controller, before dispatching to any route handler
$clientId = resolveClientId(); // API key, or hashed IP for anonymous
$limiter  = new SlidingWindowLimiter($redis, limit: 120, windowMs: 60_000);
$verdict  = $limiter->check($clientId);

header('X-RateLimit-Limit: 120');
header('X-RateLimit-Remaining: ' . $verdict['remaining']);

if (!$verdict['allowed']) {
    $retrySec = (int) ceil($verdict['retryAfterMs'] / 1000);
    header('Retry-After: ' . $retrySec, true, 429);
    header('Content-Type: application/json');
    echo json_encode([
        'error'       => 'rate_limited',
        'retry_after' => $retrySec,
    ]);
    exit;
}

// Only now do we hit the database
$videos = $videoRepository->trending();

A few decisions worth calling out:

Resolve the client identity carefully. For authenticated traffic we key on the API key id. For anonymous traffic we hash the IP together with a daily salt so the key cannot easily be reversed but is still stable within a day. Avoid using the raw IP as the key, since that puts it straight into your datastore, and remember that behind Cloudflare the real client IP is in the CF-Connecting-IP header, not REMOTE_ADDR. Trust that header only on requests that actually arrive from Cloudflare, because any direct client can set it.
Emit the conventional headers. X-RateLimit-Remaining and Retry-After turn angry support tickets into self-service. Well-behaved clients back off on their own once they can see the numbers.
Fail open, not closed. If Redis is unreachable, we log and allow the request rather than taking the whole API down. A brief window of unlimited traffic beats a hard outage. That tradeoff depends on your threat model; for a public read API it is the right call.
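
For the anonymous key, one way to combine an IP with a daily salt is sketched below. The construction is an assumption made for illustration (HMAC-SHA256 under a server-side secret, with the salt derived from the calendar date), and anonymous_client_id, SECRET, and the anon: prefix are hypothetical names. A keyed hash is used because the IPv4 space is small enough to brute-force an unkeyed one.

```python
import hashlib
import hmac
from datetime import date


def anonymous_client_id(ip: str, secret: bytes, day: date) -> str:
    """Derive a stable-for-one-day, hard-to-reverse key from an IP address."""
    # Daily salt: HMAC of the calendar day under a server-side secret
    daily_salt = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).digest()
    digest = hmac.new(daily_salt, ip.encode(), hashlib.sha256).hexdigest()
    return "anon:" + digest[:32]


SECRET = b"example-secret-load-from-config"  # placeholder, never hard-code a real one

monday = anonymous_client_id("203.0.113.7", SECRET, date(2024, 1, 1))
monday_again = anonymous_client_id("203.0.113.7", SECRET, date(2024, 1, 1))
tuesday = anonymous_client_id("203.0.113.7", SECRET, date(2024, 1, 2))

print(monday == monday_again)  # True: stable within the day
print(monday == tuesday)       # False: rotates when the date changes
```

The rotation means a client's counter effectively resets at the date change, which is harmless for windows measured in minutes.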

The Constant-Memory Counter Variant

The log approach stores one ZSET member per request. For our anonymous edge tier that is millions of distinct keys and far too much memory. The sliding window counter approximates the same behavior with just two integers per key. The idea: keep a counter for the current fixed bucket and the previous one, then weight the previous bucket by the fraction of it still inside the sliding window.

Here it is in Python, which is what our analytics sidecar uses:

import time
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

def sliding_counter_allow(client_id: str, limit: int, window: int) -> bool:
    now = time.time()
    current_bucket = int(now // window)
    prev_bucket = current_bucket - 1

    # How far into the current bucket are we? 0.0 -> 1.0
    elapsed = (now % window) / window
    prev_weight = 1.0 - elapsed

    cur_key = f"rl:cnt:{client_id}:{current_bucket}"
    prev_key = f"rl:cnt:{client_id}:{prev_bucket}"

    pipe = r.pipeline()
    pipe.get(prev_key)
    pipe.incr(cur_key)
    pipe.expire(cur_key, window * 2)
    prev_count, cur_count, _ = pipe.execute()

    prev_count = int(prev_count or 0)
    # Weighted estimate of requests in the sliding window
    estimate = prev_count * prev_weight + cur_count

    return estimate <= limit

The estimate is not exact. It assumes requests were spread evenly across the previous bucket, which is rarely true. In practice the error is small. Cloudflare has described using this same technique for its own rate limiting, and its published analysis of production traffic found the approximation misclassified well under one percent of requests. For protecting a database from scrapers, sub-one-percent fuzziness on the boundary is completely acceptable, and the memory savings are enormous: two small integers per client instead of a growing list of timestamps.

Note that the INCR happens before we know whether to allow, so a rejected request still counts toward the bucket. That is intentional. It means a client that keeps hammering after being limited keeps its own window pinned full, which is exactly the back-pressure you want against an abusive caller.
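
The weighting is easier to see with numbers. The helper below isolates just the arithmetic from the function above. sliding_estimate is a name introduced for this example, and the counts are made up.

```python
def sliding_estimate(prev_count: int, cur_count: int, elapsed_seconds: float, window: int) -> float:
    """Weighted request estimate for the sliding window counter."""
    elapsed = elapsed_seconds / window  # fraction of the current bucket used, 0.0 to 1.0
    prev_weight = 1.0 - elapsed         # fraction of the previous bucket still in the window
    return prev_count * prev_weight + cur_count


# 15 s into a 60 s bucket: three quarters of the previous bucket still overlaps
print(sliding_estimate(prev_count=80, cur_count=30, elapsed_seconds=15, window=60))  # 90.0

# 45 s in: only one quarter of the previous bucket still counts
print(sliding_estimate(prev_count=80, cur_count=30, elapsed_seconds=45, window=60))  # 50.0
```

With a limit of 100, the same pair of counters sits close to the limit early in the bucket and well under it later, purely because the previous bucket's contribution decays as the window slides.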

A Reusable Middleware in Go

We also front some of our heavier endpoints with a small Go proxy, and the same Lua script drops straight in. Sharing one script between the PHP and Go services means the log limiter behaves identically no matter which of them the request hits, which matters because inconsistent limits across services are a debugging nightmare.

package ratelimit

import (
    "crypto/rand"
    "encoding/hex"
    "net/http"
    "strconv"
    "time"

    "github.com/redis/go-redis/v9"
)

type Limiter struct {
    rdb    *redis.Client
    script *redis.Script
    limit  int
    window time.Duration
}

func New(rdb *redis.Client, limit int, window time.Duration, luaSrc string) *Limiter {
    return &Limiter{rdb: rdb, script: redis.NewScript(luaSrc), limit: limit, window: window}
}

func (l *Limiter) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
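        // Behind Cloudflare the client IP arrives in CF-Connecting-IP. If the header
        // is missing, every such request shares the key "rl:log:", so reject those
        // requests or pick another identifier if traffic can reach this proxy directly.
        clientID := req.Header.Get("CF-Connecting-IP")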
        nowMs := time.Now().UnixMilli()
        buf := make([]byte, 5)
        _, _ = rand.Read(buf)
        member := strconv.FormatInt(nowMs, 10) + ":" + hex.EncodeToString(buf)

        key := "rl:log:" + clientID
        // EvalSha with automatic fallback to EVAL is handled by redis.Script.Run
        res, err := l.script.Run(req.Context(), l.rdb,
            []string{key},
            nowMs, l.window.Milliseconds(), l.limit, member,
        ).Int64()

        if err != nil {
            // Fail open: Redis is unreachable or the script failed, so let traffic
            // through. Log err here so the outage is visible.
            next.ServeHTTP(w, req)
            return
        }

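        if res < 0 {
            // Round up to whole seconds, matching the ceil() in the PHP front controller
            retry := strconv.FormatInt((-res+999)/1000, 10)
            w.Header().Set("Retry-After", retry)
            // http.Error would force Content-Type: text/plain, so write the JSON body directly
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusTooManyRequests)
            _, _ = w.Write([]byte(`{"error":"rate_limited"}`))
            return
        }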

        w.Header().Set("X-RateLimit-Remaining", strconv.FormatInt(res, 10))
        next.ServeHTTP(w, req)
    })
}

The redis.Script.Run helper in go-redis tries EVALSHA and transparently falls back to EVAL on NOSCRIPT, which solves the restart problem I mentioned earlier without any extra code. I wish the PHP client did the same out of the box.

What Changed After We Shipped It

The results were immediate and boring, which is exactly what you want from infrastructure. The evening spikes flattened. Our SQLite FTS5 queries went back to single-digit milliseconds because they were no longer competing with scraper floods. The scrapers themselves either backed off when they started seeing 429 with Retry-After, or kept hammering and got nothing but rejections that never touched the database.

A few lessons that did not make it into the code but shaped how we run this:

Tune limits per endpoint, not globally. Our search endpoint hits FTS5 hard, so it gets a tighter limit than the static category listing that is mostly served from cache.
Monitor rejection rate, not just request rate. A sudden spike in 429s means either an attack or a limit that is too tight for a legitimate client. Both need eyes.
Keep the window short. A 1-minute window with a sane limit catches bursts fast. Long windows let a client front-load a huge burst and then go quiet.
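Test the boundary explicitly. Write a test that sends the limit, waits half a window, and sends again. With the sliding window log the second batch should be rejected in full, because the first batch is still inside the window; the counter variant lets through only a weighted share; a fixed window whose boundary falls between the two batches lets all of it pass. That single test is what proved our implementation was actually sliding.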
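
A boundary test along those lines might look like the sketch below. It runs against an in-memory test double with an injected clock rather than against Redis, and the SlidingLog class is a hypothetical stand-in that mirrors the eviction rule of the Lua script. For the exact log variant, every request in the second batch is rejected, since the whole first batch is still inside the window half a window later.

```python
from collections import deque


class SlidingLog:
    """In-memory sliding window log with an injected clock (test double only)."""

    def __init__(self, limit: int, window_ms: int) -> None:
        self.limit = limit
        self.window_ms = window_ms
        self.log: deque[int] = deque()

    def allow(self, now_ms: int) -> bool:
        # Evict entries at or before the window start, like ZREMRANGEBYSCORE 0 (now - window)
        while self.log and self.log[0] <= now_ms - self.window_ms:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now_ms)
        return True


def test_boundary() -> None:
    limiter = SlidingLog(limit=10, window_ms=60_000)

    first = [limiter.allow(now_ms=0) for _ in range(10)]
    assert all(first)

    # Half a window later the first batch is still inside the window
    second = [limiter.allow(now_ms=30_000) for _ in range(10)]
    assert not any(second)

    # Once the first batch has aged out, the quota is available again
    third = [limiter.allow(now_ms=60_000) for _ in range(10)]
    assert all(third)


test_boundary()
print("boundary test passed")
```

Injecting the clock keeps the test instant and deterministic; the same three-phase scenario can then be pointed at the real Redis-backed limiter as an integration test.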

Conclusion

Rate limiting sounds like a solved problem until your endpoint is on fire and the fixed-window counter you copied from a tutorial is letting double the traffic through at every boundary. The sliding window log with Redis sorted sets gives you exact limiting with a single atomic Lua call, and the counter variant gives you constant memory when you have too many keys to store logs. Pick based on whether precision or memory is your constraint, share one Lua script across every service so behavior stays consistent, always emit Retry-After, and decide deliberately whether you fail open or closed. The version running behind our Cloudflare and LiteSpeed stack today is maybe sixty lines of real code, and it turned our worst time of day back into our quietest.