My API started the week serving 200 req/s. By Thursday it was getting hammered with 2,000. No warning, no DDoS — just organic traffic from a Reddit thread I didn't know existed.
The first sign was at 2 AM: Slack alerts, 503s, CPU pegged. My Node.js API — Express + TypeScript, deployed on Railway — was returning errors to real users.
I fixed it in 45 minutes. Here's exactly what I implemented.
What Was Actually Breaking
The API had no rate limiting at all. It hit a Postgres database on every request. Under normal load, that was fine. Under 10x load, the connection pool maxed out and every new request either timed out or got a "too many connections" error.
The problem wasn't the traffic. The problem was I had no defense layer.
The Fix: Three Layers
Layer 1 — express-rate-limit at the edge
import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,            // 100 req per IP per minute
  standardHeaders: true, // send standardized RateLimit-* headers
  legacyHeaders: false,  // drop legacy X-RateLimit-* headers
  message: { error: 'Too many requests, slow down.' },
});

app.use('/api/', limiter);
This alone would have cut 80% of the damage. No IP should hit your API 2,000 times a minute unless something is wrong.
Layer 2 — Redis-backed sliding window for authenticated routes
Per-IP limits are coarse. For authenticated endpoints, you want per-user limits that persist across instances.
import { RateLimiterRedis } from 'rate-limiter-flexible';
import { createClient } from 'redis';
import type { Request, Response, NextFunction } from 'express';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

const userLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: 'rl_user',
  points: 500,        // 500 requests
  duration: 3600,     // per hour
  blockDuration: 600, // block 10 min if exceeded
});

async function rateLimitUser(req: Request, res: Response, next: NextFunction) {
  const userId = req.user?.id ?? req.ip; // fall back to IP for edge cases
  try {
    await userLimiter.consume(userId);
    next();
  } catch {
    res
      .status(429)
      .set('Retry-After', '600') // tell well-behaved clients when to retry
      .json({ error: 'Rate limit exceeded', retryAfter: 600 });
  }
}
app.use('/api/ai/', rateLimitUser); // Only on expensive routes
The key insight: don't rate-limit everything the same way. Read endpoints can be generous. Write endpoints and AI-backed routes should be tight.
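That principle is easy to make concrete with a small tier table. This is a sketch; the tier names and numbers below are illustrative, not my production values:

```typescript
// Illustrative tiered limits: generous for reads, tight for writes,
// tightest for AI-backed routes. Tune the numbers to your own costs.
type Tier = 'read' | 'write' | 'ai';

interface LimitConfig {
  windowMs: number; // window length in ms
  max: number;      // requests allowed per window
}

const TIERS: Record<Tier, LimitConfig> = {
  read:  { windowMs: 60_000, max: 300 }, // cheap GETs can be generous
  write: { windowMs: 60_000, max: 60 },  // mutations hit the DB harder
  ai:    { windowMs: 60_000, max: 10 },  // each call costs real money
};

function limitFor(tier: Tier): LimitConfig {
  return TIERS[tier];
}

// Wiring it up with express-rate-limit would look roughly like:
// app.use('/api/posts', rateLimit({ ...limitFor('read'), standardHeaders: true }));
// app.use('/api/ai',    rateLimit({ ...limitFor('ai'),   standardHeaders: true }));
```

Keeping the numbers in one table also means you adjust limits in one place when costs change, instead of hunting through route files.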
Layer 3 — Connection pool cap
The Postgres pool was set to the default of 10 connections. With each connection held for roughly 200ms per query, the real ceiling was 50 req/s (10 connections / 0.2s per request).
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // match your Postgres max_connections / num_workers
  idleTimeoutMillis: 10000,      // release idle connections after 10s
  connectionTimeoutMillis: 3000, // fail fast, don't queue forever
});
Setting connectionTimeoutMillis is underrated. Without it, requests pile up waiting for a connection and your memory climbs until you OOM.
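Failing fast only helps if you turn that failure into a fast response. A sketch of translating a pool timeout into an immediate 503; the error-message match is an assumption based on pg's behavior, so check what your driver version actually throws:

```typescript
// Sketch: map a pool "connection timeout" failure to a fast 503 with a
// Retry-After hint, instead of letting the request hang. The message
// pattern below is an assumption about pg's rejection text; verify it
// against your installed node-postgres version.
interface ErrorResponse {
  status: number;
  body: { error: string };
  headers: Record<string, string>;
}

function poolErrorToResponse(err: Error): ErrorResponse | null {
  // pg rejects with a "timeout exceeded" message when
  // connectionTimeoutMillis elapses with no free connection.
  if (/timeout exceeded/i.test(err.message)) {
    return {
      status: 503,
      body: { error: 'Database busy, try again shortly' },
      headers: { 'Retry-After': '5' }, // seconds; tune to your recovery time
    };
  }
  return null; // not a pool timeout; let normal error handling run
}
```

Dropping this into an Express error handler means an overloaded pool degrades into quick, retryable 503s rather than a pile of hung requests.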
What the Logs Showed After
Before: p99 latency spiking to 8,000ms. Errors at 12% of requests.
After: p99 at 180ms. Zero 503s. The Reddit traffic eventually peaked and fell off. The API served all of it.
The Part Nobody Puts in Blog Posts
Rate limiting only works if you return proper 429 responses with Retry-After headers. Clients that don't respect 429 are bots. Clients that do — like well-built mobile apps and CLIs — will back off and retry cleanly.
Build for the good clients. Block the bad ones hard.
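With rate-limiter-flexible, a rejected consume() reports msBeforeNext, which maps directly onto the Retry-After header. A sketch; the helper name is mine:

```typescript
// Sketch: build a 429 response with a Retry-After header from the
// msBeforeNext value that rate-limiter-flexible reports on a rejected
// consume(). Well-behaved clients use this to schedule their retry.
function buildTooManyRequests(msBeforeNext: number) {
  // Retry-After is in whole seconds; round up and never advertise 0.
  const retryAfterSec = Math.max(1, Math.ceil(msBeforeNext / 1000));
  return {
    status: 429,
    headers: { 'Retry-After': String(retryAfterSec) },
    body: { error: 'Rate limit exceeded', retryAfter: retryAfterSec },
  };
}
```

Returning the same value in both the header and the body covers clients that only parse one of the two.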
One More Thing
The night this happened, I had no runbook and no alerting threshold configured for error rate. Slack only paged me when everything was already on fire.
The rate limiter was 45 minutes of work. The alerting configuration I added after was another 20. Neither of these is complicated. They're just the unglamorous work that doesn't ship as a feature.
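For reference, the core of an error-rate alert really is tiny. This is a hedged sketch of the logic (the 5% threshold and 60-second window are illustrative), not the monitoring config I actually deployed:

```typescript
// Minimal sketch of an error-rate check: track recent responses in a
// rolling window and flag when the 5xx ratio crosses a threshold.
// The 5% threshold and 60s window here are illustrative defaults.
class ErrorRateMonitor {
  private events: { ts: number; isError: boolean }[] = [];

  constructor(
    private windowMs = 60_000,
    private threshold = 0.05, // alert when more than 5% of requests error
  ) {}

  // Returns true when the current window's error rate exceeds the threshold.
  record(statusCode: number, now = Date.now()): boolean {
    this.events.push({ ts: now, isError: statusCode >= 500 });
    this.events = this.events.filter(e => now - e.ts <= this.windowMs);
    const errors = this.events.filter(e => e.isError).length;
    return errors / this.events.length > this.threshold;
  }
}
```

Wire the return value to whatever pages you (a Slack webhook, PagerDuty, your platform's alerting) and you get woken up before users start tweeting.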
If you're building Node.js APIs and want a production-ready TypeScript starter with rate limiting, auth, and Stripe already wired — Ship Fast Skill Pack ships it in an afternoon.