Production Webhook Queue with BullMQ and Redis: The Pattern That Actually Scales
Every SaaS eventually breaks when a webhook handler times out. Here's the architecture that prevents it.
Why Naive Webhook Handlers Fail
The typical webhook setup:
// ❌ This will hurt you in production
export async function POST(req: Request) {
const event = await req.json();
// These can take 5-30 seconds:
await sendWelcomeEmail(event.user);
await provisionStripeCustomer(event.user);
await createDefaultWorkspace(event.user);
await syncToCRM(event.user);
return Response.json({ ok: true });
}
Problems:
- Stripe retries after 30s if no response. You'll process the same event 3 times.
- Any single step failing kills the whole flow with no retry.
- No visibility into what failed or why.
- Vercel functions time out at 10s on hobby plans.
The Queue Architecture
Webhook arrives → Validate signature → Enqueue job → Return 200 immediately
↓
BullMQ worker processes async:
- Retry on failure (exponential backoff)
- Dead-letter queue for permanent failures
- Full job history for debugging
Implementation
1. Queue Setup
// lib/queue.ts
import { Queue, Worker, QueueEvents } from 'bullmq';
import IORedis from 'ioredis';
const connection = new IORedis(process.env.REDIS_URL!, {
maxRetriesPerRequest: null, // required for BullMQ
});
export const webhookQueue = new Queue('webhooks', {
connection,
defaultJobOptions: {
attempts: 5,
backoff: {
type: 'exponential',
delay: 2000, // 2s, 4s, 8s, 16s, 32s
},
removeOnComplete: { count: 1000 }, // keep last 1000 for debugging
removeOnFail: { count: 5000 }, // keep failures longer
},
});
2. Webhook Route (Fast Path)
// app/api/webhooks/stripe/route.ts
import { webhookQueue } from '@/lib/queue';
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
export async function POST(req: Request) {
const body = await req.text();
const sig = req.headers.get('stripe-signature')!;
let event: Stripe.Event;
try {
event = stripe.webhooks.constructEvent(
body,
sig,
process.env.STRIPE_WEBHOOK_SECRET!
);
} catch {
return Response.json({ error: 'Invalid signature' }, { status: 400 });
}
// Enqueue and return immediately — don't await processing
await webhookQueue.add(
event.type, // job name for filtering
{ event }, // job data
{ jobId: event.id } // idempotency: same event = same job ID = no duplicate
);
return Response.json({ received: true }); // Stripe sees 200 in <100ms
}
3. Worker (Actual Processing)
// workers/webhook-worker.ts
import { Worker } from 'bullmq';
import { connection } from '@/lib/queue';
const worker = new Worker(
'webhooks',
async (job) => {
const { event } = job.data;
switch (event.type) {
case 'checkout.session.completed': {
const session = event.data.object as Stripe.Checkout.Session;
await handleCheckoutCompleted(session);
break;
}
case 'customer.subscription.deleted': {
const sub = event.data.object as Stripe.Subscription;
await handleSubscriptionCancelled(sub);
break;
}
default:
// Log unknown events for observability
console.log(`Unhandled event type: ${event.type}`);
}
},
{
connection,
concurrency: 5, // process 5 jobs simultaneously
}
);
worker.on('failed', (job, err) => {
console.error(`Job ${job?.id} failed:`, err.message);
// Alert to Slack/PagerDuty after max retries
if (job?.attemptsMade === job?.opts.attempts) {
notifyPermanentFailure(job);
}
});
4. Idempotent Handlers
Even with queuing, design handlers to be safe to run twice:
async function handleCheckoutCompleted(session: Stripe.Checkout.Session) {
const userId = session.metadata?.userId;
if (!userId) throw new Error('Missing userId in metadata');
// Upsert — safe to run multiple times
await db.subscription.upsert({
where: { stripeCustomerId: session.customer as string },
create: {
userId,
stripeCustomerId: session.customer as string,
status: 'active',
plan: session.metadata?.plan ?? 'starter',
},
update: {
status: 'active',
},
});
// Check before sending — don't double-email
const alreadySent = await db.emailLog.findUnique({
where: { key: `welcome-${userId}` },
});
if (!alreadySent) {
await sendWelcomeEmail(userId);
await db.emailLog.create({ data: { key: `welcome-${userId}` } });
}
}
Monitoring With Bull Board
// app/api/admin/queues/route.ts
import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';
// Gives you a UI dashboard at /admin/queues
// Shows pending, active, completed, failed jobs
// Retry failed jobs with one click
The Deployment Stack
- Redis: Upstash (serverless, $0 for low volume) or Railway Redis
- Worker: Long-running Node.js process (Railway, Fly.io, or a dedicated EC2)
- Queue dashboard: Bull Board behind auth middleware
-
Alerts: Worker
failedevent → Slack webhook for permanent failures
This architecture handles thousands of webhooks per minute and survives any downstream outage gracefully.
Skip the Infrastructure Setup
The Workflow Automator MCP includes pre-built queue patterns, Stripe webhook templates, and BullMQ worker configurations — deploy a production-grade webhook system in an afternoon instead of a week.
Building on BullMQ? What pattern have you found most useful? Comments below.