How I handle Claude API rate limits without losing user messages (with code)
Rate limits hit at the worst possible moment: your user is mid-conversation, sends a message, and instead of a response gets a 429 or a 529 error.
Here's exactly how I handle this in SimplyLouie — a $2/month Claude wrapper with ~100 concurrent users.
The problem
Claude's API returns a few different errors you need to handle gracefully:
- `529` — API overloaded (temporary, retry works)
- `429` — Rate limit exceeded (need backoff)
- `500` — Server error (retry once, then fail)
- `timeout` — Connection dropped mid-stream
Without proper handling, all of these become "something went wrong" to the user, and they lose their message.
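In code, telling these apart comes down to reading the error's `status`: the `@anthropic-ai/sdk` error classes expose one for HTTP errors, while connection timeouts surface with no status at all. A tiny classifier, with mode names of my own:

```javascript
// Rough classifier for the four failure modes above (the names are mine).
// HTTP errors carry a numeric `status`; connection timeouts carry none.
function failureMode(err) {
  if (err.status === 529) return 'overloaded';
  if (err.status === 429) return 'rate_limited';
  if (err.status >= 500)  return 'server_error';
  if (err.status == null) return 'timeout'; // no status: connection-level failure
  return 'other';
}
```

Everything downstream (retry, backoff, user messaging) can branch on this one value instead of scattering status checks around the codebase.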
The naive approach (don't do this)
```javascript
// ❌ This loses the user's message on any error
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: userMessage }]
});
```
If this throws, you have no retry logic. The user's message is gone.
What I do instead
1. Queue messages before sending
Always save the user's message to the database BEFORE attempting the API call.
```javascript
// ✅ Save first, send second
async function handleUserMessage(userId, message) {
  // Save immediately — this is the source of truth
  const savedMsg = await db.messages.create({
    userId,
    role: 'user',
    content: message,
    status: 'pending'
  });

  // Now attempt the API call
  try {
    const response = await sendWithRetry(message, userId);
    await db.messages.update(savedMsg.id, { status: 'delivered' });
    return response;
  } catch (err) {
    await db.messages.update(savedMsg.id, { status: 'failed' });
    throw err;
  }
}
```
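Because every message lands in the database with a status, recovering from failures is just a query over `status: 'failed'`. Here's a minimal, self-contained sketch of that recovery sweep, using an in-memory array in place of a real database and a hypothetical `send` function in place of the API call:

```javascript
// In-memory stand-ins: a real app would query its DB and call sendWithRetry().
const messages = [
  { id: 1, content: 'hello', status: 'delivered' },
  { id: 2, content: 'are you there?', status: 'failed' }
];

// Re-send anything marked 'failed'; `send` is a hypothetical async sender.
async function retryFailed(send) {
  const failed = messages.filter(m => m.status === 'failed');
  for (const msg of failed) {
    try {
      await send(msg.content);
      msg.status = 'delivered';
    } catch {
      // still failed: leave it for the next sweep
    }
  }
  return failed.length;
}
```

Run on an interval, or when the user taps "retry", this sweep is what makes save-first pay off: messages are never lost, only delayed.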
2. Exponential backoff for 529s
The 529 (overloaded) error is temporary. Retry with backoff:
```javascript
async function sendWithRetry(message, userId, attempt = 0) {
  const MAX_ATTEMPTS = 3;
  const BASE_DELAY = 1000; // 1 second

  try {
    return await anthropic.messages.create({
      model: 'claude-opus-4-5',
      max_tokens: 1024,
      messages: buildHistory(userId, message)
    });
  } catch (err) {
    // 529 gets the full retry budget; a 500 is retried once, then we fail
    const maxAttempts = err.status === 500 ? 1 : MAX_ATTEMPTS;
    const isRetryable = err.status === 529 || err.status === 500;

    if (isRetryable && attempt < maxAttempts) {
      const delay = BASE_DELAY * Math.pow(2, attempt); // 1s, 2s, 4s
      await sleep(delay);
      return sendWithRetry(message, userId, attempt + 1);
    }

    throw err; // Give up
  }
}

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));
```
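One refinement worth considering (not in the code above): add jitter to the delay. If many requests fail at the same moment, pure exponential backoff makes all the clients retry in lockstep, hammering the API again in unison. Randomizing each delay spreads the retries out. A sketch:

```javascript
// Equal-jitter backoff: half the exponential delay is fixed, half random,
// so retries stay spread out but never fire immediately.
// attempt 0 → 0.5–1s, attempt 1 → 1–2s, attempt 2 → 2–4s.
function backoffDelay(attempt, baseMs = 1000) {
  const cap = baseMs * Math.pow(2, attempt);
  return cap / 2 + Math.random() * (cap / 2);
}
```

Swapping this in for the `BASE_DELAY * Math.pow(2, attempt)` line keeps the same growth curve while desynchronizing clients.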
3. Streaming with reconnect
For streaming responses, connections sometimes drop mid-stream. Here's how to handle partial responses:
```javascript
async function streamWithFallback(messages, res) {
  let partialResponse = '';

  try {
    const stream = await anthropic.messages.stream({
      model: 'claude-opus-4-5',
      max_tokens: 1024,
      messages
    });

    stream.on('text', (text) => {
      partialResponse += text;
      res.write(`data: ${JSON.stringify({ text })}\n\n`);
    });

    await stream.finalMessage();
    res.write('data: [DONE]\n\n');
  } catch (err) {
    if (partialResponse.length > 0) {
      // We got SOMETHING — save the partial and tell the user
      res.write(`data: ${JSON.stringify({
        text: '\n\n_[Response was cut short. Here\'s what I got:]_',
        partial: true
      })}\n\n`);
      res.write('data: [DONE]\n\n');
    } else {
      // Nothing at all — clean error
      res.write(`data: ${JSON.stringify({ error: 'API temporarily unavailable' })}\n\n`);
      res.write('data: [DONE]\n\n');
    }
  }
}
```
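The catch branch above tells the user the response was cut short, but persisting `partialResponse` is left implicit. One detail worth handling when you do save it: raw truncated text usually ends mid-word. A small helper (my own, hypothetical) that trims back to the last sentence boundary makes the stored partial read cleanly:

```javascript
// Trim a partial response back to the last complete sentence, so what we
// persist (and later re-display) doesn't end mid-word. Falls back to the
// full text when no sentence boundary is found.
function trimToSentence(text) {
  const boundary = Math.max(
    text.lastIndexOf('. '),
    text.lastIndexOf('.\n'),
    text.lastIndexOf('! '),
    text.lastIndexOf('? ')
  );
  return boundary === -1 ? text : text.slice(0, boundary + 1);
}
```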
4. Show rate limit state to users
Don't make users guess. If you're rate limited, tell them:
```javascript
// Track rate limit state per user
const rateLimitState = new Map();

function isRateLimited(userId) {
  const state = rateLimitState.get(userId);
  if (!state) return false;
  return Date.now() < state.resetAt;
}

function setRateLimited(userId, retryAfterSeconds) {
  rateLimitState.set(userId, {
    resetAt: Date.now() + (retryAfterSeconds * 1000)
  });
}

// In your route handler:
if (isRateLimited(req.user.id)) {
  const state = rateLimitState.get(req.user.id);
  const seconds = Math.ceil((state.resetAt - Date.now()) / 1000);
  return res.status(429).json({
    error: `Rate limited. Try again in ${seconds} seconds.`
  });
}
```
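The missing piece is the producer side: who calls `setRateLimited`? When the API returns a 429 it generally includes a `retry-after` header, so the catch block around the API call can record the window from it. A self-contained sketch (the `err.headers` shape is an assumption about your SDK's error object, and the 60-second fallback is arbitrary):

```javascript
// Repeated here so this sketch runs standalone; it's the same map as above.
const rateLimitState = new Map();

function setRateLimited(userId, retryAfterSeconds) {
  rateLimitState.set(userId, { resetAt: Date.now() + retryAfterSeconds * 1000 });
}

// Call from the catch block around the API call. Returns true if the error
// was a 429 and the per-user window was recorded.
function recordRateLimit(userId, err) {
  if (err.status !== 429) return false;
  const retryAfter = Number(err.headers?.['retry-after']) || 60; // fallback: 60s
  setRateLimited(userId, retryAfter);
  return true;
}
```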
The full error taxonomy
| Error | Status | Retryable? | User message |
|---|---|---|---|
| Overloaded | 529 | Yes (3x) | "Busy, retrying..." |
| Rate limited | 429 | After backoff | "Too many requests" |
| Server error | 500 | Once | "Server hiccup, retrying" |
| Auth error | 401 | No | Log + alert dev |
| Bad request | 400 | No | Check your code |
| Timeout | N/A | Yes (2x) | "Slow response, retrying" |
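The table translates directly into a dispatch function. A sketch (the shape of the return value and the convention of passing `null` for timeouts are mine):

```javascript
// Map an error status to a retry budget and user-facing copy, following the
// table above. Timeouts have no HTTP status, so they're passed as null.
function retryPolicy(status) {
  switch (status) {
    case 529:  return { retries: 3, message: 'Busy, retrying...' };
    case 429:  return { retries: 1, message: 'Too many requests' }; // only after the backoff window
    case 500:  return { retries: 1, message: 'Server hiccup, retrying' };
    case 401:  return { retries: 0, message: null }; // log + alert dev
    case 400:  return { retries: 0, message: null }; // bug on our side
    case null: return { retries: 2, message: 'Slow response, retrying' };
    default:   return { retries: 0, message: 'Something went wrong' };
  }
}
```

Keeping the policy in one function means the table in your docs and the behavior in production can't quietly drift apart.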
What this looks like in production
With this setup, SimplyLouie's 529 rate is ~2% and user-visible failures are under 0.1%. Most users never see an error because the retry logic handles it silently.
The key insight: save before send. A user's message is precious. If the API fails, you can always retry. If you never saved the message, it's gone forever.
Building on Claude API? I run SimplyLouie — $2/month flat-rate Claude access, no token counting. The developer API is at simplylouie.com/developers.
What's your worst Claude API failure story? Drop it in the comments — I've seen some wild ones.