How I handle Claude API rate limits without losing user messages (with code)


Rate limits hit at the worst possible moment. Your user is mid-conversation, they send a message, and instead of a response they get a 529 error.

Here's exactly how I handle this in SimplyLouie — a $2/month Claude wrapper with ~100 concurrent users.

The problem

Claude's API returns a few different errors you need to handle gracefully:

  • 529 — API overloaded (temporary, retry works)
  • 429 — Rate limit exceeded (need backoff)
  • 500 — Server error (retry once, then fail)
  • timeout — Connection dropped mid-stream

Without proper handling, all of these become "something went wrong" to the user, and they lose their message.
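One way to keep the handling consistent is to funnel every error through a single classifier and branch on its verdict. A minimal sketch (the `classifyError` helper and its return shape are my own illustration, not part of the SDK; `err.status` is how the official SDK exposes the HTTP status code):

```javascript
// Hypothetical helper: map an Anthropic API error to a handling strategy.
function classifyError(err) {
  switch (err.status) {
    case 529: return { retryable: true, maxRetries: 3, reason: 'overloaded' };
    case 429: return { retryable: true, maxRetries: 1, reason: 'rate_limited' };
    case 500: return { retryable: true, maxRetries: 1, reason: 'server_error' };
    case 401: return { retryable: false, reason: 'auth_error' };
    case 400: return { retryable: false, reason: 'bad_request' };
    default:  return { retryable: false, reason: 'unknown' };
  }
}
```

With this in place, retry loops and user-facing messages can key off `reason` instead of re-checking status codes everywhere.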

The naive approach (don't do this)

// ❌ This loses the user's message on any error
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: userMessage }]
});

If this throws, you have no retry logic. The user's message is gone.

What I do instead

1. Queue messages before sending

Always save the user's message to the database BEFORE attempting the API call.

// ✅ Save first, send second
async function handleUserMessage(userId, message) {
  // Save immediately — this is the source of truth
  const savedMsg = await db.messages.create({
    userId,
    role: 'user',
    content: message,
    status: 'pending'
  });

  // Now attempt the API call
  try {
    const response = await sendWithRetry(message, userId);
    await db.messages.update(savedMsg.id, { status: 'delivered' });
    return response;
  } catch (err) {
    await db.messages.update(savedMsg.id, { status: 'failed' });
    throw err;
  }
}
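A bonus of that status column: after a crash or redeploy you can sweep the table and replay anything that never made it out. A sketch against an in-memory stand-in for the messages table (`replayUnsent` and `sendFn` are illustrative names, not part of the app above):

```javascript
// Sketch: re-send messages stuck in 'pending' or 'failed' after a restart.
// `store` stands in for the messages table; `sendFn` is the retry wrapper.
async function replayUnsent(store, sendFn) {
  const stuck = store.filter(m => m.status !== 'delivered');
  for (const msg of stuck) {
    try {
      await sendFn(msg.content, msg.userId);
      msg.status = 'delivered';
    } catch {
      msg.status = 'failed'; // leave it for the next sweep
    }
  }
  return stuck.length; // how many we attempted
}
```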

2. Exponential backoff for 529s

The 529 (overloaded) error is temporary. Retry with backoff:

async function sendWithRetry(message, userId, attempt = 0) {
  const MAX_ATTEMPTS = 3;
  const BASE_DELAY = 1000; // 1 second

  try {
    return await anthropic.messages.create({
      model: 'claude-opus-4-5',
      max_tokens: 1024,
      messages: buildHistory(userId, message)
    });
  } catch (err) {
    const isOverloaded = err.status === 529;
    const isServerError = err.status === 500;
    // 529s get the full retry budget; 500s get a single retry
    const shouldRetry =
      (isOverloaded && attempt < MAX_ATTEMPTS) ||
      (isServerError && attempt === 0);

    if (shouldRetry) {
      const delay = BASE_DELAY * Math.pow(2, attempt); // 1s, 2s, 4s
      await sleep(delay);
      return sendWithRetry(message, userId, attempt + 1);
    }

    throw err; // Give up after MAX_ATTEMPTS
  }
}

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));
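One refinement worth considering: with ~100 concurrent users, fixed 1s/2s/4s delays mean everyone retries in lockstep after an outage. Adding full jitter, a random delay up to the exponential ceiling, spreads the retries out. A sketch (the cap value is my choice, not from the code above):

```javascript
// Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt)).
// Spreads retries out so clients don't hit the API in synchronized waves.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * ceiling);
}
```

Swap `BASE_DELAY * Math.pow(2, attempt)` for `backoffDelay(attempt)` and the rest of `sendWithRetry` stays the same.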

3. Streaming with reconnect

For streaming responses, connections drop. Here's how to handle partial responses:

async function streamWithFallback(messages, res) {
  let partialResponse = '';

  try {
    // stream() returns a MessageStream directly; no await needed here
    const stream = anthropic.messages.stream({
      model: 'claude-opus-4-5',
      max_tokens: 1024,
      messages
    });

    stream.on('text', (text) => {
      partialResponse += text;
      res.write(`data: ${JSON.stringify({ text })}\n\n`);
    });

    await stream.finalMessage();
    res.write('data: [DONE]\n\n');

  } catch (err) {
    if (partialResponse.length > 0) {
      // We got SOMETHING: flush the partial and tell the user
      res.write(`data: ${JSON.stringify({
        text: '\n\n_[Response was cut short. Here\'s what I got:]_',
        partial: true
      })}\n\n`);
    } else {
      // Nothing at all: send a clean error
      res.write(`data: ${JSON.stringify({ error: 'API temporarily unavailable' })}\n\n`);
    }
    res.write('data: [DONE]\n\n');
  } finally {
    res.end();
  }
}
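Once the partial is saved, you can also let the user ask Claude to pick up where it left off: the Messages API treats a trailing assistant message as a prefill and continues from it. A sketch (note the API rejects trailing whitespace in a prefilled assistant turn, hence the trim; `buildResumeMessages` is an illustrative name):

```javascript
// Sketch: build a request that resumes a cut-off response. Passing the
// partial text back as a final assistant message makes Claude continue it.
function buildResumeMessages(history, partialText) {
  return [
    ...history,
    // Trailing whitespace in an assistant prefill is rejected by the API
    { role: 'assistant', content: partialText.trimEnd() },
  ];
}
```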

4. Show rate limit state to users

Don't make users guess. If you're rate limited, tell them:

// Track rate limit state per user
const rateLimitState = new Map();

function isRateLimited(userId) {
  const state = rateLimitState.get(userId);
  if (!state) return false;
  return Date.now() < state.resetAt;
}

function setRateLimited(userId, retryAfterSeconds) {
  rateLimitState.set(userId, {
    resetAt: Date.now() + (retryAfterSeconds * 1000)
  });
}

// In your route handler:
if (isRateLimited(req.user.id)) {
  const state = rateLimitState.get(req.user.id);
  const seconds = Math.ceil((state.resetAt - Date.now()) / 1000);
  return res.status(429).json({
    error: `Rate limited. Try again in ${seconds} seconds.`
  });
}
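Where does `retryAfterSeconds` come from? 429 responses typically carry a `retry-after` header with the wait in seconds; read it defensively and fall back to a default. A sketch (the header is standard HTTP; the 30-second fallback is my own choice):

```javascript
// Parse a retry-after header (seconds form). Returns a safe fallback
// if the header is missing or unparseable.
function retryAfterSeconds(headers, fallbackSeconds = 30) {
  const raw = headers && headers['retry-after'];
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackSeconds;
}
```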

The full error taxonomy

| Error | Status | Retryable? | User message |
|---|---|---|---|
| Overloaded | 529 | Yes (3x) | "Busy, retrying..." |
| Rate limited | 429 | After backoff | "Too many requests" |
| Server error | 500 | Once | "Server hiccup, retrying" |
| Auth error | 401 | No | Log + alert dev |
| Bad request | 400 | No | Check your code |
| Timeout | N/A | Yes (2x) | "Slow response, retrying" |
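The timeout row has no code above, so here is one way to enforce it: race the request against a timer via AbortController, which the official SDK accepts as a per-request `signal` option (a sketch under that assumption; adjust to your client):

```javascript
// Sketch: run an async operation with a hard timeout. The callback
// receives an AbortSignal it can pass to the SDK or fetch call.
async function withTimeout(fn, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fn(controller.signal);
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}
```

An aborted call rejects, so it flows into the same retry path as a 529.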

What this looks like in production

With this setup, SimplyLouie's 529 rate is ~2% and user-visible failures are under 0.1%. Most users never see an error because the retry logic handles it silently.

The key insight: save before send. A user's message is precious. If the API fails, you can always retry. If you never saved the message, it's gone forever.


Building on Claude API? I run SimplyLouie — $2/month flat-rate Claude access, no token counting. The developer API is at simplylouie.com/developers.

What's your worst Claude API failure story? Drop it in the comments — I've seen some wild ones.

Source: dev.to
