How I handle Claude API rate limits without losing user messages (with code)


Rate limits hit at the worst possible moment. Your user is mid-conversation, they send a message, and instead of a response they get a 529 error.

Here's exactly how I handle this in SimplyLouie — a $2/month Claude wrapper with ~100 concurrent users.

The problem

Claude's API returns a few different errors you need to handle gracefully:

  • 529 — API overloaded (temporary, retry works)
  • 429 — Rate limit exceeded (need backoff)
  • 500 — Server error (retry once, then fail)
  • timeout — Connection dropped mid-stream

Without proper handling, all of these become "something went wrong" to the user, and they lose their message.
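One way to keep the handling consistent is to funnel every error through a single classifier and branch on its verdict. A minimal sketch (the `classifyError` helper and its return shape are my own illustration, not part of the SDK; `err.status` is how the official SDK exposes the HTTP status code):

```javascript
// Hypothetical helper: map an Anthropic API error to a handling strategy.
function classifyError(err) {
  switch (err.status) {
    case 529: return { retryable: true, maxRetries: 3, reason: 'overloaded' };
    case 429: return { retryable: true, maxRetries: 1, reason: 'rate_limited' };
    case 500: return { retryable: true, maxRetries: 1, reason: 'server_error' };
    case 401: return { retryable: false, reason: 'auth_error' };
    case 400: return { retryable: false, reason: 'bad_request' };
    default:  return { retryable: false, reason: 'unknown' };
  }
}
```

With this in place, retry loops and user-facing messages can key off `reason` instead of re-checking status codes everywhere.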

The naive approach (don't do this)

// ❌ This loses the user's message on any error
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: userMessage }]
});

If this throws, you have no retry logic. The user's message is gone.

What I do instead

1. Queue messages before sending

Always save the user's message to the database BEFORE attempting the API call.

// ✅ Save first, send second
async function handleUserMessage(userId, message) {
  // Save immediately — this is the source of truth
  const savedMsg = await db.messages.create({
    userId,
    role: 'user',
    content: message,
    status: 'pending'
  });

  // Now attempt the API call
  try {
    const response = await sendWithRetry(message, userId);
    await db.messages.update(savedMsg.id, { status: 'delivered' });
    return response;
  } catch (err) {
    await db.messages.update(savedMsg.id, { status: 'failed' });
    throw err;
  }
}
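A bonus of that status column: after a crash or redeploy you can sweep the table and replay anything that never made it out. A sketch against an in-memory stand-in for the messages table (`replayUnsent` and `sendFn` are illustrative names, not part of the app above):

```javascript
// Sketch: re-send messages stuck in 'pending' or 'failed' after a restart.
// `store` stands in for the messages table; `sendFn` is the retry wrapper.
async function replayUnsent(store, sendFn) {
  const stuck = store.filter(m => m.status !== 'delivered');
  for (const msg of stuck) {
    try {
      await sendFn(msg.content, msg.userId);
      msg.status = 'delivered';
    } catch {
      msg.status = 'failed'; // leave it for the next sweep
    }
  }
  return stuck.length; // how many we attempted
}
```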

2. Exponential backoff for 529s

The 529 (overloaded) error is temporary. Retry with backoff:

async function sendWithRetry(message, userId, attempt = 0) {
  const MAX_ATTEMPTS = 3;
  const BASE_DELAY = 1000; // 1 second

  try {
    return await anthropic.messages.create({
      model: 'claude-opus-4-5',
      max_tokens: 1024,
      messages: buildHistory(userId, message)
    });
  } catch (err) {
    const isOverloaded = err.status === 529;
    const isServerError = err.status === 500;
    // 529s get the full retry budget; 500s get a single retry
    const shouldRetry =
      (isOverloaded && attempt < MAX_ATTEMPTS) ||
      (isServerError && attempt === 0);

    if (shouldRetry) {
      const delay = BASE_DELAY * Math.pow(2, attempt); // 1s, 2s, 4s
      await sleep(delay);
      return sendWithRetry(message, userId, attempt + 1);
    }

    throw err; // Give up after MAX_ATTEMPTS
  }
}

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));
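One refinement worth considering: with ~100 concurrent users, fixed 1s/2s/4s delays mean everyone retries in lockstep after an outage. Adding full jitter, a random delay up to the exponential ceiling, spreads the retries out. A sketch (the cap value is my choice, not from the code above):

```javascript
// Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt)).
// Spreads retries out so clients don't hit the API in synchronized waves.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * ceiling);
}
```

Swap `BASE_DELAY * Math.pow(2, attempt)` for `backoffDelay(attempt)` and the rest of `sendWithRetry` stays the same.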

3. Streaming with reconnect

For streaming responses, connections drop. Here's how to handle partial responses:

async function streamWithFallback(messages, res) {
  let partialResponse = '';

  try {
    // stream() returns a MessageStream directly; no await needed here
    const stream = anthropic.messages.stream({
      model: 'claude-opus-4-5',
      max_tokens: 1024,
      messages
    });

    stream.on('text', (text) => {
      partialResponse += text;
      res.write(`data: ${JSON.stringify({ text })}\n\n`);
    });

    await stream.finalMessage();
    res.write('data: [DONE]\n\n');

  } catch (err) {
    if (partialResponse.length > 0) {
      // We got SOMETHING: flush the partial and tell the user
      res.write(`data: ${JSON.stringify({
        text: '\n\n_[Response was cut short. Here\'s what I got:]_',
        partial: true
      })}\n\n`);
    } else {
      // Nothing at all: send a clean error
      res.write(`data: ${JSON.stringify({ error: 'API temporarily unavailable' })}\n\n`);
    }
    res.write('data: [DONE]\n\n');
  } finally {
    res.end();
  }
}
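Once the partial is saved, you can also let the user ask Claude to pick up where it left off: the Messages API treats a trailing assistant message as a prefill and continues from it. A sketch (note the API rejects trailing whitespace in a prefilled assistant turn, hence the trim; `buildResumeMessages` is an illustrative name):

```javascript
// Sketch: build a request that resumes a cut-off response. Passing the
// partial text back as a final assistant message makes Claude continue it.
function buildResumeMessages(history, partialText) {
  return [
    ...history,
    // Trailing whitespace in an assistant prefill is rejected by the API
    { role: 'assistant', content: partialText.trimEnd() },
  ];
}
```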

4. Show rate limit state to users

Don't make users guess. If you're rate limited, tell them:

// Track rate limit state per user
const rateLimitState = new Map();

function isRateLimited(userId) {
  const state = rateLimitState.get(userId);
  if (!state) return false;
  return Date.now() < state.resetAt;
}

function setRateLimited(userId, retryAfterSeconds) {
  rateLimitState.set(userId, {
    resetAt: Date.now() + (retryAfterSeconds * 1000)
  });
}

// In your route handler:
if (isRateLimited(req.user.id)) {
  const state = rateLimitState.get(req.user.id);
  const seconds = Math.ceil((state.resetAt - Date.now()) / 1000);
  return res.status(429).json({
    error: `Rate limited. Try again in ${seconds} seconds.`
  });
}
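Where does `retryAfterSeconds` come from? 429 responses typically carry a `retry-after` header with the wait in seconds; read it defensively and fall back to a default. A sketch (the header is standard HTTP; the 30-second fallback is my own choice):

```javascript
// Parse a retry-after header (seconds form). Returns a safe fallback
// if the header is missing or unparseable.
function retryAfterSeconds(headers, fallbackSeconds = 30) {
  const raw = headers && headers['retry-after'];
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackSeconds;
}
```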

The full error taxonomy

| Error | Status | Retryable? | User message |
|---|---|---|---|
| Overloaded | 529 | Yes (3x) | "Busy, retrying..." |
| Rate limited | 429 | After backoff | "Too many requests" |
| Server error | 500 | Once | "Server hiccup, retrying" |
| Auth error | 401 | No | Log + alert dev |
| Bad request | 400 | No | Check your code |
| Timeout | N/A | Yes (2x) | "Slow response, retrying" |
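The timeout row has no code above, so here is one way to enforce it: race the request against a timer via AbortController, which the official SDK accepts as a per-request `signal` option (a sketch under that assumption; adjust to your client):

```javascript
// Sketch: run an async operation with a hard timeout. The callback
// receives an AbortSignal it can pass to the SDK or fetch call.
async function withTimeout(fn, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fn(controller.signal);
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}
```

An aborted call rejects, so it flows into the same retry path as a 529.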

What this looks like in production

With this setup, SimplyLouie's 529 rate is ~2% and user-visible failures are under 0.1%. Most users never see an error because the retry logic handles it silently.

The key insight: save before send. A user's message is precious. If the API fails, you can always retry. If you never saved the message, it's gone forever.


Building on Claude API? I run SimplyLouie — $2/month flat-rate Claude access, no token counting. The developer API is at simplylouie.com/developers.

What's your worst Claude API failure story? Drop it in the comments — I've seen some wild ones.

Source: dev.to
