Background Job Patterns for AI in Rails — Retries, Rate Limiting, Dead Letters, and Priorities

AI workloads are different from typical web jobs. They're slow, expensive, rate-limited, and sometimes they just fail for no good reason. Your background job setup needs to handle all of that gracefully.

This post covers the patterns that keep your AI-powered Rails app reliable under real-world conditions.

The Problem with AI Jobs

A typical Rails background job takes milliseconds. An AI job might take 30 seconds. Or time out after 120 seconds. Or get rate-limited by the API provider.

You need patterns for:

  • Retries that don't hammer a rate-limited API
  • Rate limiting that respects provider quotas
  • Dead letter queues for jobs that fail permanently
  • Priority queues so chat responses don't wait behind batch embeddings

Setting Up Solid Queue

Rails 8 ships with Solid Queue as the default. No Redis required — it uses your database.

bin/rails solid_queue:install
bin/rails db:migrate

# config/queue.yml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: [critical, default, ai_realtime, ai_batch, embeddings]
      threads: 5
      polling_interval: 0.5

Priority Queues

Not all AI jobs are equal. A user waiting for a chat response needs priority over a batch embedding job running overnight.

# app/jobs/ai_chat_job.rb
class AiChatJob < ApplicationJob
  queue_as :ai_realtime

  def perform(conversation_id, message)
    conversation = Conversation.find(conversation_id)
    response = OpenAI::Client.new.chat(
      parameters: {
        model: "gpt-4o",
        messages: conversation.messages_for_api + [{ role: "user", content: message }]
      }
    )
    conversation.messages.create!(
      role: "assistant",
      content: response.dig("choices", 0, "message", "content")
    )
  end
end

# app/jobs/batch_embedding_job.rb
class BatchEmbeddingJob < ApplicationJob
  queue_as :embeddings

  def perform(document_id)
    document = Document.find(document_id)
    embedding = OpenAI::Client.new.embeddings(
      parameters: { model: "text-embedding-3-small", input: document.content }
    )
    document.update!(embedding: embedding.dig("data", 0, "embedding"))
  end
end

Configure queue priority order:

# config/queue.yml
production:
  workers:
    - queues: [ai_realtime]
      threads: 3
      polling_interval: 0.1
    - queues: [critical, default]
      threads: 3
      polling_interval: 0.5
    - queues: [ai_batch, embeddings]
      threads: 2
      polling_interval: 2

Separate workers mean your realtime queue gets dedicated threads and faster polling. Batch jobs can wait.

Smart Retries with Exponential Backoff

The default retry strategy will destroy your API quota. Use exponential backoff with jitter:

class AiBaseJob < ApplicationJob
  retry_on Faraday::TooManyRequestsError,
           wait: :polynomially_longer,
           attempts: 5

  retry_on Faraday::TimeoutError,
           wait: :polynomially_longer,
           attempts: 3

  retry_on OpenAI::Error,
           wait: 10.seconds,
           attempts: 3

  discard_on ActiveRecord::RecordNotFound
end

polynomially_longer gives you waits of ~3s, ~18s, ~83s, ~258s — enough time for rate limits to clear.
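Those waits come from Active Job's backoff formula, roughly executions ** 4 + 2 seconds, with jitter added on top. A quick sketch of the schedule:

```ruby
# Approximation of Active Job's :polynomially_longer delay:
# executions ** 4 + 2 seconds, before jitter is applied.
def polynomial_wait(executions)
  executions**4 + 2
end

(1..4).map { |n| polynomial_wait(n) }  # => [3, 18, 83, 258]
```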

For custom backoff that honors the provider's Retry-After header, rescue the error yourself. (The block form of retry_on only runs after all attempts are exhausted, so it can't adjust the wait between attempts.)

class EmbeddingJob < AiBaseJob
  MAX_ATTEMPTS = 5

  # rescue_from fires on every failure, so each attempt can wait
  # exactly as long as the provider asks before retrying.
  rescue_from(RateLimitError) do |error|
    raise error if executions >= MAX_ATTEMPTS

    retry_after = error.response&.headers&.dig("retry-after")&.to_i || 60
    Rails.logger.warn("Rate limited. Retrying in #{retry_after}s: #{job_id}")
    retry_job(wait: retry_after.seconds)
  end

  def perform(document_id)
    # ...
  end
end

This reads the Retry-After header from the API response and waits exactly as long as the provider tells you to.

Rate Limiting at the Job Level

Don't rely on the API to rate-limit you. Do it yourself:

# app/jobs/concerns/rate_limited.rb
module RateLimited
  extend ActiveSupport::Concern

  class_methods do
    def rate_limit(max_per_minute:)
      before_perform do |job|
        # Fixed window: a fresh counter key each minute, so the count
        # resets without depending on increment's expiry semantics.
        window = Time.current.to_i / 60
        key = "rate_limit:#{job.class.name}:#{window}"
        count = Rails.cache.increment(key, 1, expires_in: 1.minute)

        if count.to_i > max_per_minute
          # Re-enqueue with delay instead of failing
          job.class.set(wait: 30.seconds).perform_later(*job.arguments)
          throw :abort
        end
      end
    end
  end
end

# Usage
class EmbeddingJob < AiBaseJob
  include RateLimited
  rate_limit max_per_minute: 50

  def perform(document_id)
    # ...
  end
end

Note: this needs a cache store with atomic increments shared across processes (Redis or Memcached); the in-memory store won't coordinate between workers.
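Stripped of Rails, the counter logic above is a fixed-window limiter. A minimal plain-Ruby sketch of the same idea, using a Hash in place of Rails.cache:

```ruby
# Fixed-window rate limiter: one counter per key per minute bucket.
# A sketch of the concern's logic without Rails.cache.
class FixedWindowLimiter
  def initialize(max_per_minute:)
    @max = max_per_minute
    @counters = Hash.new(0)
  end

  # Returns true if the call is allowed in the current window.
  def allow?(key, now: Time.now)
    bucket = "#{key}:#{now.to_i / 60}" # new counter every minute
    @counters[bucket] += 1
    @counters[bucket] <= @max
  end
end

limiter = FixedWindowLimiter.new(max_per_minute: 2)
t = Time.now
limiter.allow?("EmbeddingJob", now: t)  # => true
limiter.allow?("EmbeddingJob", now: t)  # => true
limiter.allow?("EmbeddingJob", now: t)  # => false: third call in the window
```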

Dead Letter Handling

When a job exhausts all retries, don't lose it silently. Capture it:

class AiBaseJob < ApplicationJob
  after_discard do |job, error|
    DeadLetter.create!(
      job_class: job.class.name,
      arguments: job.arguments.to_json,
      error_class: error.class.name,
      error_message: error.message,
      failed_at: Time.current
    )

    Rails.logger.error(
      "Dead letter: #{job.class.name} " \
      "args=#{job.arguments.inspect} " \
      "error=#{error.class}: #{error.message}"
    )
  end
end

Create the dead letter table:

bin/rails generate model DeadLetter \
  job_class:string \
  arguments:text \
  error_class:string \
  error_message:text \
  failed_at:datetime \
  retried_at:datetime
bin/rails db:migrate

Now you can build an admin page to inspect and retry failed jobs:

# Retry a dead letter
dead_letter = DeadLetter.find(params[:id])
dead_letter.job_class.constantize.perform_later(
  *JSON.parse(dead_letter.arguments)
)
dead_letter.update!(retried_at: Time.current)
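One caveat with the to_json round-trip above: plain JSON loses Ruby types, so symbol keys come back as strings and Active Job's richer argument serialization (GlobalID for model records, for example) is bypassed. A quick demonstration:

```ruby
require "json"

args = [42, { retry: true }]
restored = JSON.parse(args.to_json)

restored          # => [42, { "retry" => true }]
restored == args  # => false: the symbol key came back as a string
```

If your jobs take records or keyword arguments, consider storing ActiveJob::Arguments.serialize(job.arguments) instead and deserializing on retry.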

Timeouts

AI API calls can hang. Always set timeouts:

class AiBaseJob < ApplicationJob
  around_perform do |job, block|
    Timeout.timeout(120) { block.call }
  rescue Timeout::Error
    Rails.logger.error("Job timed out: #{job.class.name} args=#{job.arguments.inspect}")
    raise # Let retry logic handle it
  end
end

Or, better, configure the timeout at the HTTP client level; Timeout.timeout can interrupt a thread at an unsafe point:

client = OpenAI::Client.new(
  request_timeout: 60,
  uri_base: "https://api.openai.com/"
)

Idempotency

AI jobs should be safe to retry. Use idempotency keys:

class EmbeddingJob < AiBaseJob
  def perform(document_id)
    document = Document.find(document_id)

    # Skip if already processed
    return if document.embedding.present? && document.embedded_at&.after?(1.hour.ago)

    embedding = generate_embedding(document.content)
    document.update!(embedding: embedding, embedded_at: Time.current)
  end
end
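A stricter variant keys the skip on the content itself rather than a timestamp, so a document is only re-embedded when its text actually changes. A sketch; the stored hash column and embedding_stale? helper are illustrative, not part of the schema above:

```ruby
require "digest"

# Re-embed only when the document text has changed since the last run.
# `stored_sha` would come from a hypothetical content_sha column.
def embedding_stale?(content, stored_sha)
  Digest::SHA256.hexdigest(content) != stored_sha
end

sha = Digest::SHA256.hexdigest("hello world")
embedding_stale?("hello world", sha)   # => false: same content, skip
embedding_stale?("hello world!", sha)  # => true: content changed, re-embed
```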

Putting It Together

Here's the architecture:

User request → AiChatJob (ai_realtime queue, 0.1s poll, 3 threads)
Batch import → BatchEmbeddingJob (embeddings queue, 2s poll, 2 threads)
All AI jobs → inherit AiBaseJob (retries, timeouts, dead letters)
Rate limiting → RateLimited concern (per-class limits)
Failures → DeadLetter table (inspect, retry from admin)

Your AI features are only as reliable as your job infrastructure. Get this right, and your users never see a failed generation — just a slightly delayed one.

Next up: monitoring and observability for AI-powered Rails apps.
