Background Job Patterns for AI in Rails — Retries, Rate Limiting, Dead Letters, and Priorities

AI workloads are different from typical web jobs. They're slow, expensive, rate-limited, and sometimes they just fail for no good reason. Your background job setup needs to handle all of that gracefully.

This post covers the patterns that keep your AI-powered Rails app reliable under real-world conditions.

The Problem with AI Jobs

A typical Rails background job takes milliseconds. An AI job might take 30 seconds. Or time out after 120 seconds. Or get rate-limited by the API provider.

You need patterns for:

  • Retries that don't hammer a rate-limited API
  • Rate limiting that respects provider quotas
  • Dead letter queues for jobs that fail permanently
  • Priority queues so chat responses don't wait behind batch embeddings

Setting Up Solid Queue

Rails 8 ships with Solid Queue as the default. No Redis required — it uses your database.

bin/rails solid_queue:install
bin/rails db:migrate

# config/queue.yml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: [critical, default, ai_realtime, ai_batch, embeddings]
      threads: 5
      polling_interval: 0.5

Priority Queues

Not all AI jobs are equal. A user waiting for a chat response needs priority over a batch embedding job running overnight.

# app/jobs/ai_chat_job.rb
class AiChatJob < ApplicationJob
  queue_as :ai_realtime

  def perform(conversation_id, message)
    conversation = Conversation.find(conversation_id)
    response = OpenAI::Client.new.chat(
      parameters: {
        model: "gpt-4o",
        messages: conversation.messages_for_api + [{ role: "user", content: message }]
      }
    )
    conversation.messages.create!(
      role: "assistant",
      content: response.dig("choices", 0, "message", "content")
    )
  end
end

# app/jobs/batch_embedding_job.rb
class BatchEmbeddingJob < ApplicationJob
  queue_as :embeddings

  def perform(document_id)
    document = Document.find(document_id)
    embedding = OpenAI::Client.new.embeddings(
      parameters: { model: "text-embedding-3-small", input: document.content }
    )
    document.update!(embedding: embedding.dig("data", 0, "embedding"))
  end
end

Configure queue priority order:

# config/queue.yml
production:
  workers:
    - queues: [ai_realtime]
      threads: 3
      polling_interval: 0.1
    - queues: [critical, default]
      threads: 3
      polling_interval: 0.5
    - queues: [ai_batch, embeddings]
      threads: 2
      polling_interval: 2

Separate workers mean your realtime queue gets dedicated threads and faster polling. Batch jobs can wait.

Smart Retries with Exponential Backoff

The default retry strategy will destroy your API quota. Use exponential backoff with jitter:

class AiBaseJob < ApplicationJob
  retry_on Faraday::TooManyRequestsError,
           wait: :polynomially_longer,
           attempts: 5

  retry_on Faraday::TimeoutError,
           wait: :polynomially_longer,
           attempts: 3

  retry_on OpenAI::Error,
           wait: 10.seconds,
           attempts: 3

  discard_on ActiveRecord::RecordNotFound
end

polynomially_longer gives you waits of ~3s, ~18s, ~83s, ~258s — enough time for rate limits to clear.
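Those waits come from Active Job's backoff formula, roughly executions ** 4 + 2 seconds, with jitter added on top. A quick sketch of the schedule:

```ruby
# Approximation of Active Job's :polynomially_longer delay:
# executions ** 4 + 2 seconds, before jitter is applied.
def polynomial_wait(executions)
  executions**4 + 2
end

(1..4).map { |n| polynomial_wait(n) }  # => [3, 18, 83, 258]
```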

For custom backoff that honors the provider's Retry-After header, rescue the error yourself. (The block form of retry_on only runs after all attempts are exhausted, so it can't adjust the wait between attempts.)

class EmbeddingJob < AiBaseJob
  MAX_ATTEMPTS = 5

  # rescue_from fires on every failure, so each attempt can wait
  # exactly as long as the provider asks before retrying.
  rescue_from(RateLimitError) do |error|
    raise error if executions >= MAX_ATTEMPTS

    retry_after = error.response&.headers&.dig("retry-after")&.to_i || 60
    Rails.logger.warn("Rate limited. Retrying in #{retry_after}s: #{job_id}")
    retry_job(wait: retry_after.seconds)
  end

  def perform(document_id)
    # ...
  end
end

This reads the Retry-After header from the API response and waits exactly as long as the provider tells you to.

Rate Limiting at the Job Level

Don't rely on the API to rate-limit you. Do it yourself:

# app/jobs/concerns/rate_limited.rb
module RateLimited
  extend ActiveSupport::Concern

  class_methods do
    def rate_limit(max_per_minute:)
      before_perform do |job|
        # Fixed window: a fresh counter key each minute, so the count
        # resets without depending on increment's expiry semantics.
        window = Time.current.to_i / 60
        key = "rate_limit:#{job.class.name}:#{window}"
        count = Rails.cache.increment(key, 1, expires_in: 1.minute)

        if count.to_i > max_per_minute
          # Re-enqueue with delay instead of failing
          job.class.set(wait: 30.seconds).perform_later(*job.arguments)
          throw :abort
        end
      end
    end
  end
end

# Usage
class EmbeddingJob < AiBaseJob
  include RateLimited
  rate_limit max_per_minute: 50

  def perform(document_id)
    # ...
  end
end

Note: this needs a cache store with atomic increments shared across processes (Redis or Memcached); the in-memory store won't coordinate between workers.
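Stripped of Rails, the counter logic above is a fixed-window limiter. A minimal plain-Ruby sketch of the same idea, using a Hash in place of Rails.cache:

```ruby
# Fixed-window rate limiter: one counter per key per minute bucket.
# A sketch of the concern's logic without Rails.cache.
class FixedWindowLimiter
  def initialize(max_per_minute:)
    @max = max_per_minute
    @counters = Hash.new(0)
  end

  # Returns true if the call is allowed in the current window.
  def allow?(key, now: Time.now)
    bucket = "#{key}:#{now.to_i / 60}" # new counter every minute
    @counters[bucket] += 1
    @counters[bucket] <= @max
  end
end

limiter = FixedWindowLimiter.new(max_per_minute: 2)
t = Time.now
limiter.allow?("EmbeddingJob", now: t)  # => true
limiter.allow?("EmbeddingJob", now: t)  # => true
limiter.allow?("EmbeddingJob", now: t)  # => false: third call in the window
```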

Dead Letter Handling

When a job exhausts all retries, don't lose it silently. Capture it:

class AiBaseJob < ApplicationJob
  after_discard do |job, error|
    DeadLetter.create!(
      job_class: job.class.name,
      arguments: job.arguments.to_json,
      error_class: error.class.name,
      error_message: error.message,
      failed_at: Time.current
    )

    Rails.logger.error(
      "Dead letter: #{job.class.name} " \
      "args=#{job.arguments.inspect} " \
      "error=#{error.class}: #{error.message}"
    )
  end
end

Create the dead letter table:

bin/rails generate model DeadLetter \
  job_class:string \
  arguments:text \
  error_class:string \
  error_message:text \
  failed_at:datetime \
  retried_at:datetime
bin/rails db:migrate

Now you can build an admin page to inspect and retry failed jobs:

# Retry a dead letter
dead_letter = DeadLetter.find(params[:id])
dead_letter.job_class.constantize.perform_later(
  *JSON.parse(dead_letter.arguments)
)
dead_letter.update!(retried_at: Time.current)
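One caveat with the to_json round-trip above: plain JSON loses Ruby types, so symbol keys come back as strings and Active Job's richer argument serialization (GlobalID for model records, for example) is bypassed. A quick demonstration:

```ruby
require "json"

args = [42, { retry: true }]
restored = JSON.parse(args.to_json)

restored          # => [42, { "retry" => true }]
restored == args  # => false: the symbol key came back as a string
```

If your jobs take records or keyword arguments, consider storing ActiveJob::Arguments.serialize(job.arguments) instead and deserializing on retry.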

Timeouts

AI API calls can hang. Always set timeouts:

class AiBaseJob < ApplicationJob
  around_perform do |job, block|
    Timeout.timeout(120) { block.call }
  rescue Timeout::Error
    Rails.logger.error("Job timed out: #{job.class.name} args=#{job.arguments.inspect}")
    raise # Let retry logic handle it
  end
end

Or, better, configure the timeout at the HTTP client level; Timeout.timeout can interrupt a thread at an unsafe point:

client = OpenAI::Client.new(
  request_timeout: 60,
  uri_base: "https://api.openai.com/"
)

Idempotency

AI jobs should be safe to retry. Use idempotency keys:

class EmbeddingJob < AiBaseJob
  def perform(document_id)
    document = Document.find(document_id)

    # Skip if already processed
    return if document.embedding.present? && document.embedded_at&.after?(1.hour.ago)

    embedding = generate_embedding(document.content)
    document.update!(embedding: embedding, embedded_at: Time.current)
  end
end
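A stricter variant keys the skip on the content itself rather than a timestamp, so a document is only re-embedded when its text actually changes. A sketch; the stored hash column and embedding_stale? helper are illustrative, not part of the schema above:

```ruby
require "digest"

# Re-embed only when the document text has changed since the last run.
# `stored_sha` would come from a hypothetical content_sha column.
def embedding_stale?(content, stored_sha)
  Digest::SHA256.hexdigest(content) != stored_sha
end

sha = Digest::SHA256.hexdigest("hello world")
embedding_stale?("hello world", sha)   # => false: same content, skip
embedding_stale?("hello world!", sha)  # => true: content changed, re-embed
```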

Putting It Together

Here's the architecture:

User request → AiChatJob (ai_realtime queue, 0.1s poll, 3 threads)
Batch import → BatchEmbeddingJob (embeddings queue, 2s poll, 2 threads)
All AI jobs → inherit AiBaseJob (retries, timeouts, dead letters)
Rate limiting → RateLimited concern (per-class limits)
Failures → DeadLetter table (inspect, retry from admin)

Your AI features are only as reliable as your job infrastructure. Get this right, and your users never see a failed generation — just a slightly delayed one.

Next up: monitoring and observability for AI-powered Rails apps.
