Originally published at claudeguide.io/claude-api-ruby-rails-guide

Claude API Ruby on Rails: Complete Guide (2026)

You can call the Claude API from Ruby on Rails using the official anthropic gem — add it to your Gemfile, set ANTHROPIC_API_KEY, and get a response in under 10 lines of code. The gem supports message creation, streaming via Server-Sent Events, tool use, and prompt caching. This guide covers gem installation, service objects, ActionController::Live streaming, ActiveJob background processing, Rails API controller patterns, error handling with retries, and rate limit strategies.

Gem Installation

Add to your Gemfile:

# Gemfile
gem "anthropic"

# Optional: for HTTP-level retry and timeout control
gem "faraday-retry"

Then install:

bundle install

Set your API key as an environment variable. Never hardcode it:

# .env (use dotenv-rails in development)
ANTHROPIC_API_KEY=sk-ant-...

For production on Heroku, Fly.io, or Render, set the environment variable in the platform dashboard. Alternatively, store the key in Rails encrypted credentials:

rails credentials:edit
# Add: anthropic_api_key: sk-ant-...

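Reference it in an initializer. The block passed to ENV.fetch runs only when the environment variable is unset, so credentials act as the fallback: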

# config/initializers/anthropic.rb
Anthropic.configure do |config|
  config.access_token = ENV.fetch("ANTHROPIC_API_KEY") do
    Rails.application.credentials.anthropic_api_key
  end
end
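The fallback order in that initializer can be sketched in plain Ruby. This is only an illustration under stated assumptions: resolve_api_key is a hypothetical helper (not part of the gem or Rails), and the credentials hash stands in for Rails.application.credentials.

```ruby
# Hypothetical helper: prefer the environment variable, fall back to a
# credentials-like hash, and fail loudly if neither source has a key.
def resolve_api_key(env, credentials)
  key = env.fetch("ANTHROPIC_API_KEY") { credentials[:anthropic_api_key] }
  raise KeyError, "Anthropic API key is not configured" if key.nil? || key.empty?

  key
end

# The environment variable wins when both sources are present.
puts resolve_api_key({ "ANTHROPIC_API_KEY" => "from-env" }, { anthropic_api_key: "from-credentials" })
# With no environment variable, the credentials value is used.
puts resolve_api_key({}, { anthropic_api_key: "from-credentials" })
```

Raising at boot when no key is found tends to surface a misconfigured deploy earlier than a failed API call would.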

Basic Message Creation

require "anthropic"

client = Anthropic::Client.new

response = client.messages(
  parameters: {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Explain Rails ActiveRecord callbacks in one paragraph." }
    ]
  }
)

puts response.dig("content", 0, "text")
puts "Tokens used: #{response.dig("usage", "input_tokens")} in / #{response.dig("usage", "output_tokens")} out"

Once the initializer above has run, Anthropic::Client.new uses the configured key, so you never pass it at the call site.
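Note that response.dig("content", 0, "text") reads only the first content block. A response can carry several blocks, so a slightly more defensive reader may be useful. The sketch below assumes the response is a plain Hash shaped like the one used above. extract_text and usage_summary are hypothetical helpers, and sample_response is hand-written data, not real API output.

```ruby
# Join every text block in a Messages-style response hash.
def extract_text(response)
  Array(response["content"])
    .select { |block| block["type"] == "text" }
    .map { |block| block["text"] }
    .join
end

# Format the token counts reported in the "usage" field.
def usage_summary(response)
  usage = response.fetch("usage", {})
  "#{usage.fetch("input_tokens", 0)} in / #{usage.fetch("output_tokens", 0)} out"
end

# Hand-written sample mirroring the fields read above.
sample_response = {
  "content" => [
    { "type" => "text", "text" => "Callbacks run " },
    { "type" => "text", "text" => "around lifecycle events." }
  ],
  "usage" => { "input_tokens" => 18, "output_tokens" => 7 }
}

puts extract_text(sample_response)
puts "Tokens used: #{usage_summary(sample_response)}"
```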

Service Object Pattern

Encapsulate API calls in a service object to keep controllers thin:

# app/services/claude_service.rb
class ClaudeService
  MODEL = "claude-sonnet-4-5"

  def initialize
    @client = Anthropic::Client.new
  end

  def chat(user_message, system_prompt: nil)
    params = {
      model: MODEL,
      max_tokens: 1024,
      messages: [{ role: "user", content: user_message }]
    }
    params[:system] = system_prompt if system_prompt

    response = @client.messages(parameters: params)
    response.dig("content", 0, "text")
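  rescue Anthropic::Error => e
    # Assumed handling: log the failure and re-raise so the caller decides how to recover.
    Rails.logger.error("[ClaudeService] #{e.class}: #{e.message}")
    raise
  end
end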
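One way to keep a service object like this testable is to pass the client in rather than build it inside initialize. The sketch below is an illustration, not the guide's implementation: ChatService and FakeClient are hypothetical names, and the fake only mimics the messages(parameters:) call shape used in this guide.

```ruby
# Hypothetical stand-in that mimics the call shape used in this guide.
class FakeClient
  attr_reader :last_parameters

  def messages(parameters:)
    @last_parameters = parameters
    { "content" => [{ "type" => "text", "text" => "stubbed reply" }] }
  end
end

# Hypothetical service that accepts any client responding to #messages.
class ChatService
  def initialize(client:, model: "claude-sonnet-4-5")
    @client = client
    @model = model
  end

  def chat(user_message, system_prompt: nil)
    params = {
      model: @model,
      max_tokens: 1024,
      messages: [{ role: "user", content: user_message }]
    }
    params[:system] = system_prompt if system_prompt
    @client.messages(parameters: params).dig("content", 0, "text")
  end
end

fake = FakeClient.new
service = ChatService.new(client: fake)
puts service.chat("Hello", system_prompt: "Be brief.")
```

In a test, the fake lets you assert on the parameters that would have been sent without making a network call.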

---

## Streaming with ActionController::Live

Use Rails `ActionController::Live` and Server-Sent Events to stream tokens to the browser in real time:

```ruby
# app/controllers/api/v1/stream_controller.rb
class Api::V1::StreamController < ApplicationController
  include ActionController::Live

  def create
    response.headers["Content-Type"] = "text/event-stream"
    response.headers["Cache-Control"] = "no-cache"
    response.headers["X-Accel-Buffering"] = "no"

    client = Anthropic::Client.new

    client.messages(
      parameters: {
        model: "claude-sonnet-4-5",
        max_tokens: 2048,
        stream: true,
        messages: [{ role: "user", content: params[:message] }]
      }
    ) do |chunk, _bytesize|
      if chunk["type"] == "content_block_delta"
        text = chunk.dig("delta", "text").to_s
        response.stream.write("data: #{text.to_json}\n\n") unless text.empty?
      end

      if chunk["type"] == "message_stop"
        response.stream.write("data: [DONE]\n\n")
      end
    end
  rescue ActionController::Live::ClientDisconnected
    Rails.logger.info("[StreamController] Client disconnected")
  ensure
    response.stream.close
  end
end
```
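Each write in the controller above emits one Server-Sent Events frame: a `data:` line followed by a blank line. JSON-encoding the text keeps embedded newlines from ending the frame early. A minimal sketch of that framing follows. `sse_frame` and `parse_sse_frame` are hypothetical helpers for illustration, not part of Rails or the gem.

```ruby
require "json"

# Encode one chunk of text as a single SSE frame.
def sse_frame(text)
  "data: #{text.to_json}\n\n"
end

# Decode a frame produced by sse_frame, roughly what a client would do.
def parse_sse_frame(frame)
  payload = frame.delete_prefix("data: ").chomp("\n\n")
  JSON.parse(payload)
end

frame = sse_frame("line one\nline two")
puts frame.inspect           # the newline inside the text is escaped by to_json
puts parse_sse_frame(frame)  # the original text, newline restored
```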


Add the route:

```ruby
# config/routes.rb
namespace :api do
  namespace :v1 do
    post "stream", to: "stream#create"
    post "chat", to: "chat#create"
  end
end
```


Because the route is a POST, consume the stream on the frontend with `fetch` and a `ReadableStream`. The `EventSource` API only issues GET requests, so using it would require a GET route instead.

**Performance note**: With streaming, the first token arrives in roughly 300–500 ms. Without it, the user waits 5–12 seconds for a full 2,000-token response. Stream every user-facing feature.

---

## ActiveJob Background Processing

For long-running or batch tasks, offload Claude calls to a background job:

```ruby
# app/jobs/claude_analysis_job.rb
class ClaudeAnalysisJob < ApplicationJob
  queue_as :default

  retry_on Anthropic::RateLimitError, wait: :polynomially_longer, attempts: 5
  retry_on Anthropic::ServerError, wait: 10.seconds, attempts: 3
  discard_on Anthropic::AuthenticationError

  def perform(document_id)
    document = Document.find(document_id)

    client = Anthropic::Client.new
    response = client.messages(
      parameters: {
        model: "claude-haiku-4-5", # Haiku for batch tasks (roughly 3x cheaper than Sonnet)
        max_tokens: 512,
        messages: [
          {
            role: "user",
            content: "Summarize this document in 3 bullet points:\n\n#{document.body}"
          }
        ]
      }
    )

    summary = response.dig("content", 0, "text")
    document.update!(summary: summary, summarized_at: Time.current)
  end
end
```


Enqueue it from a controller or callback:

```ruby
ClaudeAnalysisJob.perform_later(document.id)
```


Use Sidekiq or Solid Queue as the backend. For cost optimization, route simple summarization tasks to `claude-haiku-4-5` and reserve Sonnet for complex reasoning. See [Claude Haiku vs Sonnet vs Opus: Which Model](/claude-haiku-sonnet-opus-which-model) for detailed benchmarks.
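The routing idea can be kept in one place so jobs do not hardcode model names. The following is a minimal sketch under stated assumptions: `MODEL_BY_TASK`, `DEFAULT_MODEL`, and `model_for` are hypothetical names, and the task types are illustrative rather than an official list.

```ruby
# Hypothetical routing table: simple, high-volume tasks go to Haiku.
MODEL_BY_TASK = {
  summarize: "claude-haiku-4-5",
  classify: "claude-haiku-4-5",
  extract: "claude-haiku-4-5",
  code_review: "claude-sonnet-4-5"
}.freeze

# Unknown task types fall back to the more capable model.
DEFAULT_MODEL = "claude-sonnet-4-5"

def model_for(task)
  MODEL_BY_TASK.fetch(task.to_sym, DEFAULT_MODEL)
end

puts model_for(:summarize)
puts model_for("unknown_task")
```

A job could then pass `model: model_for(:summarize)` in its parameters instead of a string literal.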

---

## Error Handling with Retries

The `anthropic` gem raises typed errors you can rescue explicitly:

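```ruby
# app/services/claude_service.rb (robust version)
class ClaudeService
  MAX_RETRIES = 3
  BASE_DELAY = 1.0 # seconds

  def chat_with_retry(message)
    attempts = 0
    begin
      attempts += 1
      call_api(message) # assumed to wrap the @client.messages call shown earlier
    rescue Anthropic::RateLimitError => e
      # Assumed completion: exponential backoff (1s, 2s, ...) until MAX_RETRIES is reached.
      raise e if attempts >= MAX_RETRIES

      sleep(BASE_DELAY * (2**(attempts - 1)))
      retry
    end
  end
end
```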
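The retry-with-backoff idea can be tried in isolation, without the gem or a network call. This sketch makes several assumptions: `FakeRateLimitError` stands in for the gem's rate-limit error, and `backoff_delay` and `with_retries` are hypothetical helpers. The sleeper is injectable so the example runs instantly.

```ruby
# Hypothetical error class standing in for the gem's rate-limit error.
class FakeRateLimitError < StandardError; end

# Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ... capped at max.
def backoff_delay(attempt, base: 1.0, max: 30.0)
  [base * (2**(attempt - 1)), max].min
end

# Run the block, retrying on error_class up to max_attempts in total.
# The sleeper is injectable so tests do not actually sleep.
def with_retries(error_class:, max_attempts: 3, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue error_class
    raise if attempts >= max_attempts

    sleeper.call(backoff_delay(attempts))
    retry
  end
end

delays = []
result = with_retries(error_class: FakeRateLimitError, sleeper: ->(s) { delays << s }) do |attempt|
  raise FakeRateLimitError, "429" if attempt < 3

  "ok after #{attempt} attempts"
end

puts result
puts delays.inspect
```

Adding random jitter to each delay is a common refinement when many workers retry at once. It is left out here to keep the schedule easy to read.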

Frequently Asked Questions

Is there an official Anthropic Ruby gem?

Yes. Anthropic publishes the anthropic gem officially on RubyGems. Add gem "anthropic" to your Gemfile and run bundle install. It supports messages, streaming, tool use, and prompt caching. You can also call the REST API directly with Faraday or Net::HTTP if you prefer zero dependencies.

How do I stream Claude responses in Rails without blocking the main thread?

Add include ActionController::Live to your controller and write chunks to response.stream. Puma handles the connection in a separate thread. Make sure nothing in front of the app buffers the response (for Nginx, send the X-Accel-Buffering: no header), and close the stream in an ensure block to prevent connection leaks.

Which model should I use for Rails background jobs?

For high-volume batch tasks (summarization, classification, extraction), use claude-haiku-4-5: it is roughly 3x cheaper than Sonnet at list prices, with comparable quality on straightforward tasks. Reserve claude-sonnet-4-5 for complex reasoning or code generation. See Claude Haiku vs Sonnet vs Opus: Which Model for a decision framework.

How do I handle token limits in Rails?

Read input_tokens and output_tokens from response["usage"] after each call and log them. For large inputs, truncate or chunk the content before sending. Sonnet and Haiku have a 200k-token context window, well above most Rails use cases but worth monitoring in document-processing pipelines.
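Chunking before sending might look like the sketch below. It rests on a loud assumption: the 4-characters-per-token ratio is a rough heuristic for English text, not a tokenizer. estimate_tokens and chunk_by_paragraph are hypothetical helpers.

```ruby
# Rough heuristic (an assumption, not a tokenizer): about 4 characters per token.
CHARS_PER_TOKEN = 4

def estimate_tokens(text)
  (text.length / CHARS_PER_TOKEN.to_f).ceil
end

# Split text into chunks of at most max_tokens (estimated), breaking on blank lines.
# A single paragraph longer than the limit is kept whole rather than cut mid-sentence.
def chunk_by_paragraph(text, max_tokens:)
  chunks = []
  current = ""
  text.split(/\n{2,}/).each do |paragraph|
    candidate = current.empty? ? paragraph : "#{current}\n\n#{paragraph}"
    if estimate_tokens(candidate) > max_tokens && !current.empty?
      chunks << current
      current = paragraph
    else
      current = candidate
    end
  end
  chunks << current unless current.empty?
  chunks
end

document = ["a" * 40, "b" * 40, "c" * 40].join("\n\n")
chunk_by_paragraph(document, max_tokens: 25).each_with_index do |chunk, i|
  puts "chunk #{i + 1}: ~#{estimate_tokens(chunk)} tokens"
end
```

The usage figures returned by the API remain the authoritative count. The estimate is only for deciding where to split beforehand.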

Can I use Claude with Rails Action Cable?

Yes. Call the Anthropic API from a Channel action and broadcast chunks as they arrive: for each streamed chunk, call ActionCable.server.broadcast(channel, { text: chunk }), which is defined in ActionCable::Server::Broadcasting. Run the API call in a background thread or job so it does not block the Action Cable connection.
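The per-chunk broadcast can be sketched without Rails by injecting the broadcaster. Everything here is illustrative: relay_chunks is a hypothetical helper, the lambda stands in for ActionCable.server.broadcast, the sample chunks are hand-written in the shape the streaming controller reads, and the final done message is an assumed convention rather than part of Action Cable.

```ruby
# Hypothetical relay: forwards each streamed text delta to a broadcaster.
def relay_chunks(chunks, channel:, broadcaster:)
  chunks.each do |chunk|
    next unless chunk["type"] == "content_block_delta"

    text = chunk.dig("delta", "text").to_s
    broadcaster.call(channel, { text: text }) unless text.empty?
  end
  # Assumed convention: tell subscribers the stream has finished.
  broadcaster.call(channel, { done: true })
end

# Hand-written sample events, not real API output.
chunks = [
  { "type" => "message_start" },
  { "type" => "content_block_delta", "delta" => { "type" => "text_delta", "text" => "Hel" } },
  { "type" => "content_block_delta", "delta" => { "type" => "text_delta", "text" => "lo" } },
  { "type" => "message_stop" }
]

sent = []
relay_chunks(chunks, channel: "chat_42", broadcaster: ->(ch, payload) { sent << [ch, payload] })
puts sent.inspect
```

In a Rails app the broadcaster could be ActionCable.server.method(:broadcast), with the relay running inside a job.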