Originally published at claudeguide.io/claude-api-ruby-rails-guide
# Claude API Ruby on Rails: Complete Guide (2026)
You can call the Claude API from Ruby on Rails using the official `anthropic` gem: add it to your Gemfile, set `ANTHROPIC_API_KEY`, and get a response in a few lines of code. The gem supports message creation, streaming via Server-Sent Events, tool use, and prompt caching. This guide covers gem installation, service objects, `ActionController::Live` streaming, ActiveJob background processing, Rails API controller patterns, error handling with retries, and rate limit strategies.
## Gem Installation
Add to your Gemfile:
```ruby
# Gemfile
gem "anthropic"

# Optional: for HTTP-level retry and timeout control
gem "faraday-retry"
```
Then install:
```bash
bundle install
```
Set your API key as an environment variable. Never hardcode it:
```bash
# .env (use dotenv-rails in development)
ANTHROPIC_API_KEY=sk-ant-...
```
In production on Heroku, Fly.io, or Render, set the environment variable in the platform dashboard. Alternatively, store the key in Rails credentials:
```bash
rails credentials:edit
# Add: anthropic_api_key: sk-ant-...
```
Reference it in your initializer:
```ruby
# config/initializers/anthropic.rb
Anthropic.configure do |config|
  config.access_token = ENV.fetch("ANTHROPIC_API_KEY") do
    Rails.application.credentials.anthropic_api_key
  end
end
```
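The lookup order in this initializer (environment variable first, credentials second) can be checked in isolation. A minimal sketch, assuming a hypothetical `resolve_api_key` helper that is not part of the gem and that takes the environment and credentials as plain hashes so it runs outside Rails:

```ruby
# Hypothetical helper mirroring the initializer's lookup order:
# environment variable first, then a credentials fallback.
def resolve_api_key(env, credentials)
  key = env.fetch("ANTHROPIC_API_KEY") { credentials[:anthropic_api_key] }
  # Fail at boot rather than on the first API call.
  raise KeyError, "Anthropic API key is not configured" if key.nil? || key.strip.empty?
  key
end

puts resolve_api_key({ "ANTHROPIC_API_KEY" => "from-env" }, { anthropic_api_key: "from-credentials" })
puts resolve_api_key({}, { anthropic_api_key: "from-credentials" })
```

Raising when neither source is set surfaces a misconfigured deploy at startup instead of on the first request.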
## Basic Message Creation
```ruby
require "anthropic"

client = Anthropic::Client.new

response = client.messages(
  parameters: {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Explain Rails ActiveRecord callbacks in one paragraph." }
    ]
  }
)

puts response.dig("content", 0, "text")
puts "Tokens used: #{response.dig("usage", "input_tokens")} in / #{response.dig("usage", "output_tokens")} out"
```
Once the initializer has run, `Anthropic::Client.new` uses the configured key, so you never pass it at the call site.
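The `dig` calls above assume the response is a Hash shaped like the Messages API JSON body. A small sketch of a helper that pulls out the text and token counts in one place; `summarize_response` and the sample hash are illustrative and not part of the gem:

```ruby
# Hypothetical helper: extracts text and token usage from a response Hash
# with the same shape the example above reads via `dig`.
def summarize_response(response)
  # Join every text block; a reply may contain more than one content block.
  text = Array(response["content"])
         .select { |block| block["type"] == "text" }
         .map { |block| block["text"] }
         .join
  {
    text: text,
    input_tokens: response.dig("usage", "input_tokens").to_i,
    output_tokens: response.dig("usage", "output_tokens").to_i
  }
end

# Sample data for illustration only, not a real API response.
sample = {
  "content" => [{ "type" => "text", "text" => "Callbacks are hooks." }],
  "usage" => { "input_tokens" => 18, "output_tokens" => 5 }
}
summary = summarize_response(sample)
puts summary[:text]
puts "Tokens used: #{summary[:input_tokens]} in / #{summary[:output_tokens]} out"
```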
## Service Object Pattern
Encapsulate API calls in a service object to keep controllers thin:
```ruby
# app/services/claude_service.rb
class ClaudeService
  MODEL = "claude-sonnet-4-5"

  def initialize
    @client = Anthropic::Client.new
  end

  def chat(user_message, system_prompt: nil)
    params = {
      model: MODEL,
      max_tokens: 1024,
      messages: [{ role: "user", content: user_message }]
    }
    params[:system] = system_prompt if system_prompt

    response = @client.messages(parameters: params)
    response.dig("content", 0, "text")
  rescue Anthropic::Error =
```
---
## Streaming with ActionController::Live
Use Rails `ActionController::Live` and Server-Sent Events to stream tokens to the browser in real time:
```ruby
# app/controllers/api/v1/stream_controller.rb
class Api::V1::StreamController < ApplicationController
  include ActionController::Live

  def create
    response.headers["Content-Type"] = "text/event-stream"
    response.headers["Cache-Control"] = "no-cache"
    response.headers["X-Accel-Buffering"] = "no"

    client = Anthropic::Client.new
    client.messages(
      parameters: {
        model: "claude-sonnet-4-5",
        max_tokens: 2048,
        stream: true,
        messages: [{ role: "user", content: params[:message] }]
      }
    ) do |chunk, _bytesize|
      if chunk["type"] == "content_block_delta"
        text = chunk.dig("delta", "text").to_s
        response.stream.write("data: #{text.to_json}\n\n") unless text.empty?
      end

      if chunk["type"] == "message_stop"
        response.stream.write("data: [DONE]\n\n")
      end
    end
  rescue ActionController::Live::ClientDisconnected
    Rails.logger.info("[StreamController] Client disconnected")
  ensure
    response.stream.close
  end
end
```
Add the route:
```ruby
# config/routes.rb
namespace :api do
  namespace :v1 do
    post "stream", to: "stream#create"
    post "chat", to: "chat#create"
  end
end
```
On the frontend, consume the stream with `fetch` and a `ReadableStream`. The `EventSource` API only issues GET requests, so it cannot call this POST route directly.
**Performance note**: With streaming, the first token typically arrives in roughly 300–500 ms. Without streaming, a 2,000-token response can leave the user waiting 5–12 seconds for the full payload. Stream any user-facing feature.
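The controller above writes each text delta as one SSE event: a `data:` line followed by a blank line, with the payload JSON-encoded so that newlines inside the text cannot break the frame. A sketch of that framing and the matching decode step, using hypothetical `sse_frame` and `sse_decode` helpers that are not part of Rails or the gem:

```ruby
require "json"

# Hypothetical helper: encode one text delta as a Server-Sent Events frame.
def sse_frame(text)
  "data: #{text.to_json}\n\n"
end

# Hypothetical helper: decode a buffer of frames back into text deltas,
# stopping at the "[DONE]" sentinel the controller sends last.
def sse_decode(buffer)
  deltas = []
  buffer.split("\n\n").each do |frame|
    payload = frame.delete_prefix("data: ")
    break if payload == "[DONE]"
    deltas << JSON.parse(payload)
  end
  deltas
end

# Two deltas followed by the sentinel, as they would appear on the wire.
wire = sse_frame("Hello,\n") + sse_frame("world") + "data: [DONE]\n\n"
puts sse_decode(wire).join
```

A browser client does the same work in JavaScript: split the incoming bytes on blank lines, strip the `data: ` prefix, and JSON-parse each payload.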
---
## ActiveJob Background Processing
For long-running or batch tasks, offload Claude calls to a background job:
```ruby
# app/jobs/claude_analysis_job.rb
class ClaudeAnalysisJob < ApplicationJob
  queue_as :default

  retry_on Anthropic::RateLimitError, wait: :polynomially_longer, attempts: 5
  retry_on Anthropic::ServerError, wait: 10.seconds, attempts: 3
  discard_on Anthropic::AuthenticationError

  def perform(document_id)
    document = Document.find(document_id)
    client = Anthropic::Client.new

    response = client.messages(
      parameters: {
        model: "claude-haiku-4-5", # Use Haiku for batch tasks (about 3x cheaper than Sonnet at list prices)
        max_tokens: 512,
        messages: [
          {
            role: "user",
            content: "Summarize this document in 3 bullet points:\n\n#{document.body}"
          }
        ]
      }
    )

    summary = response.dig("content", 0, "text")
    document.update!(summary: summary, summarized_at: Time.current)
  end
end
```
Enqueue it from a controller or callback:
```ruby
ClaudeAnalysisJob.perform_later(document.id)
```
Use Sidekiq or Solid Queue as the backend. For cost optimization, route simple summarization tasks to `claude-haiku-4-5` and reserve Sonnet for complex reasoning. See [Claude Haiku vs Sonnet vs Opus: Which Model](/claude-haiku-sonnet-opus-which-model) for detailed benchmarks.
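One way to encode that routing rule is a small lookup. The task names and the `model_for` helper below are hypothetical; only the two model identifiers come from this guide:

```ruby
# Hypothetical routing helper: the cheaper model for simple, high-volume
# tasks, Sonnet for anything that needs more reasoning.
def model_for(task, simple_tasks: %i[summarize classify extract])
  simple_tasks.include?(task.to_sym) ? "claude-haiku-4-5" : "claude-sonnet-4-5"
end

puts model_for(:summarize)   # claude-haiku-4-5
puts model_for(:code_review) # claude-sonnet-4-5
```

A job can then pass `model: model_for(:summarize)` instead of hardcoding the model name, which keeps the cost decision in one place.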
---
## Error Handling with Retries
The `anthropic` gem raises typed errors you can rescue explicitly:
```ruby
# app/services/claude_service.rb (robust version)
class ClaudeService
  MAX_RETRIES = 3
  BASE_DELAY = 1.0 # seconds

  def chat_with_retry(message)
    attempts = 0
    begin
      attempts += 1
      call_api(message)
    rescue Anthropic::RateLimitError =
```
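The constants above suggest exponential backoff: wait `BASE_DELAY * 2**(attempt - 1)` seconds between tries and give up after `MAX_RETRIES` attempts. The following is only a sketch of that general technique under those assumptions, not the original implementation. `with_backoff` is a hypothetical helper, and `IOError` stands in for the gem's rate-limit error so the example runs without the gem:

```ruby
# Hypothetical helper: retry a block with exponential backoff.
# `sleeper` is injectable so the waits can be recorded instead of slept.
def with_backoff(retry_on:, max_retries: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue retry_on
    raise if attempts >= max_retries # out of attempts: re-raise the last error
    sleeper.call(base_delay * (2**(attempts - 1))) # 1s, 2s, 4s, ...
    retry
  end
end

waits = []
result = with_backoff(retry_on: IOError, sleeper: ->(s) { waits << s }) do |attempt|
  raise IOError, "simulated 429" if attempt < 3 # fail twice, then succeed
  "ok after #{attempt} attempts"
end
puts result        # ok after 3 attempts
puts waits.inspect # [1.0, 2.0]
```

In production code, adding a small random jitter to each delay helps keep many workers from retrying at the same instant.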
## Frequently Asked Questions
### Is there an official Anthropic Ruby gem?
Yes. Anthropic publishes the `anthropic` gem on RubyGems. Add `gem "anthropic"` to your Gemfile and run `bundle install`. It supports messages, streaming, tool use, and prompt caching. If you would rather avoid the gem, you can call the REST API directly with `Net::HTTP` from the standard library (zero dependencies) or with Faraday.
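For the zero-dependency route, the Messages endpoint is a plain JSON POST authenticated with an `x-api-key` header. A sketch using only the standard library; `build_messages_request` is a hypothetical helper, and the request is built here but not sent:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a Messages API request.
def build_messages_request(api_key:, prompt:, model: "claude-sonnet-4-5", max_tokens: 1024)
  request = Net::HTTP::Post.new(URI("https://api.anthropic.com/v1/messages"))
  request["x-api-key"] = api_key
  request["anthropic-version"] = "2023-06-01"
  request["content-type"] = "application/json"
  request.body = JSON.generate({
    model: model,
    max_tokens: max_tokens,
    messages: [{ role: "user", content: prompt }]
  })
  request
end

request = build_messages_request(api_key: "sk-ant-placeholder", prompt: "Hello, Claude")
puts request.path # /v1/messages

# To send it (makes a real network call, so it is left commented out):
#   uri = request.uri
#   Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
```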
### How do I stream Claude responses in Rails without blocking the main thread?
Add `include ActionController::Live` to your controller and write chunks to `response.stream`. Puma handles the connection in a separate thread. Make sure nothing between Rails and the browser buffers the response (for Nginx, send `X-Accel-Buffering: no`). Close the stream in an `ensure` block to prevent connection leaks.
### Which model should I use for Rails background jobs?
For high-volume batch tasks (summarization, classification, extraction), use `claude-haiku-4-5`. It is roughly 3x cheaper than Sonnet at list prices and handles straightforward tasks with comparable quality. Reserve `claude-sonnet-4-5` for complex reasoning or code generation. See [Claude Haiku vs Sonnet vs Opus: Which Model](/claude-haiku-sonnet-opus-which-model) for a decision framework.
### How do I handle token limits in Rails?
After each call, log `response.dig("usage", "input_tokens")` and `response.dig("usage", "output_tokens")`. For large inputs, truncate or chunk the content before sending. Claude's context window is 200k tokens for Sonnet and Haiku, well above most Rails use cases but worth monitoring in document-processing pipelines.
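One way to chunk oversized input is to split on paragraph boundaries under a rough token budget. The sketch below assumes about 4 characters per token, which is a rule of thumb rather than a real tokenizer count, and `chunk_text` is a hypothetical helper:

```ruby
# Hypothetical helper: split text into chunks that stay under an approximate
# token budget, breaking on blank lines so paragraphs stay intact.
# chars_per_token is a rough heuristic, not a real tokenizer.
# A single paragraph larger than the budget is kept whole as its own chunk.
def chunk_text(text, max_tokens:, chars_per_token: 4)
  budget = max_tokens * chars_per_token
  chunks = []
  current = +""
  text.split(/\n{2,}/).each do |paragraph|
    # Start a new chunk if adding this paragraph would exceed the budget.
    if !current.empty? && current.length + paragraph.length + 2 > budget
      chunks << current
      current = +""
    end
    current << "\n\n" unless current.empty?
    current << paragraph
  end
  chunks << current unless current.empty?
  chunks
end

document = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunk_text(document, max_tokens: 10).each_with_index do |chunk, index|
  puts "chunk #{index + 1}: #{chunk.length} chars"
end
```

Each chunk can then be sent as its own request, for example from a separate background job.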
### Can I use Claude with Rails Action Cable?
Yes. Call the Anthropic API from a channel action and broadcast chunks as they arrive; `ActionCable::Server::Broadcasting` provides the `broadcast` method. For streaming, run the API call in a background thread or job so it does not block the Action Cable connection, and call `ActionCable.server.broadcast(channel, { text: chunk })` for each streamed chunk.
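The per-chunk broadcast can be sketched without Rails by injecting the broadcaster. `relay_chunks` is a hypothetical helper, and the final `{ done: true }` message is an assumed convention for telling the client the stream has ended. In a Rails app the lambda would wrap `ActionCable.server.broadcast`:

```ruby
# Hypothetical relay: forwards each non-empty text chunk to a broadcaster.
# The broadcaster is a lambda here so the sketch runs without Rails.
def relay_chunks(chunks, channel:, broadcaster:)
  sent = 0
  chunks.each do |chunk|
    next if chunk.nil? || chunk.empty?
    broadcaster.call(channel, { text: chunk })
    sent += 1
  end
  # Assumed convention: a final message so the client knows the stream ended.
  broadcaster.call(channel, { done: true })
  sent
end

log = []
recorder = ->(channel, payload) { log << [channel, payload] }
count = relay_chunks(["Hi", "", nil, " there"], channel: "chat_1", broadcaster: recorder)
puts "#{count} chunks broadcast"
```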