One Notification Email Made Our Checkout API 3 Seconds Slower

Rails Performance: Lessons from Production — #5

The earlier posts optimized what happens inside a request — fewer queries, indexes, caching. This one flips the angle: some slowness shouldn't happen in the request at all. Sending email, calling a third party, generating a report — push that slow work to the background and let the request return immediately. Same example throughout (a shipments table).

💥 The checkout was fast, but the user waited 3 seconds

Placing an order is fast in itself: writing one row to the DB takes about 10ms. But after the user hit "checkout," the screen froze for roughly 3 seconds before responding.

The controller looked like this:

def create
  order = Order.create!(order_params)         # 10ms
  CourierApi.create_shipment(order)            # call the courier API: 1.5s
  OrderMailer.confirmation(order).deliver_now  # send the confirmation email: 1.2s
  render json: order
end

The order is already saved after the first 10ms, but the user still has to sit and wait for the courier API and the email before seeing a result. Worse: the moment the courier API slows down or goes down, the whole checkout request hangs or times out. The user thinks checkout failed when in fact the order was created.

The root issue: cramming "slow, and not needed right now" work into a request the user is waiting on.

📤 The fix: push it to the background, return immediately

Turn that slow work into a background job — the request only enqueues the task and returns; the actual work runs on a background worker:

def create
  order = Order.create!(order_params)            # 10ms
  CreateShipmentJob.perform_later(order.id)      # enqueue, ~0ms
  OrderMailer.confirmation(order).deliver_later  # email goes to the background too
  render json: order                             # return immediately, the user doesn't wait
end

What perform_later does: writes "what to do + arguments" into a queue (Redis or the database, depending on your engine) and returns at once. The request finishes in ~20ms.

A separate worker process pulls the task off the queue and runs it:

class CreateShipmentJob < ApplicationJob
  def perform(order_id)
    order = Order.find(order_id)
    CourierApi.create_shipment(order)   # this 1.5s runs in the background, nobody waits
  end
end

The user's perceived time drops from 3 seconds back to 20ms. If the courier API is slow or down, it only affects that background job (which can retry) — it doesn't drag down checkout.
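To make the enqueue/worker split concrete, here is a toy in-memory model in plain Ruby. It is only a sketch of the idea, not how Active Job, Sidekiq, or Solid Queue are implemented, and every name in it (ToyQueue, ToyShipmentJob) is made up for illustration:

```ruby
# Toy in-memory model of "enqueue now, run later". All names are hypothetical;
# real engines persist the entries in Redis or a database table.
class ToyQueue
  def initialize
    @entries = []
  end

  # Record only the job class name and its arguments, then return at once.
  def enqueue(job_class, *args)
    @entries << { job: job_class.name, args: args }
    nil
  end

  def size
    @entries.size
  end

  # What a worker does: take the oldest entry, rebuild the job, run it.
  # Returns false when there is nothing to do.
  def work_one
    entry = @entries.shift
    return false unless entry

    Object.const_get(entry[:job]).new.perform(*entry[:args])
    true
  end
end

class ToyShipmentJob
  # Remembers which order ids were processed, so the effect is observable.
  def self.performed
    @performed ||= []
  end

  def perform(order_id)
    # Stand-in for the slow courier call.
    self.class.performed << order_id
  end
end

queue = ToyQueue.new
queue.enqueue(ToyShipmentJob, 42) # the "request" stops here
queue.work_one                    # a worker picks the entry up later
```

The point of the sketch is that enqueue stores a description of the work and does none of it; the slow part only happens when something calls work_one, which in production is a separate process.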

What belongs in the background: email, third-party APIs, reports/exports, image processing, push notifications — the common thread is "slow, and the user doesn't need to see the result right now."

⚙️ The tools: Active Job + Sidekiq / Solid Queue

Rails' perform_later is Active Job — a unified interface; you can swap the underlying queue engine:

Sidekiq: the mainstream choice, uses Redis as the queue, fast and mature.
Solid Queue: the new Rails 8 default, uses the database as the queue, no separate Redis to run.

# config/application.rb
config.active_job.queue_adapter = :sidekiq

You don't rewrite your job code to switch engines — that's the value of Active Job.

🧨 The traps (the stuff you only hit after you've shipped it)

Pushing work to the background isn't "fire and forget." A few traps you must understand:

1. Idempotency — a job can run more than once
Background queues are almost always at-least-once: a job may finish successfully, but the worker crashes (or gets SIGTERM'd during a deploy) before it acknowledges completion. The queue assumes the job never finished, re-dispatches it, and it runs again. Retries are another source: Sidekiq's native job API retries up to 25 times by default, while Active Job itself adds no retries unless you declare retry_on (the adapter underneath may still apply its own retry policy, so check your engine's defaults). So design every job so that running it twice gives the same result as running it once.

def perform(order_id)
  order = Order.find(order_id)
  return if order.shipment.present?   # catches most duplicates
  CourierApi.create_shipment(order)
end

Note this present? is "check-then-act": under concurrency two jobs can both pass the check and both create a shipment. Real protection needs a DB unique index, or an idempotency key sent to the third-party API so they dedupe. A guard clause only catches the common case — it isn't true idempotency.
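Here is a rough plain-Ruby sketch of the "claim first, then act" shape that a unique index gives you. The ToyShipmentTable class is a hypothetical stand-in for a shipments table with a unique index on order_id (the mutex plays the part of the database enforcing uniqueness); in a real Rails app the equivalent would be rescuing ActiveRecord::RecordNotUnique around the insert:

```ruby
require "set"

# Stand-in for a table with a unique index on order_id. Hypothetical names.
class ToyShipmentTable
  class DuplicateKey < StandardError; end

  def initialize
    @order_ids = Set.new
    @lock = Mutex.new
  end

  # Atomic insert: the second caller for the same order_id gets an error.
  def insert!(order_id)
    @lock.synchronize do
      # Set#add? returns nil when the element is already present.
      raise DuplicateKey, "shipment exists for order #{order_id}" unless @order_ids.add?(order_id)
    end
  end
end

# Stand-in for the courier client; records every call it receives.
class ToyCourier
  attr_reader :calls

  def initialize
    @calls = Queue.new # thread-safe
  end

  def create_shipment(order_id)
    @calls << order_id
  end
end

# Only the job that wins the insert goes on to call the courier.
def create_shipment_once(table, courier, order_id)
  table.insert!(order_id)
  courier.create_shipment(order_id)
  :created
rescue ToyShipmentTable::DuplicateKey
  :duplicate
end

table = ToyShipmentTable.new
courier = ToyCourier.new
results = 8.times.map { Thread.new { create_shipment_once(table, courier, 123) } }.map(&:value)
puts results.tally
```

One caveat with this shape: if the courier call fails after the claim succeeds, a retry will see the claim and skip. A real implementation would need to release the claim or track a status on the row so the retry can finish the work.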

2. Pass an id, not the whole object

CreateShipmentJob.perform_later(order)       # works, but not recommended
CreateShipmentJob.perform_later(order.id)    # ✅ pass the id

First, clear up a common misconception: passing the whole object does not stuff all its data into the queue — Active Job uses GlobalID, storing just a gid://app/Order/123 reference and re-finding it at execution time, so you get the latest data anyway (just as fresh as finding it yourself).

So why still pass the id? The real reasons: ① if the object is deleted before the job runs, GlobalID deserialization raises ActiveJob::DeserializationError and the whole job fails, whereas passing an id lets you decide what to do when find misses; ② smaller payload, clearer intent; ③ a plain Ruby object (not an ActiveRecord model) has no GlobalID by default, so Active Job can't serialize it and you must pass an id or other primitive values anyway.

A useful way to see it: pass an id when you cross your own queue boundary (re-fetched later by the same system). Inside perform, normal in-process calls pass the live object — CourierApi.create_shipment(order) hands over the object because the external API can't look up your internal id; it needs the actual fields to build its request.
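"Decide what to do when find misses" can look like the sketch below. It is plain Ruby with a hypothetical in-memory store standing in for Order.find_by(id: ...), which returns nil instead of raising; the job treats a missing order as a normal outcome rather than an error:

```ruby
# Hypothetical stand-ins for the Order model and its lookup.
ToyOrder = Struct.new(:id, :address)

class ToyOrderStore
  def initialize(orders)
    @by_id = orders.to_h { |order| [order.id, order] }
  end

  # Returns nil when the row is gone, like find_by would.
  def find_by_id(id)
    @by_id[id]
  end
end

class ToyCreateShipmentJob
  def initialize(store, courier_calls)
    @store = store
    @courier_calls = courier_calls
  end

  def perform(order_id)
    order = @store.find_by_id(order_id)
    # The order was deleted between enqueue and execution: skip, don't raise.
    return :skipped_missing_order if order.nil?

    # Inside the job we hand over real fields, not our internal id.
    @courier_calls << order.address
    :shipped
  end
end

store = ToyOrderStore.new([ToyOrder.new(1, "1 Main St")])
job = ToyCreateShipmentJob.new(store, [])
job.perform(1)   # order exists
job.perform(999) # order is gone
```

If you do pass the object instead, Active Job's discard_on ActiveJob::DeserializationError is the usual way to get a similar "skip when the record is gone" behavior.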

3. Use queue priorities
Email can wait, but "payment succeeded" needs to be quick. Split jobs across queues with different priorities:

class CreateShipmentJob < ApplicationJob
  queue_as :critical   # important work on a high-priority queue
end

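Otherwise a pile of low-priority report jobs clogs the queue and the urgent ones wait behind them. Note that queue_as only names the queue; the priority comes from how the workers are configured to poll it (for example, queue order or weights in Sidekiq's configuration).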
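What "priority" means on the worker side can be sketched in a few lines of plain Ruby. This toy worker uses strict ordering: it always takes from the first non-empty queue in its list. The class and queue names are assumptions for illustration, not any engine's real API:

```ruby
# Toy worker that drains queues in strict priority order. Hypothetical names.
class ToyPriorityWorker
  # queue_order lists queue names from most to least urgent.
  def initialize(queue_order)
    @queue_order = queue_order
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def enqueue(queue_name, job_label)
    @queues[queue_name] << job_label
  end

  # Pick the next job from the first non-empty queue in priority order.
  def next_job
    @queue_order.each do |name|
      job = @queues[name].shift
      return job if job
    end
    nil
  end

  # Run everything and report the order in which jobs were picked.
  def drain
    picked = []
    while (job = next_job)
      picked << job
    end
    picked
  end
end

worker = ToyPriorityWorker.new(%i[critical default low])
worker.enqueue(:low, "report-1")
worker.enqueue(:low, "report-2")
worker.enqueue(:critical, "payment-succeeded")
puts worker.drain.inspect
```

Strict ordering has a cost: a steady stream of critical jobs can starve the low queue entirely, which is why some engines also offer weighted polling.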

4. Cap retries, and handle final failure
Auto-retry is great, but retrying forever against a courier API that is permanently down is just waste. Set a retry limit; past it, move the job to a dead letter queue or alert a human so it doesn't vanish silently. (Terminology varies by engine: Sidekiq calls it the Dead set; Solid Queue records failures in the solid_queue_failed_executions table.)
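The "cap, then park it somewhere visible" logic can be sketched in plain Ruby as follows. ToyRetryRunner is a hypothetical name, and the sketch leaves out the backoff between attempts that real engines add; in Active Job the closest equivalent is retry_on with its attempts: option and a block that runs once the attempts are exhausted:

```ruby
# Toy retry loop with a cap and a "dead" list. Hypothetical names.
class ToyRetryRunner
  attr_reader :dead

  def initialize(max_attempts:)
    @max_attempts = max_attempts
    @dead = []
  end

  # Runs the block up to max_attempts times. Returns :ok on success, or
  # :dead after recording the final failure for a human to inspect.
  def run(job_label)
    attempts = 0
    begin
      attempts += 1
      yield attempts
      :ok
    rescue StandardError => e
      retry if attempts < @max_attempts
      @dead << { job: job_label, error: e.message, attempts: attempts }
      :dead
    end
  end
end

runner = ToyRetryRunner.new(max_attempts: 3)
runner.run("shipment-42") { raise "courier API down" }
puts runner.dead.inspect
```

The important part is the last branch: when the cap is hit, the failure is recorded somewhere a person or an alert can find it instead of being dropped.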

5. Keep jobs small and fast
A job that runs for 30 minutes ties up a worker for 30 minutes, and if it dies halfway it reruns from the start. Break big work up (e.g. use find_in_batches or in_batches to walk the table batch by batch, and enqueue one job per batch).
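A rough sketch of the one-job-per-batch idea, in plain Ruby. The helper name and the shape of the job arguments are assumptions; in a Rails app the slices would typically come from find_in_batches or in_batches rather than from an in-memory array:

```ruby
# Split a large list of ids into fixed-size batches and build the arguments
# for one job per batch. Hypothetical helper for illustration.
def batch_job_args(ids, batch_size:)
  ids.each_slice(batch_size).map do |slice|
    { first_id: slice.first, last_id: slice.last, count: slice.size }
  end
end

batches = batch_job_args((1..2500).to_a, batch_size: 1000)
batches.each { |args| puts "would enqueue one batch job with #{args}" }
```

Each batch job then only has to redo its own slice if it dies, and several workers can process different slices at the same time.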

🏁 Wrap-up

aspect	in the request (❌)	background job (✅)
user experience	waits for the slow work	returns immediately
third party goes down	whole request hangs / times out	only that job is affected, retryable
web worker	held hostage by slow work	freed up to serve others

One principle:

The request does only what the user needs right now; slow work that can finish later goes to the background.

But the background brings a new class of responsibilities: jobs retry (be idempotent), run later (pass ids, not objects), and fail (cap retries and handle the final failure). "Push it to the background" is easy; "push it correctly" is what makes you senior. The moment you enqueue a job, think about what happens when it retries, runs late, or fails.