One Notification Email Made Our Checkout API 3 Seconds Slower


Rails Performance: Lessons from Production — #5

The earlier posts optimized what happens inside a request — fewer queries, indexes, caching. This one flips the angle: some slowness shouldn't happen in the request at all. Sending email, calling a third party, generating a report — push that slow work to the background and let the request return immediately. Same example throughout (a shipments table).


💥 The checkout was fast, but the user waited 3 seconds

Placing an order is fast in itself: writing one row to the DB takes 10ms. But after the user hit "checkout," the screen froze for about 3 seconds before responding.

The controller looked like this:

def create
  order = Order.create!(order_params)         # 10ms
  CourierApi.create_shipment(order)            # call the courier API: 1.5s
  OrderMailer.confirmation(order).deliver_now  # send the confirmation email: 1.2s
  render json: order
end

The order already exists after the first 10ms, but the user has to sit and wait for the courier API and the email before seeing a result. Worse: the moment the courier API slows down or goes down, the whole checkout request hangs or times out, and the user thinks checkout failed when in fact the order was created.

The root issue: cramming "slow, and not needed right now" work into a request the user is waiting on.


📤 The fix: push it to the background, return immediately

Turn that slow work into a background job — the request only enqueues the task and returns; the actual work runs on a background worker:

def create
  order = Order.create!(order_params)          # 10ms
  CreateShipmentJob.perform_later(order.id)    # enqueue, ~0ms
  OrderMailer.confirmation(order).deliver_later # email goes to the background too
  render json: order                           # return immediately, the user doesn't wait
end

What perform_later does: writes "what to do + arguments" into a queue (Redis or the database, depending on your engine) and returns at once. The request finishes in ~20ms.

A separate worker process pulls the task off the queue and runs it:

class CreateShipmentJob < ApplicationJob
  def perform(order_id)
    order = Order.find(order_id)
    CourierApi.create_shipment(order)   # this 1.5s runs in the background, nobody waits
  end
end

The user's perceived time drops from 3 seconds back to 20ms. If the courier API is slow or down, it only affects that background job (which can retry) — it doesn't drag down checkout.
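To make the enqueue/worker split concrete, here is a minimal in-memory sketch in plain Ruby. It is not how Active Job or any real engine is implemented: `TinyQueue`, `PrintLabelJob`, and the JSON payload shape are hypothetical names made up for illustration. The point it shows is that enqueueing only stores a small description of the work, and nothing runs until a worker picks it up.

```ruby
require "json"

# Hypothetical stand-in for a queue backend (Redis or a DB table in real life).
class TinyQueue
  def initialize
    @payloads = []
  end

  # Enqueue: store only "what to run + arguments" as a string, then return.
  def push(job_class, *args)
    @payloads << JSON.generate("job_class" => job_class.name, "arguments" => args)
    nil
  end

  def size
    @payloads.size
  end

  # Worker side: take the oldest payload, rebuild the job, run it.
  def work_one
    payload = @payloads.shift
    return false unless payload

    data = JSON.parse(payload)
    Object.const_get(data["job_class"]).new.perform(*data["arguments"])
    true
  end
end

# Hypothetical job; the slow work would live inside perform.
class PrintLabelJob
  PERFORMED = []

  def perform(order_id)
    PERFORMED << order_id
  end
end

queue = TinyQueue.new
queue.push(PrintLabelJob, 42)   # request path: cheap, nothing has run yet
queue.work_one                  # worker path: the job runs here
```

In a real deployment the two halves live in different processes, which is why the payload has to be something serializable rather than a live Ruby object.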

What belongs in the background: email, third-party APIs, reports/exports, image processing, push notifications — the common thread is "slow, and the user doesn't need to see the result right now."


⚙️ The tools: Active Job + Sidekiq / Solid Queue

Rails' perform_later is Active Job — a unified interface; you can swap the underlying queue engine:

  • Sidekiq: the mainstream choice, uses Redis as the queue, fast and mature.
  • Solid Queue: the new Rails 8 default, uses the database as the queue, no separate Redis to run.

# config/application.rb
config.active_job.queue_adapter = :sidekiq

You don't rewrite your job code to switch engines — that's the value of Active Job.
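The "one interface, swappable engine" idea can be sketched in plain Ruby. Everything below (`MiniJob`, `InlineAdapter`, `MemoryQueueAdapter`, `GreetJob`) is a hypothetical toy, not Active Job's actual internals. It only illustrates why the job class stays untouched when the adapter changes. Active Job itself ships comparable built-in adapters such as `:inline`, `:async`, and `:test`.

```ruby
# Hypothetical adapters: both expose the same enqueue(job_class, args) method.
class InlineAdapter
  # Runs the job immediately (handy in tests).
  def enqueue(job_class, args)
    job_class.new.perform(*args)
  end
end

class MemoryQueueAdapter
  attr_reader :pending

  def initialize
    @pending = []
  end

  # Only records the job; a worker would run it later.
  def enqueue(job_class, args)
    @pending << [job_class, args]
  end
end

# Hypothetical base class playing the role of the unified interface.
class MiniJob
  class << self
    attr_accessor :queue_adapter

    def perform_later(*args)
      MiniJob.queue_adapter.enqueue(self, args)
    end
  end
end

class GreetJob < MiniJob
  LOG = []

  def perform(name)
    LOG << "hello #{name}"
  end
end

MiniJob.queue_adapter = InlineAdapter.new
GreetJob.perform_later("ada")        # runs right away

MiniJob.queue_adapter = MemoryQueueAdapter.new
GreetJob.perform_later("grace")      # only recorded; GreetJob itself is unchanged
```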


🧨 The traps (the stuff you only hit after you've shipped it)

Pushing work to the background isn't "fire and forget." A few you must understand:

1. Idempotency — a job can run more than once
Background queues are almost always at-least-once: a job may finish successfully, but the worker crashes (or gets SIGTERM'd during a deploy) before it reports completion (ack). The queue thinks the job didn't finish, re-dispatches it, and it runs again. Retries are another source: Sidekiq's native job API retries 25 times by default, while Active Job itself adds no retries until you declare retry_on (the adapter underneath may still apply its own retry policy). So a job must be designed so that running it twice gives the same result as running it once.

def perform(order_id)
  order = Order.find(order_id)
  return if order.shipment.present?   # catches most duplicates
  CourierApi.create_shipment(order)
end

Note this present? is "check-then-act": under concurrency two jobs can both pass the check and both create a shipment. Real protection needs a DB unique index, or an idempotency key sent to the third-party API so they dedupe. A guard clause only catches the common case — it isn't true idempotency.
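Here is a minimal sketch of the idempotency-key idea in plain Ruby. `FakeCourier` is a hypothetical stand-in for a third-party API that dedupes on a key, and the key format is an assumption. A real courier API may or may not offer such a parameter, so check its docs. On the database side, the equivalent would be a unique index (for example on a shipment's order id, again an assumed column) plus rescuing `ActiveRecord::RecordNotUnique`.

```ruby
# Hypothetical courier client that dedupes on an idempotency key,
# standing in for a third-party API that supports one.
class FakeCourier
  def initialize
    @shipments = {}
    @lock = Mutex.new
  end

  # Atomic "create unless this key was already seen".
  def create_shipment(order_id, idempotency_key:)
    @lock.synchronize do
      @shipments[idempotency_key] ||= { order_id: order_id, tracking: "TRK-#{order_id}" }
    end
  end

  def shipment_count
    @lock.synchronize { @shipments.size }
  end
end

# The key is derived from the order, so every retry sends the same one.
def create_shipment_once(courier, order_id)
  courier.create_shipment(order_id, idempotency_key: "order-#{order_id}-shipment")
end

courier = FakeCourier.new
# Simulate the same job being delivered several times concurrently.
threads = 5.times.map { Thread.new { create_shipment_once(courier, 123) } }
threads.each(&:join)
courier.shipment_count   # one shipment, however many times the job ran
```

The difference from the guard clause is that the check and the write happen as one atomic step on the side that owns the data, so two concurrent runs cannot both slip through.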

2. Pass an id, not the whole object

CreateShipmentJob.perform_later(order)       # works, but not recommended
CreateShipmentJob.perform_later(order.id)    # ✅ pass the id

First, clear up a common misconception: passing the whole object does not stuff all its data into the queue — Active Job uses GlobalID, storing just a gid://app/Order/123 reference and re-finding it at execution time, so you get the latest data anyway (just as fresh as finding it yourself).

So why still pass the id? The real reasons: ① if the object is deleted before the job runs, GlobalID deserialization raises ActiveJob::DeserializationError and the whole job fails — passing an id lets you decide what to do when find misses; ② smaller payload, clearer intent; ③ a plain PORO (not ActiveRecord) has no GlobalID, so you must pass an id anyway.

A useful way to see it: pass an id when you cross your own queue boundary (re-fetched later by the same system). Inside perform, normal in-process calls pass the live object — CourierApi.create_shipment(order) hands over the object because the external API can't look up your internal id; it needs the actual fields to build its request.
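Reason ① is the one that shows up in code. Below is a small plain-Ruby sketch of "decide what to do when the lookup misses". `ORDERS` is a hypothetical in-memory stand-in for the orders table and `perform_shipment` mirrors a job's `perform(order_id)`. In a Rails job the lookup would typically be `Order.find_by(id: order_id)`, which returns nil instead of raising.

```ruby
# Hypothetical in-memory stand-in for the orders table.
ORDERS = { 1 => { id: 1, address: "12 Example St" } }

# Mirrors a perform(order_id) that looks the record up itself.
def perform_shipment(order_id, shipped:)
  order = ORDERS[order_id]        # in Rails: Order.find_by(id: order_id)
  if order.nil?
    # The order was deleted after enqueueing: skip instead of failing.
    return :skipped_missing_order
  end

  shipped << order[:id]
  :shipped
end

shipped = []
perform_shipment(1, shipped: shipped)     # record exists, work proceeds
perform_shipment(999, shipped: shipped)   # record gone, job exits quietly
```

Whether a missing record should be skipped, logged, or raised as an error depends on the job. The id-based signature just keeps that decision in your hands.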

3. Use queue priorities
Email can wait, but "payment succeeded" needs to be quick. Split jobs across queues with different priorities:

class CreateShipmentJob < ApplicationJob
  queue_as :critical   # important work on a high-priority queue
end

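Otherwise a pile of low-priority report jobs clogs the queue and the urgent ones wait behind them. Note that queue_as only names the queue; which queues get drained first is decided by how you configure the workers in your engine.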
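What "priority" means on the worker side can be sketched as a loop that always checks queues in a fixed order. `PriorityWorker` below is a hypothetical toy, not Sidekiq's or Solid Queue's scheduler, and the queue names are assumptions. It shows strict ordering only, whereas real engines also offer other policies such as weighted polling.

```ruby
# Hypothetical worker loop: always drain higher-priority queues first.
class PriorityWorker
  def initialize(queue_order)
    @queue_order = queue_order                       # e.g. [:critical, :default, :low]
    @queues = queue_order.to_h { |name| [name, []] }
  end

  def enqueue(queue_name, job)
    @queues.fetch(queue_name) << job
  end

  # Returns the next job from the first non-empty queue, or nil if all are empty.
  def next_job
    @queue_order.each do |name|
      job = @queues[name].shift
      return job if job
    end
    nil
  end
end

worker = PriorityWorker.new(%i[critical default low])
worker.enqueue(:low, "monthly_report")
worker.enqueue(:critical, "create_shipment")
worker.next_job   # the critical job, although it was enqueued second
```

One design caveat: with strict ordering, a steady stream of critical jobs can starve the low queue entirely, so it is worth deciding whether low-priority work needs its own workers.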

4. Cap retries, and handle final failure
Auto-retry is great, but retrying forever against a courier API that is permanently down is just waste. Set a retry limit; past it, move the job to a dead letter queue or alert a human, and don't let it vanish silently. (Terminology varies by engine: Sidekiq calls it the Dead set; Solid Queue records failures in the solid_queue_failed_executions table.)
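A capped retry with an explicit "final failure" path might look like the plain-Ruby sketch below. `run_with_retry_cap`, `CourierUnavailable`, and the `dead_set` array are hypothetical names for illustration. In Active Job the analogous tool is `retry_on` with an `attempts:` limit and a block that runs once the attempts are exhausted.

```ruby
# Hypothetical error for a failing third-party call.
class CourierUnavailable < StandardError; end

# Runs the block up to max_attempts times. On final failure the job is
# recorded in dead_set so a human (or an alert) can pick it up.
def run_with_retry_cap(job_name, max_attempts:, dead_set:)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue CourierUnavailable => e
    retry if attempts < max_attempts   # a real system would wait with backoff here
    dead_set << { job: job_name, attempts: attempts, error: e.message }
    :dead
  end
end

dead_set = []
run_with_retry_cap("create_shipment:123", max_attempts: 3, dead_set: dead_set) do
  raise CourierUnavailable, "courier API is down"
end
dead_set.size   # one entry: the job did not vanish silently
```

The sketch retries immediately to stay short. In production you would space the attempts out, since hammering a struggling API rarely helps it recover.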

5. Keep jobs small and fast
A job that runs for 30 minutes ties up a worker for 30 minutes, and if it dies halfway it reruns from the start. Break big work up (e.g. walk the table with find_in_batches or in_batches and enqueue one job per batch).
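The splitting step itself is small. Here is a plain-Ruby sketch using `each_slice` on an array of ids. `enqueue_in_batches` and the `ExportBatchJob` name are hypothetical, and the returned array stands in for real enqueue calls. With Active Record you would get the batches from `find_in_batches` or `in_batches` instead of an in-memory array.

```ruby
# Hypothetical enqueue stand-in: records which ids each small job would handle.
def enqueue_in_batches(ids, batch_size:)
  jobs = []
  ids.each_slice(batch_size) do |batch|
    jobs << { job: "ExportBatchJob", ids: batch }   # one small job per batch
  end
  jobs
end

jobs = enqueue_in_batches((1..10).to_a, batch_size: 4)
jobs.map { |j| j[:ids] }   # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

If one batch fails, only that batch is retried. The others have already finished and stay finished.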


🏁 Wrap-up

|                       | in the request (❌)             | background job (✅)                  |
| user experience       | waits for the slow work         | returns immediately                  |
| third party goes down | whole request hangs / times out | only that job is affected, retryable |
| web worker            | held hostage by slow work       | freed up to serve others             |

One principle:

The request does only what the user needs right now; slow work that can finish later goes to the background.

But the background brings a new class of responsibilities: jobs retry (be idempotent), run later (pass ids, not objects), and fail (cap retries and handle it). "Push it to the background" is easy; "push it correctly" is what makes you senior. The moment you enqueue it, think about what happens when it retries, runs late, or fails.

Source: dev.to
