Beyond /up: Production-Grade Health Checks for Rails and Rack Applications
Most applications start with a simple question:
Is the process alive?
Rails answers that with the built-in /up endpoint.
But production systems need to answer a much more important question:
Is the application actually healthy?
Databases, Redis, queue systems, SMTP servers, external APIs, disk space, and memory pressure all affect an application's ability to serve requests. A process can remain alive even when the application itself is degraded or completely unavailable.
That's the problem I set out to solve with rails_health_checks.
The Problem with Basic Health Endpoints
Simple liveness checks are easy:
get "/up" => proc { [200, {}, ["OK"]] }
But modern applications depend on much more than the web process itself.
Questions that matter in production include:
- Can ActiveRecord reach the database?
- Is Redis available?
- Are background jobs piling up?
- Is an external API responding?
- Is the server running low on disk space?
- Is memory usage approaching dangerous levels?
Teams often end up maintaining custom controllers, bespoke checks, or aging libraries that require significant configuration and don't scale well.
Introducing rails_health_checks
rails_health_checks provides production-grade health endpoints with:
- Built-in checks for common dependencies
- Parallel execution
- Result caching
- Prometheus metrics
- Structured JSON responses
- Authentication options
- Check grouping
- Custom checks
It began as a Rails engine, but now also includes a standalone Rack application that can be mounted into virtually any Rack-based framework.
Rails Integration
In Rails, setup is straightforward. Add the gem to your Gemfile:
gem "rails_health_checks"
Then mount the engine in config/routes.rb:
mount RailsHealthChecks::Engine => "/health"
and you're done.
Beyond Rails: Rack Support
One of the newest additions to the project is RailsHealthChecks::Rack::App.
This makes the same endpoints available without requiring Rails routing or ActionDispatch.
That means the gem can now be used with:
- Sinatra
- Roda
- Plain Rack applications
- Internal services
- Lightweight APIs
For example:
# config.ru
require "rails_health_checks"
require "rails_health_checks/rack/app"
map "/health" do
  run RailsHealthChecks::Rack::App
end
run MyApp
The same endpoints are exposed:
- /health
- /health/live
- /health/metrics
- /health/:group
This brings a consistent health-checking experience across different Ruby stacks.
Parallel by Design
One common problem with health systems is latency.
Suppose the checks run sequentially:
- Database
- Redis
- SMTP
- Sidekiq
- External APIs
In that case the response time is the sum of every dependency's latency, and one slow dependency delays the whole response.
rails_health_checks executes checks in parallel using Concurrent::Future, making the total response time roughly equal to the slowest dependency rather than all of them combined.
Benchmarks show five 10 ms checks completing in roughly 13 ms instead of over 60 ms, a speedup of approximately 4.5×.
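To make the difference concrete, here is a minimal sketch of sequential versus parallel execution. It is not the gem's code: it uses plain Ruby threads instead of Concurrent::Future so that it runs without extra gems, and the CHECKS hash and both helper methods are hypothetical names invented for the illustration. Each check simply sleeps for 10 ms to stand in for a real dependency.

```ruby
require "benchmark"

# Hypothetical checks: each lambda simulates a dependency that takes ~10 ms.
CHECKS = {
  database: -> { sleep 0.01; :ok },
  redis:    -> { sleep 0.01; :ok },
  smtp:     -> { sleep 0.01; :ok },
  sidekiq:  -> { sleep 0.01; :ok },
  http_api: -> { sleep 0.01; :ok }
}.freeze

# Run checks one after another; total time is roughly the sum of all checks.
def run_sequentially(checks)
  checks.transform_values(&:call)
end

# Run each check in its own thread; total time is roughly the slowest check.
# (Plain threads are used here as a stand-in for Concurrent::Future.)
def run_in_parallel(checks)
  threads = checks.transform_values { |check| Thread.new { check.call } }
  threads.transform_values(&:value) # Thread#value waits for the thread to finish
end

sequential_time = Benchmark.realtime { run_sequentially(CHECKS) }
parallel_time   = Benchmark.realtime { run_in_parallel(CHECKS) }

puts format("sequential: %.1f ms", sequential_time * 1000)
puts format("parallel:   %.1f ms", parallel_time * 1000)
```

Exact timings depend on the machine, but the sequential run should land near the sum of the sleeps while the parallel run stays close to a single one.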
Caching to Reduce Load
Monitoring systems often hit health endpoints every few seconds.
Without caching, every request may repeatedly:
- Query the database
- Ping Redis
- Check queue systems
- Contact external services
Enabling caching is as simple as:
RailsHealthChecks.configure do |config|
  config.cache_duration = 10
end
This absorbs probe traffic and prevents health checks themselves from becoming a source of load.
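The idea behind this kind of caching can be sketched in a few lines. The CachedResult class below is hypothetical and is not how the gem is implemented; it only shows the general technique of recomputing a result when the stored copy is older than a time-to-live, assumed here to be expressed in seconds.

```ruby
# A tiny time-based cache: recompute the result only when the stored one
# has expired. Illustrative only; not the gem's implementation.
class CachedResult
  def initialize(ttl_seconds, &compute)
    @ttl_seconds = ttl_seconds
    @compute = compute
    @mutex = Mutex.new
    @expires_at = nil
    @value = nil
  end

  # Returns the cached value, running the block again only after expiry.
  def fetch
    @mutex.synchronize do
      # A monotonic clock is not affected by system clock adjustments.
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      if @expires_at.nil? || now >= @expires_at
        @value = @compute.call
        @expires_at = now + @ttl_seconds
      end
      @value
    end
  end
end

runs = 0
health = CachedResult.new(10) do
  runs += 1 # stands in for querying the database, pinging Redis, and so on
  { status: "ok" }
end

3.times { health.fetch } # three probes inside the 10-second window
puts runs                # counts how many times the expensive block ran
```

The mutex keeps concurrent probes from triggering the expensive block more than once per window.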
Prometheus Metrics Included
The gem also exposes a Prometheus endpoint:
GET /health/metrics
allowing health status and latency to be scraped directly by Prometheus and visualized in Grafana.
No additional adapters or exporters are required.
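For readers unfamiliar with what Prometheus scrapes, the sketch below renders check results in the Prometheus text exposition format. The metric names, the check label, and the render_metrics helper are assumptions made for illustration; the names the gem actually emits may differ.

```ruby
# Render check results in the Prometheus text exposition format.
# Metric and label names below are hypothetical, chosen for illustration.
def render_metrics(results)
  lines = []
  lines << "# HELP health_check_status 1 if the check passed, 0 otherwise."
  lines << "# TYPE health_check_status gauge"
  results.each do |name, result|
    lines << %(health_check_status{check="#{name}"} #{result[:ok] ? 1 : 0})
  end
  lines << "# HELP health_check_duration_seconds Time taken by the check."
  lines << "# TYPE health_check_duration_seconds gauge"
  results.each do |name, result|
    lines << %(health_check_duration_seconds{check="#{name}"} #{result[:duration]})
  end
  lines.join("\n") + "\n" # the format expects a trailing newline
end

sample = {
  database: { ok: true,  duration: 0.004 },
  redis:    { ok: false, duration: 0.25 }
}
puts render_metrics(sample)
```

Each sample line has the shape `metric_name{label="value"} number`, which is what lets Grafana chart status and latency per check.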
Framework-Agnostic by Design
Not every check depends on Rails.
Checks such as:
- Disk space
- Memory usage
- HTTP endpoints
- Redis
- SMTP
can run inside any Rack application.
Meanwhile, Rails-specific checks continue to work naturally inside Rails applications.
This allows the same monitoring strategy to be used across a range of services, rather than maintaining separate solutions for each framework.
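What makes this possible is that a Rack application is just an object that responds to call(env) and returns a status, headers, and a body. The sketch below is a hypothetical, hand-rolled endpoint built on that contract using only the standard library. SimpleHealthApp, the check names, and the JSON shape are all assumptions for illustration and do not reflect the gem's internals or its response format.

```ruby
require "json"
require "tmpdir"

# Hypothetical Rack-compatible health endpoint with no Rails dependency.
# Any object that responds to #call(env) can be mounted as a Rack app.
class SimpleHealthApp
  def initialize(checks)
    @checks = checks
  end

  def call(_env)
    # Run every check; an exception counts as a failure, not a crash.
    results = @checks.transform_values do |check|
      begin
        check.call ? "ok" : "failed"
      rescue StandardError => e
        "error: #{e.message}"
      end
    end
    healthy = results.values.all?("ok")
    body = JSON.generate(status: healthy ? "ok" : "degraded", checks: results)
    # 503 tells load balancers and probes that the service is not healthy.
    [healthy ? 200 : 503, { "content-type" => "application/json" }, [body]]
  end
end

checks = {
  tmp_writable: -> { File.writable?(Dir.tmpdir) },
  always_fails: -> { raise "connection refused" } # simulated broken dependency
}
app = SimpleHealthApp.new(checks)

status, _headers, body = app.call({})
puts status
puts body.first
```

Because one of the sample checks always raises, the response reports a degraded status rather than letting the exception escape the endpoint.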
Why I Built It
I wanted something that:
- Felt native to Rails.
- Worked outside Rails when needed.
- Scaled under heavy probe traffic.
- Supported modern queue systems.
- Produced structured responses and metrics.
- Was easy to extend.
- Didn't require every team to reinvent health checks.
The result is a library that aims to answer not just:
"Is the process running?"
but the more useful question:
"Is the application healthy?"
Because in production, those are rarely the same thing.