How I Cut Our MTTR in Half Using Rust-Based Dashboards for Incident Debugging


TL;DR: The dashboard that fails you at 2am is always the one you trusted during business hours. Grafana loads fine when you're doing capacity planning on a Tuesday afternoon — but the moment your error rate spikes and you actually need it, you're sitting there watching a spinner for 8 seconds while your on-call Slack channel is already on fire.

📖 Reading time: ~32 min

What's in this article

  1. The Problem: Your Dashboard Lies to You When You Need It Most
  2. What 'Rust Dashboard' Actually Means in This Context
  3. Step 1 — Install and Wire Up Vector as Your Metrics/Log Router
  4. Step 2 — Stand Up Quickwit for Fast Incident Log Search
  5. Step 3 — Build the Incident Dashboard That Actually Helps
  6. Step 4 — Writing a Lightweight Axum Metrics Endpoint (When Prometheus Exporters Don't Cut It)
  7. The 3 Things That Surprised Me About This Stack
  8. When This Stack Is Overkill (Be Honest With Yourself)

The Problem: Your Dashboard Lies to You When You Need It Most

The dashboard that fails you at 2am is always the one you trusted during business hours. Grafana loads fine when you're doing capacity planning on a Tuesday afternoon — but the moment your error rate spikes and you actually need it, you're sitting there watching a spinner for 8 seconds while your on-call Slack channel is already on fire. That 8-second load isn't a coincidence. It's what happens when your visualization layer is querying an overloaded Prometheus during the exact window when Prometheus is also ingesting a surge of incident-related metrics.

High-cardinality label sets are usually the silent killer here. The moment you start tracking per-request spans with labels like request_id, user_id, or trace_id on your Prometheus metrics, you've created a combinatorial explosion. A query like this becomes genuinely dangerous at scale:

# This looks innocent. At 2M active users it will timeout.
sum by (user_id, endpoint, status_code) (
  rate(http_request_duration_seconds_count[5m])
)

Prometheus's TSDB wasn't designed for millions of unique label combinations — it's optimized for fixed-cardinality, scrape-interval data. Once you cross roughly 10 million active series, range queries start timing out. During an incident, that timeout is the difference between a 5-minute MTTR and a 45-minute one. I've watched engineers pull up four separate browser tabs — one for Grafana, one for Jaeger traces, one for raw Loki logs, one for a kubectl dashboard — and spend more time mentally stitching together information across tabs than actually diagnosing the problem. The cognitive overhead during high-stress debugging is real, and fragmented tooling makes it significantly worse.
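
If you want to know how close you are to that series cliff before an incident proves it for you, Prometheus tracks its own head series count, and a quick ad-hoc aggregation shows which metric families are the offenders. A minimal check, not a full cardinality audit:

# Total active series in the TSDB head right now
prometheus_tsdb_head_series

# Rough per-metric breakdown — heavy query, run it ad hoc, never on a dashboard refresh loop
topk(10, count by (__name__) ({__name__=~".+"}))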

The reason Rust-based tooling changes this calculation is raw throughput with predictable latency. Vector (from Datadog, written in Rust) can route and transform log and metric streams at rates that would make a Python-based logstash pipeline fall over — I've seen it handle 500K events/sec on a single 4-core machine without the GC pauses that make JVM-based pipelines inconsistent under load. Quickwit is a Rust-native search engine built specifically for log data, and its columnar index structure means full-text search over a billion log lines returns in under a second — not 8 seconds. The third piece is building a thin Axum metric server that sits in front of your Prometheus scrape targets and pre-aggregates high-cardinality data before it ever touches Prometheus's TSDB. Axum compiled in release mode serves JSON metric endpoints with sub-millisecond overhead — it adds essentially nothing to your latency budget while doing real work:

# Cargo.toml for a minimal pre-aggregation metric server
[package]
name = "metric-aggregator"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
prometheus-client = "0.22"
dashmap = "5"   # lock-free concurrent HashMap — critical for high-throughput paths
serde = { version = "1", features = ["derive"] }
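
And here's roughly what the pre-aggregation layer itself could look like — a minimal sketch, not a full implementation: the ingest path calls a helper (record_request, my name, not a library's) that bumps a counter in a DashMap keyed by (endpoint, status), and the /metrics handler renders only the collapsed totals, so per-request labels like user_id never reach Prometheus. For brevity this hand-rolls the text exposition format instead of using the prometheus-client crate listed above:

use std::sync::Arc;
use axum::{extract::State, routing::get, Router};
use dashmap::DashMap;

// Aggregate by (endpoint, status) only — request_id / user_id never become labels.
type Counters = Arc<DashMap<(String, u16), u64>>;

fn record_request(counters: &Counters, endpoint: &str, status: u16) {
    *counters.entry((endpoint.to_string(), status)).or_insert(0) += 1;
}

async fn metrics(State(counters): State<Counters>) -> String {
    // Render the already-collapsed totals in Prometheus text exposition format.
    let mut out = String::from("# TYPE http_requests_total counter\n");
    for entry in counters.iter() {
        let (endpoint, status) = entry.key();
        out.push_str(&format!(
            "http_requests_total{{endpoint=\"{}\",status=\"{}\"}} {}\n",
            endpoint, status, entry.value()
        ));
    }
    out
}

#[tokio::main]
async fn main() {
    let counters: Counters = Arc::new(DashMap::new());
    record_request(&counters, "/checkout", 500); // your ingest path calls this per event
    let app = Router::new().route("/metrics", get(metrics)).with_state(counters);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:9808").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}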

The actual win isn't that Rust is fast in a benchmark. The win is that Rust's memory model forces you to think about allocation at design time, so the tail latency profile of your metric server stays flat under load — no surprise GC pauses when your incident is already generating 10x normal event volume. That's when every other runtime tends to degrade in exactly the way you can't afford.

What 'Rust Dashboard' Actually Means in This Context

The phrase "Rust dashboard" is doing a lot of lifting here, and I want to be precise about what it actually means in an incident-speed context — because the first thing people assume is a Yew or Leptos front-end, and that's a completely separate rabbit hole with different trade-offs. What I'm describing is Rust owning the data plane: the layer that moves, transforms, indexes, and exposes your operational data. The rendering layer is still Grafana. That's fine. Grafana is good at rendering. The problem was never the front-end.

The stack we landed on after a few months of iteration looks like this: Vector handles log and metric ingestion and routing, Quickwit handles log indexing and search, and Grafana renders it all through the Quickwit datasource plugin plus standard Prometheus endpoints. Every component in that pipeline except Grafana is written in Rust, and that's not an aesthetic choice — it's about what happens to your observability pipeline during an incident, when the system is under load and you most need your tooling to stay stable.

# vector.toml — minimal incident-relevant snippet
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
read_from = "beginning"

[transforms.parse_json]
type = "remap"
inputs = ["app_logs"]
source = '''
. = parse_json!(.message)
.timestamp = parse_timestamp!(.timestamp, format: "%+")
'''

[sinks.quickwit]
type = "http"
inputs = ["parse_json"]
uri = "http://quickwit-host:7280/api/v1/my-index/ingest"
encoding.codec = "json"
# batch to reduce ingest overhead without losing events
batch.max_bytes = 10485760
batch.timeout_secs = 1

I switched from Logstash to Vector because Logstash was eating ~600MB RSS at idle on a modest log volume, and the JVM GC would occasionally pause for 200-400ms — which is comically bad timing when you're mid-incident and trying to figure out what just happened. Vector on the same workload sits around 60-80MB RSS and the memory footprint stays flat under burst load. That's not a benchmark I cherry-picked; that's what ps aux showed me at 2am when the GC pause was the reason my log sink fell behind by 45 seconds. The operational cost of the JVM in a data pipeline component is real, and Rust just eliminates it.

The honorable mention here is Axum for lightweight internal metrics endpoints. If you have a service that doesn't natively expose Prometheus metrics — maybe it's a legacy binary or an internal tool — wiring up a small Axum service to scrape its state and re-expose it as /metrics takes maybe 80 lines of Rust and uses almost no resources. The key advantage over a Python or Go shim is that you can bundle it as a single static binary and drop it anywhere without runtime dependencies.

// Minimal Axum Prometheus endpoint — no framework overhead
use axum::{routing::get, Router};
use std::net::SocketAddr;

// Stub for whatever internal state you're actually exposing
async fn get_internal_queue_depth() -> f64 {
    42.0
}

async fn metrics_handler() -> String {
    // pull from your internal state, format as prometheus text exposition
    let some_gauge: f64 = get_internal_queue_depth().await;
    format!(
        "# HELP internal_queue_depth Current depth of processing queue\n\
         # TYPE internal_queue_depth gauge\n\
         internal_queue_depth {}\n",
        some_gauge
    )
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/metrics", get(metrics_handler));
    let addr = SocketAddr::from(([0, 0, 0, 0], 9100));
    // axum 0.7 removed axum::Server — bind a TcpListener and hand it to axum::serve
    let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

Quickwit deserves a specific callout because it's often overlooked in favor of Elasticsearch. The reason we use it for incident dashboards specifically is its query latency on cold data — Elasticsearch requires warm caches to stay fast, but Quickwit is built around a columnar storage format on object storage (S3 or local disk) and can answer a full-text search across 24 hours of dense logs in under 2 seconds without pre-warming anything. During an incident that started 6 hours ago, you don't have the luxury of waiting for index caches to fill. The trade-off is that Quickwit's aggregation queries are less flexible than Elasticsearch's, so if your dashboards are heavy on complex metric math, you'll feel that constraint.
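
To make that concrete, an incident query against Quickwit's search API is a single HTTP call — a sketch using the index name from the Vector config above; the query syntax and field names depend on your own mapping:

# Full-text search over the index — returns JSON hits; add start_timestamp/end_timestamp
# (unix seconds) to bound it to the incident window
curl --get "http://quickwit-host:7280/api/v1/my-index/search" \
  --data-urlencode "query=severity:ERROR AND service:payment-service" \
  --data-urlencode "max_hits=50"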

Step 1 — Install and Wire Up Vector as Your Metrics/Log Router

The thing that surprised me most about Vector versus Fluentd or Logstash is that it's a single statically-linked binary. No JVM, no Ruby runtime, no "wait, which gem version?" — you install it and it runs. I switched our team's log router to Vector because during a 2am incident, the last thing you want is your observability pipeline itself OOMing because Logstash decided to eat 4GB of heap.

Run the install exactly like this — don't pull it from your distro's package manager unless you want a version from 18 months ago:

# Install Vector — this drops a binary at /usr/local/bin/vector
curl --proto '=https' --tlsv1.2 -sSfL https://sh.vector.dev | bash

# Immediately confirm what you got — I've been burned by PATH shadowing before
vector --version
# Expected: vector 0.39.0 (x86_64-unknown-linux-gnu)

# Check it can read config without actually starting
vector validate --config /etc/vector/vector.toml

The version matters here. VRL (Vector Remap Language) had breaking changes between 0.34 and 0.36 around the parse_json return type. If you're on anything below 0.37, your remap blocks might silently coerce values in ways that corrupt your numeric incident timestamps. Lock your version in CI and deploy explicitly.
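
A cheap way to enforce that pin in CI — a sketch, assuming you install the standalone binary rather than a distro package:

# Fail the pipeline if the installed Vector drifts from the pinned version
PINNED="0.39.0"
INSTALLED="$(vector --version | awk '{print $2}')"
if [ "$INSTALLED" != "$PINNED" ]; then
  echo "Vector version drift: expected $PINNED, got $INSTALLED" >&2
  exit 1
fi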

Here's a real vector.toml I use for an incident dashboard pipeline — tailing app logs, parsing JSON, fanning out to both Prometheus remote_write and Quickwit for full-text search. Every block is annotated for what it does when you're replaying a past incident:

[sources.app_logs]
type = "file"
# Watch the active log — Vector handles rotation automatically
include = ["/var/log/myapp/*.log"]
read_from = "beginning"  # use "end" in prod, "beginning" for incident replay

[transforms.parse_json_events]
type = "remap"
inputs = ["app_logs"]
source = '''
  # Parse the raw string into a structured event
  . = parse_json!(string!(.message))

  # Normalize timestamp to RFC3339 — Quickwit rejects epoch millis without this
  .timestamp = to_timestamp!(.ts, unit: "milliseconds")

  # Tag severity so Prometheus can filter on it as a label
  .level = downcase(string!(.level))
'''

[transforms.extract_metrics]
type = "log_to_metric"
inputs = ["parse_json_events"]

  [[transforms.extract_metrics.metrics]]
  type = "counter"
  field = "request_id"
  name = "http_requests_total"
  # Incident replay: this counter lets you see exact request volume at failure time
  [transforms.extract_metrics.metrics.tags]
    status = "{{status_code}}"
    service = "{{service_name}}"

[sinks.prometheus_remote_write]
type = "prometheus_remote_write"
inputs = ["extract_metrics"]
endpoint = "http://localhost:9090/api/v1/write"
# Auth if your Prometheus sits behind a proxy
# auth.strategy = "bearer"
# auth.token = "${PROM_TOKEN}"

[sinks.quickwit]
type = "http"
inputs = ["parse_json_events"]
uri = "http://localhost:7280/api/v1/my-index/ingest"
method = "post"
encoding.codec = "json"
# Quickwit wants newline-delimited JSON — this is the gotcha nobody documents
framing.method = "newline_delimited"

  [sinks.quickwit.headers]
  content-type = "application/x-ndjson"

Here's the Logstash migration gotcha I wish someone had told me: VRL is not grok. Not even close. If you have Logstash patterns like %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level}, you cannot paste those into a VRL remap block. VRL uses its own parse_grok function but the named patterns are different and the behavior on partial matches is opposite — Logstash continues on partial match, VRL's parse_grok! will halt the transform and drop the event. I spent a full afternoon debugging dropped events because of this. Rewrite your patterns from scratch using parse_json for structured logs or parse_regex for unstructured ones. It's faster than porting and you'll end up with cleaner transforms anyway.
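
Here's roughly what that rewrite looks like for the pattern above (plus a trailing message field) — a sketch using parse_regex with named capture groups and a fallible match, so a non-conforming line gets tagged rather than dropped; the legacy_logs source name is just a placeholder:

[transforms.parse_legacy_text]
type = "remap"
inputs = ["legacy_logs"]
source = '''
  # Rough equivalent of %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} plus the rest of the line
  parsed, err = parse_regex(.message, r'^(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+\S*)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$')
  if err == null {
    . = merge(., parsed)
  } else {
    # Keep the event and flag it — the silent drop is exactly what we're trying to avoid
    .parse_failed = true
  }
'''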

Before you touch production, use vector tap to watch events flowing through your pipeline in real time. This is genuinely the best local debugging tool Vector ships:

# In terminal 1 — start Vector with your config
vector --config /etc/vector/vector.toml

# In terminal 2 — tap the output of a specific transform to see what's coming through
vector tap --inputs-of quickwit

# You'll see live JSON events like:
# {"timestamp":"2024-03-15T02:31:44Z","level":"error","service":"auth","status_code":500,...}

# To tap a specific component and filter by field value (0.38+):
vector tap --inputs-of prometheus_remote_write --format json \
  | jq 'select(.level == "error")'

vector tap reads from your running pipeline without modifying it — no side effects, no dropped events. I run this during incident post-mortems by pointing Vector at archived log files with read_from = "beginning", tapping the output, and piping into jq to reconstruct the exact sequence of errors that cascaded. It's dramatically faster than grepping through compressed logs, especially once you've got the JSON parsing working correctly and your Quickwit index has the right field mappings pre-configured.

Step 2 — Stand Up Quickwit for Fast Incident Log Search

The honest reason I stopped reaching for Elasticsearch in incident dashboards isn't performance — it's the operational tax. ES wants heap tuning, shard management, warm/cold tier configs, and a dedicated person who speaks Lucene fluently. Quickwit's index-on-write model sidesteps most of that. Logs land, they're indexed immediately on object storage (S3, GCS, or local disk), and you can run a query against them within 10 seconds of ingestion — cold storage included. That last part is the thing that caught me off guard: there's no "move to cold tier and accept the query latency penalty" trade-off. Quickwit just... works on top of whatever you point it at.

Getting it running locally is genuinely fast. Create a quickwit.yaml config and spin it up:

# quickwit.yaml
version: 0.7
metastore_uri: ./qwdata/metastore
default_index_root_uri: ./qwdata/indexes

searcher:
  rest_listen_port: 7280

indexer:
  rest_listen_port: 7281
# Start Quickwit with Docker Compose
version: "3.8"
services:
  quickwit:
    image: quickwit/quickwit:0.7.4
    ports:
      - "7280:7280"
    volumes:
      - ./quickwit.yaml:/quickwit/config/quickwit.yaml
      - ./qwdata:/quickwit/qwdata
    command: run --config /quickwit/config/quickwit.yaml
docker compose up -d
# Check it's alive
curl http://localhost:7280/api/v1/version

The index doc mapping is where most people get burned the first time. If you define timestamp as a plain text field instead of datetime, Quickwit ingests it fine but sorting and time-range filtering silently break — you'll get unordered results and wonder why your incident timeline looks scrambled. Define it explicitly:

curl -X POST http://localhost:7280/api/v1/indexes \
  -H "Content-Type: application/json" \
  -d '{
    "version": "0.7",
    "index_id": "devops-incidents",
    "doc_mapping": {
      "field_mappings": [
        { "name": "timestamp", "type": "datetime", "input_formats": ["rfc3339", "unix_timestamp"], "fast": true },
        { "name": "service", "type": "text", "tokenizer": "raw" },
        { "name": "severity", "type": "text", "tokenizer": "raw" },
        { "name": "message", "type": "text" },
        { "name": "latency_ms", "type": "u64", "fast": true }
      ],
      "timestamp_field": "timestamp"
    },
    "search_settings": {
      "default_search_fields": ["message", "service"]
    }
  }'

The "fast": true flag on timestamp and numeric fields is not optional if you care about dashboard speed — it pre-computes columnar storage for those fields so range queries don't have to scan the full inverted index. Skip it and your incident timeline queries will crawl once you're past a few million events.

Connecting to Grafana requires the community plugin, not the core datasource list. Install it and restart:

# Inside your Grafana container or host
grafana-cli plugins install quickwit-quickwit-datasource

# Then restart Grafana
systemctl restart grafana-server
# or if Docker:
docker restart grafana

After that, add a new datasource in Grafana UI pointing to http://localhost:7280, set the index to devops-incidents, and configure the timestamp field as timestamp. The datasource will validate the connection and pull field names automatically.
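
If you provision dashboards from files (covered in the next step), it's worth provisioning the datasource the same way so a fresh Grafana instance comes up already wired to Quickwit. A sketch — the keys under jsonData are assumptions on my part; check the plugin's README for the exact names it expects:

# /etc/grafana/provisioning/datasources/quickwit.yaml
apiVersion: 1
datasources:
  - name: Quickwit
    type: quickwit-quickwit-datasource
    access: proxy
    url: http://localhost:7280
    jsonData:
      index: devops-incidents   # assumption — verify the key name against the plugin docs
      timeField: timestamp      # assumption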

Here's the rough edge nobody documents upfront: as of Quickwit's Grafana plugin version 0.7.x, Grafana dashboard variables — the $service or $severity dropdowns you'd normally wire into queries — don't interpolate cleanly. The plugin doesn't fully resolve template variables in query mode. My workaround is switching affected panels to raw query mode and building the filter string manually with string concatenation in the panel's query editor:

// Raw query mode — manually inject the variable value
// This is ugly but it works until the plugin matures
severity:ERROR AND service:payment-service AND timestamp:[now-1h TO now]

You can wrap this in a Grafana "Text" variable set to severity:${severity_var} and reference it in the raw query string — messy, but functional. I keep an eye on the Quickwit GitHub issues to see when native variable support lands. Until then, raw query mode is the pragmatic call, not a workaround you should feel bad about.

Step 3 — Build the Incident Dashboard That Actually Helps

The layout decision matters more than any individual panel. I've burned too much time on dashboards where the error rate is top-right, the logs are on page 2, and you're alt-tabbing during an active incident. What actually works: error rate timeseries top-left (first thing your eye lands on), the live log stream bottom-left (context immediately below the graph that triggered your interest), and the p99 latency heatmap spanning the right column. Everything fits on a 1440p monitor without scrolling. The moment you need to scroll during an incident, your dashboard has already failed you.
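
In dashboard JSON terms that layout is just three gridPos blocks on Grafana's 24-column grid — a sketch of the geometry only, with datasources and targets omitted:

"panels": [
  { "title": "Error rate",      "type": "timeseries", "gridPos": { "x": 0,  "y": 0, "w": 12, "h": 9  } },
  { "title": "Live log stream", "type": "logs",       "gridPos": { "x": 0,  "y": 9, "w": 12, "h": 12 } },
  { "title": "p99 latency",     "type": "heatmap",    "gridPos": { "x": 12, "y": 0, "w": 12, "h": 21 } }
]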

Provision the whole thing as JSON through /etc/grafana/provisioning/dashboards/ — not through the UI. Once it's in a file, it lives in git, it deploys with your infra, and you stop having "who deleted the latency panel at 2am" conversations. The directory structure Grafana expects:

# /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
  - name: 'incident-dashboards'
    orgId: 1
    folder: 'IncidentResponse'
    type: file
    disableDeletion: true        # nobody fat-fingers this in prod
    updateIntervalSeconds: 30    # picks up changes without restart
    options:
      path: /var/lib/grafana/dashboards

Then your dashboard JSON lives at /var/lib/grafana/dashboards/incident.json, committed alongside your Terraform or Ansible. The disableDeletion: true flag is worth keeping — provisioned dashboards can still be edited in the UI, but the next sync cycle restores them. That's the behavior you want for on-call dashboards that people "temporarily adjust" and then forget.

The $__interval variable is where most Prometheus dashboards silently lie to you. The phantom spike problem: your query uses a hardcoded rate(http_requests_total[1m]), someone zooms to a 72-hour window, and suddenly the 1-minute bucket makes every data point look like a spike relative to its neighbors because you're under-sampling. The fix is letting Grafana set the range dynamically:

# Wrong — hardcoded range causes phantom spikes on wide time windows
rate(http_requests_total{job="api-server"}[1m])

# Right — $__interval adapts to the current zoom level
rate(http_requests_total{job="api-server"}[$__interval])

# For heatmaps specifically, use $__rate_interval
# which has a minimum of 4x the scrape interval
rate(http_request_duration_seconds_bucket[$__rate_interval])

The $__rate_interval variant is the one Grafana docs bury in a footnote: it guarantees at least 4× your scrape interval, which prevents the "no data" holes you get when $__interval collapses below your 15s or 30s scrape frequency. Use $__interval for counters and gauges over normal zoom ranges; use $__rate_interval for any rate calculation that touches histogram buckets.

Data links are what separate a debugging dashboard from a reporting dashboard. On your latency heatmap panel, add a data link that passes the clicked time range into the Quickwit log panel's query. In Grafana's panel JSON under fieldConfig.defaults.links:

{"title":"View logs for this window","url":"/d/incident-main/incident-response?orgId=1&from=${__value.time}&to=${__value.timeEnd}&var-service=${__series.name}","targetBlank":false}
Enter fullscreen mode Exit fullscreen mode

The ${__value.time} and ${__value.timeEnd} interpolations grab the cell's time bounds from the heatmap click — not just the cursor position. That distinction matters. You click a red cell on the heatmap, Grafana re-renders the entire dashboard scoped to that 30-second or 1-minute bucket, and the log stream below immediately shows only entries from that window. No manual time range adjustment, no copy-pasting epoch timestamps. The var-service passthrough is what makes it actually scoped — otherwise you're looking at all services' logs when you only care about the one that spiked.

The annotation overlay is the last piece, and it answers the question every post-mortem starts with: "Did we page on-call before or after the latency degraded?" Wire Alertmanager annotations directly into Grafana by adding it as a data source and then configuring an annotation query on your dashboard JSON:

{"annotations":{"list":[{"builtIn":1,"datasource":"-- Grafana --","enable":true,"hide":true,"iconColor":"rgba(0, 211, 255, 1)","name":"Annotations & Alerts","type":"dashboard"},{"datasource":"Alertmanager","enable":true,"expr":"ALERTS{severity=~\"critical|warning\"}","iconColor":"#FF0000","name":"PagerDuty Fires","step":"60s","titleFormat":"{{alertname}} — {{severity}}"}]}}
Enter fullscreen mode Exit fullscreen mode

What you end up with is a vertical red line on every panel simultaneously the moment an alert fired. During a post-mortem you can literally point at the gap: "error rate started climbing at 14:32, annotation shows the page went out at 14:38 — six minutes of silent degradation before we knew." That gap is your MTTD, and seeing it visually on the same screen as the signals that should have caught it earlier is what actually drives alert threshold improvements. You can also pull PagerDuty directly if you install the PagerDuty Grafana plugin — it exposes incidents as annotations with incident severity, which gives you richer labeling than Alertmanager alone.

Step 4 — Writing a Lightweight Axum Metrics Endpoint (When Prometheus Exporters Don't Cut It)

The thing that caught me off guard the first time I wired up Prometheus scraping for a Rust service was how useless the default exporters were for anything business-specific. Node exporter gives you CPU and memory. The Axum metrics middleware gives you HTTP latency. Neither tells you that your per-tenant cache hit ratio dropped below 60% or that your job queue for tenant X has 4,000 unprocessed items. That's the gap you have to close yourself, and it's actually not much code.

Here's the minimal setup I use. This is axum 0.7, metrics 0.23, and metrics-exporter-prometheus 0.15 — the versions matter because the API changed significantly between 0.21 and 0.23:

# Cargo.toml
[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
metrics = "0.23"
metrics-exporter-prometheus = "0.15"
serde = { version = "1", features = ["derive"] }   # needed for the Json extractor below

// main.rs
use axum::{routing::get, Router};
use metrics_exporter_prometheus::{Matcher, PrometheusBuilder, PrometheusHandle};

#[tokio::main]
async fn main() {
    // Install the recorder globally — do this once, before any metrics calls
    let handle: PrometheusHandle = PrometheusBuilder::new()
        .set_buckets_for_metric(
            Matcher::Full("queue_processing_duration_seconds".to_string()),
            // Match your SLO thresholds, not defaults.
            // Default is [0.005, 0.01, 0.025, ...5, 10] — wrong for a job queue
            &[0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0],
        )
        .unwrap()
        .install_recorder()
        .unwrap();

    let app = Router::new()
        .route("/process", axum::routing::post(process_job))
        .route("/metrics", get(move || async move { handle.render() }));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

Then inside your actual handler, recording metrics is three lines using the metrics facade macros:

use metrics::{counter, histogram};
use serde::Deserialize;

#[derive(Deserialize)]
struct JobPayload {
    tenant_id: String,
    // ...plus whatever else your jobs actually carry
}

// Stand-in for your real job logic
async fn do_work(_payload: &JobPayload) -> Result<(), ()> {
    Ok(())
}

async fn process_job(
    axum::extract::Json(payload): axum::extract::Json<JobPayload>,
) -> axum::http::StatusCode {
    let start = std::time::Instant::now();

    let result = do_work(&payload).await;

    // Label by tenant — this is what gives you per-tenant breakdowns in Grafana
    let tenant = &payload.tenant_id;
    histogram!("queue_processing_duration_seconds", "tenant" => tenant.clone())
        .record(start.elapsed().as_secs_f64());

    match result {
        Ok(_) => {
            counter!("jobs_processed_total", "tenant" => tenant.clone(), "status" => "ok")
                .increment(1);
            axum::http::StatusCode::OK
        }
        Err(_) => {
            counter!("jobs_processed_total", "tenant" => tenant.clone(), "status" => "error")
                .increment(1);
            axum::http::StatusCode::INTERNAL_SERVER_ERROR
        }
    }
}

I use the metrics crate facade instead of calling the prometheus crate directly, and the reason is entirely about testability. With the facade, you can swap in a no-op recorder or a test recorder in your unit tests without refactoring your handler logic. The prometheus crate's global registry is a singleton — once you register a metric, you can't unregister it cleanly in tests, and you'll hit "duplicate metric" panics when test cases run in the same process. With metrics, you give each test its own isolated recorder (metrics::with_local_recorder in recent versions of the crate). Your handlers don't know or care what backend is receiving the data.
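
A sketch of what that looks like in a test — this assumes metrics-util as a dev-dependency for its DebuggingRecorder, and a metrics version recent enough to have with_local_recorder:

#[cfg(test)]
mod tests {
    use metrics_util::debugging::DebuggingRecorder;

    #[test]
    fn records_a_counter_per_job() {
        let recorder = DebuggingRecorder::new();
        let snapshotter = recorder.snapshotter();

        // Everything inside the closure sees this recorder instead of the global one
        metrics::with_local_recorder(&recorder, || {
            metrics::counter!("jobs_processed_total", "status" => "ok").increment(1);
        });

        // Exactly one metric key should have been recorded
        assert_eq!(snapshotter.snapshot().into_vec().len(), 1);
    }
}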

The bucket gotcha is the one that will burn you silently if you skip it. Prometheus histogram buckets default to [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0], which was designed for HTTP request latency measured in seconds. If you're measuring a job queue where your P99 SLO is 30 seconds, almost every observation will land in the +Inf bucket and your histogram becomes a flat line. Worse, if you're measuring cache lookup time in microseconds, everything falls below the first bucket and you lose all resolution. The fix is set_buckets_for_metric with a Matcher::Full or Matcher::Prefix pointing at your metric name. I also use Matcher::Prefix("cache_") to apply one bucket config to all cache-related metrics at once. Do this before the first metric is recorded — you can't change buckets after the metric is registered.

  • Cache hit ratio: Track cache_hits_total and cache_misses_total as counters, compute the ratio in your Grafana query with rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) — don't try to expose a gauge ratio directly, it creates race conditions.
  • Queue depth: Use a gauge! macro updated after each enqueue/dequeue, not a counter — queue depth goes down as well as up.
  • Per-tenant cardinality: If you have more than ~50 tenants, profile the memory cost before shipping. High-cardinality labels with histograms (each tenant × each bucket) can eat 50MB+ in the Prometheus recorder's internal state.

The 3 Things That Surprised Me About This Stack

The one that genuinely shocked me was Vector's disk buffer behavior during a Quickwit restart. I expected log loss. We had a routine restart of our Quickwit indexer — nothing dramatic, maybe 90 seconds of downtime — and I was mentally prepared to explain a gap in the incident timeline to the team. Nothing happened. Vector had quietly queued everything to disk and replayed it once the sink came back up. No data loss, no alert, no config change on my end. I hadn't touched buffer.type at all — it defaults to memory, but Vector had been configured in our base template with type = "disk" by whoever set it up before me, and I'd never noticed.

[sinks.quickwit_logs]
  type = "http"
  inputs = ["parsed_logs"]
  uri = "http://quickwit:7280/api/v1/my-index/ingest"

  [sinks.quickwit_logs.buffer]
    type = "disk"
    max_size = 268435488  # 256MB — adjust based on your peak ingest rate
    when_full = "block"   # block the pipeline rather than drop events

The when_full = "block" setting is what makes this safe. The alternative is "drop_newest", which is a silent foot-gun during incidents when your ingest spike is exactly when you need every log line. With block, backpressure propagates upstream and you'll see it in your Vector internal metrics (buffer_events climbing) before you ever lose a byte. Set max_size based on your peak log rate × your worst-case sink recovery time, not just a round number.
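
Vector exposes those buffer metrics through its internal_metrics source, so the pipeline can be watched from the same Grafana instance as everything else — a minimal sketch with placeholder component names:

[sources.vector_self]
type = "internal_metrics"

[sinks.vector_self_prometheus]
type = "prometheus_exporter"
inputs = ["vector_self"]
# Scrape this endpoint and alert on buffer growth before you're anywhere near losing data
address = "0.0.0.0:9598"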

Quickwit's search latency over 30 days of dense logs is fast — I'm talking sub-second on filtered queries — but only if your index mapping was right from day one. I learned this the hard way. We initially indexed everything under a catch-all dynamic field and then tried to add a proper timestamp_field and structured fields for severity, service_name, and trace_id after the fact. Quickwit doesn't let you alter an existing index's mapping. You have to create a new index, backfill, and cut over. The backfill across 30 days of production logs took around 4 hours and required a custom script to re-ingest from S3. This isn't a flaw exactly — it's a column-oriented storage tradeoff — but it means your first mapping design decision is load-bearing.

# quickwit.yaml — get this right before you ingest anything real
version: 0.7
index_id: devops-incidents
doc_mapping:
  mode: strict           # reject unknown fields — forces discipline upfront
  timestamp_field: timestamp
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats: [rfc3339]
      fast: true         # enables time-range pruning — critical for 30-day queries
    - name: severity
      type: text
      tokenizer: raw     # exact match only, no tokenization — faster for enum-like fields
      fast: true
    - name: service_name
      type: text
      tokenizer: raw
      fast: true
    - name: trace_id
      type: text
      tokenizer: raw
    - name: message
      type: text
      tokenizer: default

The Rust compile times for the Axum metrics service are a genuine CI tax, and I don't mean a minor annoyance — I mean 8–12 minute cold builds on a 4-core GitHub Actions runner for a moderately sized codebase. The fix is cargo-chef, which splits your Docker build into a dependency layer that only rebuilds when Cargo.lock changes, and an application layer that rebuilds when your source changes. Once we wired this in, incremental builds dropped to under 2 minutes. The setup is slightly fiddly because you need a multi-stage Dockerfile with three stages, but it's a one-time cost.

# Stage 1: compute the dependency recipe
FROM rust:1.78-slim AS chef
RUN cargo install cargo-chef
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Stage 2: build deps only — this layer gets cached by Docker
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release --bin metrics-service

# Stage 3: minimal runtime image
FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/target/release/metrics-service /usr/local/bin/
CMD ["metrics-service"]

One thing that tripped us up: if you're using sqlx with compile-time query checking, you need the SQLX_OFFLINE=true flag during the chef cook stage, otherwise it tries to connect to a database that doesn't exist during the build. Run cargo sqlx prepare locally first to generate the .sqlx query cache, commit it, and then the Docker build can use the offline mode. Missing this detail will give you a frustratingly opaque build error about database connections in what looks like a pure compilation step.
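
Concretely, that means one extra ENV line in the builder stage of the Dockerfile above — shown here with the stage repeated for context, assuming the generated .sqlx directory is committed alongside the source:

# Builder stage with sqlx offline mode — no database connection attempted during the build
FROM chef AS builder
ENV SQLX_OFFLINE=true
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release --bin metrics-service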

When This Stack Is Overkill (Be Honest With Yourself)

I'll be straight with you: I built this entire stack for a project that ended up not needing it, and I watched my team spend two weeks tuning Vector pipelines that a free Grafana Cloud account would have handled out of the box. That pain is why this section exists.

If you're running a single service under ~50 requests per second, stop reading and go sign up for Grafana Cloud's free tier right now. You get 50GB of logs per month via Loki, 10,000 series for metrics, and 14-day retention — all for zero dollars. The custom Axum exporter, the self-hosted Quickwit instance, the Vector aggregation layer — that's probably 8–12 hours of initial setup and then ongoing maintenance forever. A Grafana Cloud account is 20 minutes and a YAML file. The math is not close.

The Axum exporter piece specifically becomes a team liability if nobody on your team can actually read a Rust compile error. I don't mean "can write Rust" — I mean can look at a lifetime error in a tokio::spawn closure and not immediately panic. If your team is Node.js or Python-native and you're adding a custom Rust binary to the critical observability path, you've just created a component that nobody will touch when it breaks at 2am. Use the Node Exporter or the OpenTelemetry Collector's Prometheus receiver instead. They're boring, they're stable, and your whole team can debug them.

If you're already paying for Datadog or Honeycomb, the ROI of self-hosting Vector plus Quickwit is almost certainly negative — unless your Datadog bill has become genuinely alarming. Honeycomb's trace query latency is hard to beat, and Datadog's alerting pipeline is more mature than anything you'll build yourself. The only time I'd recommend ripping that out is if you're hitting Datadog's per-host pricing at scale and you've done the math: Quickwit on a $40/month VPS versus $23/host/month for 20+ hosts is where self-hosting starts making actual economic sense.

The real sweet spot for this stack is specific and narrow:

  • Three or more services with heterogeneous log formats — that's where Vector's remap language earns its complexity
  • Mixed signals: you need full-text log search AND time-series metrics in the same incident workflow, not one or the other
  • Cost pressure is real: you've looked at your observability bill and it's approaching your compute bill
  • At least one person on the team can own the Rust layer — even part-time. They don't have to be an expert, they just need to not be scared of cargo build --release and reading axum middleware docs

If three of those four conditions are true, this stack pays off. If it's one or two, you're buying operational complexity you don't need. The fastest incident response comes from tools your whole team understands, not the architecturally correct solution that only one person can touch.

Quick Reference: What Each Tool Does in the Incident Workflow

The thing that surprises most teams when they first wire this stack together is how the bottleneck shifts. You expect the database query or the network hop to be the slow part — instead you find Grafana waiting on a Python-based log aggregator having a GC event right when your dashboards need to refresh during an active incident. Replacing that aggregator with Rust tooling doesn't just make things faster; it makes the latency predictable, which is what actually matters when you're triaging at 2am.

Here's how each tool maps to a specific job in the incident workflow, why Rust is or isn't the relevant factor, and where each one will bite you:

| Tool | Role in Incident Workflow | Why Rust Matters Here | Main Gotcha |
| --- | --- | --- | --- |
| Vector | Log/metric ingestion and routing | No GC pauses when volume spikes 10x mid-incident | VRL transform errors fail silently unless you add explicit abort conditions |
| Quickwit | Log storage and full-text search | Columnar Rust internals give sub-second search even on cold, unindexed data | Index schema changes require a full reindex — plan your fields before ingestion starts |
| Axum metrics endpoint | Exposing custom business metrics | Same binary as your app, zero sidecar tax, microsecond response on /metrics | You own the cardinality problem — nothing stops you from creating label explosions |
| Grafana | Dashboard rendering and alerting | Not written in Rust — but with fast backends feeding it, Grafana stops being the bottleneck | Alert evaluation runs on Grafana's schedule, not your ingestion rate — factor in the lag |

Vector is the part I'd defend most aggressively. The GC pause issue with Logstash or Fluentd isn't theoretical — I've watched a 10x log spike turn a 40ms Logstash pipeline into a 4-second one, which means you're debugging an incident with data that's already 4 seconds stale and getting worse. Vector, being written in Rust, keeps p99 ingestion latency flat even under that load. The config is also explicit in a way I like — you declare your sources, transforms, and sinks, and the topology is readable without tribal knowledge:

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]

[transforms.parse_json]
type = "remap"
inputs = ["app_logs"]
source = '''
  . = parse_json!(.message)
  # abort if required field missing — don't let bad events through silently
  .request_id = string!(.request_id)
'''

[sinks.quickwit]
type = "http"
inputs = ["parse_json"]
uri = "http://quickwit:7280/api/v1/my-index/ingest"
encoding.codec = "json"

Quickwit is where the architecture gets genuinely interesting. Unlike Elasticsearch — which needs warm replicas and heap tuning to avoid query latency on cold data — Quickwit's columnar storage (built on Tantivy, also Rust) can search months-old log data in under a second when you're tracing the root cause of a recurring incident. The gotcha is real though: the schema is set at index creation time. I made the mistake of ingesting logs before I'd decided whether status_code would be a u64 or a text field, and had to nuke the index and reprocess. Define your schema first, ingest second.

The Axum metrics endpoint is often the piece teams underestimate. Instead of running a Prometheus exporter as a separate process with its own memory footprint, you embed the endpoint directly into your Rust service. Here's the core of what that looks like with the prometheus crate:

use axum::{routing::get, Router};
use prometheus::{register_counter_vec, Encoder, TextEncoder};

async fn metrics_handler() -> String {
    let encoder = TextEncoder::new();
    let metric_families = prometheus::gather();
    let mut buffer = Vec::new();
    // encode to text exposition format Grafana/Prometheus expects
    encoder.encode(&metric_families, &mut buffer).unwrap();
    String::from_utf8(buffer).unwrap()
}

pub fn metrics_router() -> Router {
    Router::new().route("/metrics", get(metrics_handler))
}
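
The register_counter_vec import above earns its place once you register metrics at startup and increment them from handlers. A sketch of that half, using once_cell for the static (an extra dependency; std::sync::LazyLock works too on newer toolchains):

use once_cell::sync::Lazy;
use prometheus::{register_counter_vec, CounterVec};

// Registered once against the default registry that prometheus::gather() reads
static JOBS: Lazy<CounterVec> = Lazy::new(|| {
    register_counter_vec!(
        "jobs_processed_total",
        "Jobs processed, labeled by tenant and outcome",
        &["tenant", "status"]
    )
    .unwrap()
});

fn mark_job_done(tenant: &str) {
    JOBS.with_label_values(&[tenant, "ok"]).inc();
}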

That's your entire sidecar eliminated.

FAQ

Does Vector replace Prometheus entirely?

Short answer: no, and I wouldn't try. Vector is a data pipeline — it collects, transforms, and routes observability data. Prometheus is a metrics store with a query engine and a pull-based scraping model. I use Vector to ship logs and process events, then still point Prometheus at my Rust services for /metrics scraping. They coexist fine. Where Vector does reduce Prometheus dependency is when you're aggregating pre-computed metrics from multiple sources and forwarding them to a remote store like VictoriaMetrics or Grafana Mimir instead — in that case you might skip some Prometheus instances at the edge. But replacing the entire Prometheus ecosystem, including PromQL and alerting rules? That's a different project entirely.

Is Quickwit production-ready or still experimental?

Quickwit is well past the experimental stage. I've run it under real incident load — ingesting structured JSON logs from Vector at around 50K events/sec — and it held up. The thing that caught me off guard was the cold search latency on object storage-backed indexes. If your S3 index hasn't been queried recently, first-query latency can spike to 3-5 seconds while it fetches split metadata. Warm queries are fast — sub-200ms for most log searches across a 24-hour window. For incident dashboards where you're doing repeated queries on a live incident, that warmup penalty disappears quickly. The REST API is stable, the Jaeger-compatible tracing endpoint works, and the data ingestion API hasn't had a breaking change since 0.6. I wouldn't use it for a compliance-grade audit log system yet, but for ops dashboards and incident triage? Solid.

Can I use this stack with Kubernetes, or is it bare-metal only?

Works great on Kubernetes. The recommended pattern is to run Vector as a DaemonSet so every node ships its own container logs, plus a separate Vector Aggregator deployment that handles routing to Quickwit. Here's a minimal values snippet for the official Helm chart:

# values.yaml for vector (helm.vector.dev)
role: "Agent"  # DaemonSet mode

customConfig:
  sources:
    kubernetes_logs:
      type: kubernetes_logs

  transforms:
    parse_json:
      type: remap
      inputs: ["kubernetes_logs"]
      source: |
        # Only parse structured logs; drop parse errors silently
        .message = parse_json(.message) ?? .message

  sinks:
    quickwit:
      type: http
      inputs: ["parse_json"]
      uri: "http://quickwit-searcher.observability.svc.cluster.local:7280/api/v1/ops-logs/ingest"
      encoding:
        codec: json

Quickwit itself runs well on Kubernetes with persistent volumes for its index cache, and object storage (S3, GCS, Azure Blob) for the actual index data. The metastore can use PostgreSQL 15+ as a backend, which I prefer over the default file-based approach once you're running more than one Quickwit indexer node. The one bare-metal advantage is latency — no overlay network tax — but for most incident debugging workflows, the difference doesn't matter.
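
Switching the metastore to PostgreSQL is a one-line change in the node config — a sketch, with the connection string and bucket obviously placeholders:

# Quickwit node config — PostgreSQL-backed metastore instead of the file default
version: 0.7
metastore_uri: postgres://quickwit:CHANGE_ME@postgres.observability.svc.cluster.local:5432/quickwit_metastore
default_index_root_uri: s3://my-quickwit-bucket/indexes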

How do I handle secrets in vector.toml without committing them?

Vector has native secret backend support as of v0.34. You can reference environment variables directly in your config using the ${ENV_VAR} syntax, or use the secret backend for external stores. Here's the pattern I actually use in production:

# vector.toml
[secret.aws]
type = "aws_secrets_manager"
region = "us-east-1"

[sinks.quickwit_auth]
type = "http"
# Pulls from AWS Secrets Manager at startup, not hardcoded
auth.strategy = "bearer"
auth.token = "SECRET[aws.quickwit_ingest_token]"

If you're not on AWS, the env var approach is the pragmatic fallback — inject secrets via Kubernetes Secrets mounted as environment variables, and reference them as ${QUICKWIT_TOKEN} in the config. What you should never do is use string interpolation in a CI template that bakes secrets into the rendered config file and then stores the artifact. I've seen that exact mistake cause a credential rotation incident. Keep the config template in git with placeholder syntax, and let your secret manager or k8s Secret handle injection at runtime. Run vector validate --config vector.toml in CI against a config with dummy env vars exported — it catches syntax errors without needing real credentials.
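
That CI step really is this small — a sketch assuming QUICKWIT_TOKEN is the only secret the config references via ${...} interpolation:

# CI: syntax-check the config without real credentials
export QUICKWIT_TOKEN="dummy-value-for-validation"
vector validate --config vector.toml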

