Why Your Go Service Has Latency Spikes (Even If It’s “Fast”)


You shipped a Go service. Benchmarks look great. CPU usage is low. Average latency is comfortably within targets.

And yet every now and then your p99 explodes.

This is the part many engineers underestimate: fast systems can still be unpredictable systems. In Go, latency spikes are rarely caused by a single obvious bottleneck. They emerge from the interaction between the runtime, the OS, and your code under real-world load.

Let’s dig into the less obvious reasons your Go service spikes and what you can actually do about them.


The Illusion of “Fast Enough”

Go makes it easy to build services that are consistently good on average. Goroutines are cheap, the standard library is efficient, and deployment is simple.

But averages lie.

Latency-sensitive systems live and die by tail latency: p95, p99, p999. These outliers are where user experience breaks down, SLAs fail, and debugging becomes painful.

If your service is “fast but spiky,” you’re likely dealing with one (or more) of the following.


1. Garbage Collection Isn’t Free (Even When It’s Good)

Go’s garbage collector is excellent, but it is not invisible.

What’s happening

Modern Go uses a concurrent, tri-color mark-and-sweep GC. Most of the work happens alongside your application, but there are still short stop-the-world (STW) phases, especially during:

  • GC start (sweep termination)
  • Mark termination

Even if these pauses are short (microseconds to milliseconds), they can stack up under load and show up as latency spikes.

Why it gets worse in production

  • High allocation rates increase GC frequency
  • Large heaps increase scan time
  • Pointer-heavy data structures slow marking

What to look for

  • Sudden spikes aligned with GC cycles
  • Heap growth pushing against the GOGC target
  • High allocation profiles in pprof

What actually helps

  • Reduce allocations in hot paths
  • Reuse objects (sync.Pool, carefully)
  • Avoid unnecessary pointers
  • Flatten data structures when possible
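A minimal sketch of the object-reuse idea: the pooled buffer below (names like `render` and `bufPool` are illustrative, not from the original) avoids allocating a fresh `bytes.Buffer` per call in a hot path, which directly lowers GC pressure. Note the `Reset` before use; a pooled object may still hold data from its previous borrower.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses bytes.Buffer values so hot request paths don't
// allocate a fresh buffer (and create GC work) on every call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a small response using a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset() // a pooled buffer may contain a previous caller's data
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher"))
}
```

The "carefully" caveat above matters: `sync.Pool` contents can be dropped at any GC, so it suits scratch objects, not caches with required contents.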

2. Goroutine Contention and Scheduler Behavior

Goroutines are cheap but not free, and definitely not magic.

What’s happening

Go’s scheduler multiplexes goroutines onto OS threads. Under load:

  • Run queues grow
  • Context switching increases
  • Work stealing adds overhead

If too many goroutines compete for CPU or locks, latency spikes emerge not from raw compute, but from waiting.

Common traps

  • Spawning unbounded goroutines per request
  • Blocking operations inside goroutines
  • Assuming “more concurrency = faster”

Subtle issue: preemption delays

Go historically relied on cooperative preemption at function-call safe points. Since Go 1.14 the runtime can preempt tight loops asynchronously, but preemption still isn't instantaneous; a hot loop with no function calls can briefly delay scheduling fairness.

What to do

  • Use worker pools for bounded concurrency
  • Avoid long-running CPU loops without yielding
  • Profile scheduler latency (runtime/trace)
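The first bullet can be sketched as a fixed-size worker pool (the `process` helper and the squaring "work" are illustrative assumptions): instead of one goroutine per job, a bounded number of workers drain a channel, which caps scheduler and memory load.

```go
package main

import (
	"fmt"
	"sync"
)

// process runs jobs through a fixed number of workers rather than
// spawning one goroutine per job, bounding concurrency.
func process(jobs []int, workers int) []int {
	in := make(chan int)
	out := make(chan int, len(jobs))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- j * j // stand-in for real per-job work
			}
		}()
	}

	for _, j := range jobs {
		in <- j // blocks when all workers are busy: natural throttling
	}
	close(in)
	wg.Wait()
	close(out)

	results := make([]int, 0, len(jobs))
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(len(process([]int{1, 2, 3, 4}, 2)))
}
```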

3. Lock Contention: The Silent Killer

Mutex waits don’t burn CPU, so they’re nearly invisible in CPU profiles, but they absolutely show up in latency.

What’s happening

Under contention:

  • Goroutines block on locks
  • Queueing delays increase
  • Throughput may remain high, but latency explodes

Where it hides

  • Global maps with mutex protection
  • Shared caches
  • Logging pipelines
  • Metrics collectors

Why it’s tricky

You might not notice until traffic scales. Everything works fine until it suddenly doesn’t.

What works

  • Reduce lock granularity
  • Prefer sharded structures
  • Use lock-free or atomic patterns where appropriate
  • Measure with mutex profiling (go test -mutexprofile, or runtime.SetMutexProfileFraction in a running service)
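One way to sketch the sharding idea (all names here, like `shardedMap` and `pick`, are hypothetical): split a single mutex-protected map into N independently locked shards, so goroutines touching different keys rarely contend on the same lock.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16

// shard pairs a mutex with the slice of the key space it protects.
type shard struct {
	mu sync.Mutex
	m  map[string]int
}

// shardedMap replaces one global lock with shardCount smaller ones.
type shardedMap struct {
	shards [shardCount]shard
}

func newShardedMap() *shardedMap {
	sm := &shardedMap{}
	for i := range sm.shards {
		sm.shards[i].m = make(map[string]int)
	}
	return sm
}

// pick hashes the key to choose its shard.
func (sm *shardedMap) pick(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &sm.shards[h.Sum32()%shardCount]
}

func (sm *shardedMap) Set(key string, v int) {
	s := sm.pick(key)
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (sm *shardedMap) Get(key string) (int, bool) {
	s := sm.pick(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	sm := newShardedMap()
	sm.Set("p99", 120)
	if v, ok := sm.Get("p99"); ok {
		fmt.Println(v)
	}
}
```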

4. Network and Syscall Variability

Your Go code might be fast. The network is not.

What’s happening

Every request eventually hits:

  • TCP stack
  • DNS resolution
  • Kernel scheduling
  • External services

Even tiny variations here can cascade into visible latency spikes.

Common culprits

  • DNS lookups without caching
  • Connection churn (lack of keep-alives)
  • Slow downstream dependencies
  • Kernel-level queueing

The hidden factor: tail amplification

If your service fans out to 5 downstream services, each with an independent p99 latency of 50ms, then roughly 1 − 0.99⁵ ≈ 4.9% of your requests will see at least one call slower than 50ms, so your combined p99 is much worse than any single dependency's.
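The fan-out effect can be computed directly, assuming independent call latencies (the `tailProb` helper is illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// tailProb returns the probability that at least one of n
// independent parallel calls exceeds its own p-quantile latency.
func tailProb(p float64, n int) float64 {
	return 1 - math.Pow(p, float64(n))
}

func main() {
	// With 5 independent downstream calls, about 4.9% of requests see
	// at least one call slower than that call's p99.
	fmt.Printf("%.3f\n", tailProb(0.99, 5))
}
```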

What helps

  • Use connection pooling aggressively
  • Set timeouts everywhere (and mean it)
  • Cache DNS where possible
  • Budget latency across dependencies

5. GC + Scheduler + Syscalls: The Perfect Storm

The real problem is rarely one issue; it’s interaction effects.

A typical spike might look like this:

  1. GC cycle starts under high allocation pressure
  2. Goroutines increase due to incoming traffic
  3. Lock contention rises in shared structures
  4. A few slow network calls block threads
  5. Scheduler struggles to keep up

Individually, each is manageable. Together, they create a spike that’s hard to reproduce and harder to debug.


6. Misleading Benchmarks

Your local benchmarks probably didn’t show any of this.

Why?

  • No real network variability
  • No production traffic patterns
  • No contention
  • No long-lived heap growth

Benchmarks measure ideal conditions. Production exposes emergent behavior.


7. Observability Gaps

You can’t fix what you can’t see.

Most teams track:

  • Average latency
  • CPU usage
  • Memory usage

But miss:

  • GC pause distribution
  • Goroutine counts over time
  • Scheduler delays
  • Mutex contention
  • Per-endpoint tail latency

Without these, spikes remain mysterious.


What Actually Works in Practice

If you care about latency consistency, not just speed:

1. Profile under realistic load

Use:

  • pprof (CPU, heap, allocs, mutex)
  • runtime/trace for scheduler insights

2. Track the right metrics

  • p95/p99 latency (not averages)
  • GC pause time
  • Goroutine count
  • Queue lengths

3. Design for bounded behavior

  • Limit concurrency
  • Avoid unbounded queues
  • Apply backpressure
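All three bullets meet in one idiom: a bounded channel plus a non-blocking send. This sketch (the `submit` helper and error are illustrative) rejects work when the queue is full instead of letting it pile up unbounded, so the caller can shed load or return 503.

```go
package main

import (
	"errors"
	"fmt"
)

var errOverloaded = errors.New("server overloaded")

// submit tries to enqueue a job on a bounded channel. When the queue
// is full it fails fast instead of queueing unboundedly; rejecting
// early is the backpressure signal.
func submit(queue chan<- int, job int) error {
	select {
	case queue <- job:
		return nil
	default:
		return errOverloaded
	}
}

func main() {
	queue := make(chan int, 2) // bounded: at most 2 queued jobs
	fmt.Println(submit(queue, 1))
	fmt.Println(submit(queue, 2))
	fmt.Println(submit(queue, 3)) // queue full: rejected
}
```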

4. Reduce variability, not just cost

  • Stable systems beat “fast on average” systems
  • Predictability > peak performance

Final Thought

Go gives you the tools to build extremely fast systems. But it doesn’t guarantee consistent latency; that part is on you.

If your service has latency spikes, don’t look for a single bug. Look for interactions under pressure.

Because in production, the question isn’t:

“Is my service fast?”

It’s:

“Is my service predictable when everything starts going wrong?”

Source: dev.to
