Your service runs fine at 2 PM.
At 6 PM, the database experiences a brief latency spike—nothing catastrophic, maybe 200ms slower than usual. Within minutes, your monitoring alerts start lighting up. Memory usage climbs 40%, then 60%. GC pauses increase. Users start timing out.
By 7 PM, you have an OutOfMemoryError.
You check the logs. Nothing unusual. The database recovered. The CPU is fine. The network is fine. So what killed you?
An unbounded queue in your ThreadPoolExecutor.
This is one of those bugs that feels like it shouldn't exist in 2026. It's well-known, thoroughly documented, yet somehow still sneaks past code reviews and deploys into production. The reason is simple: unbounded queues feel safe at first.
You define a thread pool with 10 threads, and you assume the queue is your safety net. When threads are busy, tasks wait. Seems reasonable. Until the queue has 100,000 tasks in it.
The Deceptive Logic of Unbounded Queues
Here's how the trap works:
ExecutorService executor = new ThreadPoolExecutor(
    10,                          // corePoolSize
    10,                          // maxPoolSize
    60, TimeUnit.SECONDS,        // keepAliveTime
    new LinkedBlockingQueue<>()  // ← DANGER: unbounded queue
);
You've created a thread pool with 10 threads. When all 10 threads are busy, new tasks don't get rejected—they get queued. The queue can hold unlimited tasks.
Your mental model: "Threads are busy, tasks queue up, threads finish, queue drains."
The reality during latency: "Threads are busy waiting on slow database calls, tasks keep arriving and queueing, queue grows indefinitely, memory fills up, GC panics, everything crashes."
The core issue is that a queue is not a buffer—it's a pit.
A buffer should have boundaries. It should say "I can hold X items, then I stop accepting more." A queue with no bounds just keeps taking items until your JVM runs out of memory.
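The "pit" is visible right in the API: a LinkedBlockingQueue built with the no-argument constructor has a capacity of Integer.MAX_VALUE. A quick check, using nothing beyond java.util.concurrent:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class QueueCapacityDemo {
    public static void main(String[] args) {
        // No-arg constructor: capacity is Integer.MAX_VALUE, i.e. effectively unbounded
        LinkedBlockingQueue<Runnable> unbounded = new LinkedBlockingQueue<>();
        System.out.println(unbounded.remainingCapacity()); // 2147483647

        // Explicit capacity: the queue refuses offers once it is full
        LinkedBlockingQueue<Runnable> bounded = new LinkedBlockingQueue<>(2);
        bounded.offer(() -> {});
        bounded.offer(() -> {});
        boolean accepted = bounded.offer(() -> {}); // third offer fails
        System.out.println(accepted); // false
    }
}
```

"Unbounded" is really "bounded by two billion tasks", which your heap will never survive.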
When This Goes Wrong
The sneaky part is that unbounded queues don't cause problems under normal load. They cause problems under the exact circumstances when you most need protection:
Scenario 1: Temporary Latency Spike
Your database experiences a brief slowdown. Queries that normally take 10ms now take 500ms. Your 10-thread pool fills up as threads block waiting for results.
Time 0:
    Threads 1-10: processing requests (blocking on DB)
    Queue: empty

Time 1 (DB latency spike):
    Threads 1-10: still waiting for DB responses
    Queue: 100 pending requests

Time 2:
    Threads 1-10: still waiting
    Queue: 1,000 pending requests

Time 3:
    Threads 1-10: finally getting responses back
    Queue: 5,000 pending requests and growing

Time 5:
    Threads 1-10: working through the backlog
    Queue: 100,000+ pending requests
    Memory: 2GB (heap is 4GB)

Time 7:
    OutOfMemoryError. Your service dies.
The database recovered at Time 3. But your service didn't. It spent the next 4 minutes executing stale work for requests that already timed out on the client side.
Scenario 2: Cascading Failures
Service A depends on Service B. Service B starts degrading. Service A's thread pool queues up requests waiting for responses. The queue grows. Memory spikes. Service A crashes, and its retrying clients add even more load to Service B, which pushes every other service that calls Service B into the same queuing spiral, which...
This is how a cascading failure happens. One slow service takes down three others.
The Root Cause: The Acceptance vs. Execution Mismatch
Here's the fundamental problem:
A ThreadPoolExecutor has three knobs that matter here:
- Core threads: threads that always exist
- Max threads: extra threads, created only when the queue is full
- Queue: where tasks wait when threads are busy
Notice the second point. With an unbounded queue, the queue is never full, so maxPoolSize is dead configuration: the extra threads are never created. Worse, the executor accepts every single task regardless of capacity. Your thread pool can't say "no, I'm overloaded, reject this task." It just queues it.
This creates a mismatch between accepting work and executing work.
Request comes in → Added to unlimited queue → Task waits → More requests come in
↓
Task still waiting
Memory growing
GC struggling
JVM dying
The executor accepted the work (queued it), but never had capacity to execute it.
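The mismatch is easy to reproduce. In this sketch, two worker threads are pinned on a latch (standing in for slow database calls) while thousands of submissions are happily "accepted" into the queue:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AcceptanceMismatchDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch block = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>()); // unbounded

        // Occupy both worker threads with tasks that "block on the DB"
        for (int i = 0; i < 2; i++) {
            executor.execute(() -> {
                try { block.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }

        // Every further submission is accepted without complaint; it just piles up
        for (int i = 0; i < 10_000; i++) {
            executor.execute(() -> {});
        }
        System.out.println(executor.getQueue().size()); // 10000 accepted, zero executed

        block.countDown();
        executor.shutdown();
    }
}
```

Not a single call threw or blocked; the only signal of overload is the queue size, which nothing here inspects.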
The Solution: Bounded Queues and Backpressure
The fix is to give your queue a hard limit:
ExecutorService executor = new ThreadPoolExecutor(
    10,                                   // corePoolSize
    20,                                   // maxPoolSize (extra threads once the queue fills)
    60, TimeUnit.SECONDS,                 // keepAliveTime
    new LinkedBlockingQueue<>(1000),      // ← BOUNDED: max 1000 tasks
    new ThreadPoolExecutor.CallerRunsPolicy() // Rejection policy
);
Now, when the queue fills to 1,000 items, the executor first grows the pool toward maxPoolSize (20). Once the queue is full and all 20 threads are busy, the next task gets rejected.
By default, a rejected task throws a RejectedExecutionException. But that seems harsh—you want your service to degrade gracefully, not crash.
This is where rejection policies come in:
Rejection Policy 1: CallerRunsPolicy (My Favorite)
new ThreadPoolExecutor.CallerRunsPolicy()
When the queue is full, instead of rejecting the task, execute it in the caller's thread. This creates natural backpressure—the caller slows down, which slows down the request ingestion, which protects the thread pool.
Effect: Your API becomes slower under load instead of crashing. Users see higher latency (and possibly timeouts), not 503s. One caveat: the task runs on whatever thread called execute(), so never use this policy from an event loop or any other thread you can't afford to block.
Incoming requests → Thread pool queue (1000 items) → FULL
↓
CallerRunsPolicy: Run in caller thread
↓
Caller gets blocked → Slows down ingestion
↓
Natural backpressure applied
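Here is that behavior in a minimal, runnable form, using a deliberately tiny pool and queue to force saturation:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch block = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1),               // tiny queue to force saturation
                new ThreadPoolExecutor.CallerRunsPolicy());

        executor.execute(() -> {                            // occupies the single worker
            try { block.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        executor.execute(() -> {});                         // fills the one queue slot

        // Pool and queue are now full: this task runs synchronously on the main thread
        String[] ranOn = new String[1];
        executor.execute(() -> ranOn[0] = Thread.currentThread().getName());
        System.out.println(ranOn[0]); // "main": the caller was made to do the work itself

        block.countDown();
        executor.shutdown();
    }
}
```

Because the third execute() call ran the task inline, the submitting thread could not submit anything else in the meantime; that pause is the backpressure.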
Rejection Policy 2: AbortPolicy (Explicit Failure)
new ThreadPoolExecutor.AbortPolicy()
Throw an exception. The caller knows immediately that the system is overloaded. They can retry, circuit-break, or fail fast.
try {
    executor.submit(task);
} catch (RejectedExecutionException e) {
    log.warn("Executor is overloaded, backing off");
    return new ServiceUnavailableResponse();
}
This is more explicit but requires the caller to handle the rejection.
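A minimal demonstration, again with a tiny pool so the rejection is easy to trigger:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AbortPolicyDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch block = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());

        executor.execute(() -> {                   // occupies the only worker
            try { block.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        executor.execute(() -> {});                // fills the single queue slot

        boolean rejected = false;
        try {
            executor.submit(() -> {});             // no capacity left anywhere
        } catch (RejectedExecutionException e) {
            rejected = true;                       // overload surfaces immediately, on the caller
        }
        System.out.println(rejected); // true

        block.countDown();
        executor.shutdown();
    }
}
```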
Rejection Policy 3: DiscardPolicy (Nuclear Option)
new ThreadPoolExecutor.DiscardPolicy()
Silently drop the task. Use this only for non-critical work where loss is acceptable (e.g., metrics collection, logging).
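To see just how silent it is, count executions: the discarded task neither runs nor throws:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DiscardPolicyDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch block = new CountDownLatch(1);
        AtomicInteger executed = new AtomicInteger();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1),
                new ThreadPoolExecutor.DiscardPolicy());

        executor.execute(() -> {                       // occupies the only worker
            try { block.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        executor.execute(executed::incrementAndGet);   // queued
        executor.execute(executed::incrementAndGet);   // silently discarded, no exception

        block.countDown();
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(executed.get()); // 1: the discarded task simply vanished
    }
}
```

That silence is exactly why this policy is dangerous for anything a caller is waiting on.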
Tuning the Queue Capacity
How big should your queue be?
This is where it gets nuanced. Too small, and you reject valid requests during normal fluctuations. Too big, and you're back to the original problem.
A practical approach:
// Estimate based on your latency and throughput
int averageRequestsPerSecond = 1000;  // measured from production traffic
int maxAcceptableLatencySeconds = 10; // how long a queued request may usefully wait

int estimatedQueueSize =
    averageRequestsPerSecond * maxAcceptableLatencySeconds;
// Conservative estimate: 10,000 tasks
// This gives you 10 seconds of buffer at 1000 req/s

ExecutorService executor = new ThreadPoolExecutor(
    10,
    20,
    60, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(estimatedQueueSize),
    new ThreadPoolExecutor.CallerRunsPolicy()
);
Then monitor in production:
- If queue hits capacity regularly, increase it (or increase core threads)
- If queue rarely exceeds 10% capacity, you can reduce it
A Deeper Problem: Stale Task Execution
Even with bounded queues, there's another issue: stale tasks still get executed.
When a client times out waiting for a response, they've given up. But their task might still be sitting in the queue, waiting to execute. Hours later, when the queue drains, the thread pool dutifully executes it.
This is wasted work—your thread pool is doing something nobody cares about anymore.
One partial solution: Use Future with timeouts:
ExecutorService executor = Executors.newFixedThreadPool(10);

Future<Response> future = executor.submit(() -> expensiveOperation());

try {
    Response response = future.get(5, TimeUnit.SECONDS);
    return response;
} catch (TimeoutException e) {
    future.cancel(true); // Cancel the task
    return timeoutResponse();
}
The cancel(true) flag attempts to interrupt the task. But this only works if your task respects interrupts. Many database drivers don't.
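The happy path, where the task does respect interruption, looks like this (Thread.sleep stands in for interruptible blocking work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelInterruptDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Future<?> future = executor.submit(() -> {
            try {
                Thread.sleep(60_000);         // stands in for slow, interruptible work
            } catch (InterruptedException e) {
                sawInterrupt.set(true);       // a cooperative task notices and bails out
            }
        });

        Thread.sleep(200);                     // let the task start sleeping
        future.cancel(true);                   // delivers the interrupt

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(sawInterrupt.get()); // true, only because sleep() is interruptible
    }
}
```

Swap the sleep for a JDBC call on a driver that ignores interrupts, and cancel(true) sets a flag nobody reads: the thread stays blocked until the driver gives up on its own.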
A better approach: Pass a deadline or timeout token to your task itself:
@FunctionalInterface
interface TimeoutAwareTask<T> {
    T execute(Deadline deadline) throws InterruptedException;
}
// Capture the deadline when the request arrives, not inside the task;
// otherwise isExpired() is measured from execution time and is always false
Deadline deadline = Deadline.ofMillis(System.currentTimeMillis() + 5000);
executor.submit(() -> {
    if (deadline.isExpired()) {
        log.debug("Task timed out before execution, skipping");
        return;
    }
    expensiveOperation(deadline);
});
Now your task knows when the client gave up and can bail out early.
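The JDK has no Deadline class; the snippet above assumes a small helper along these lines (a hypothetical sketch, not a library API):

```java
// Hypothetical helper: the JDK provides no Deadline class, so we roll a minimal one.
final class Deadline {
    private final long expiresAtMillis;

    private Deadline(long expiresAtMillis) {
        this.expiresAtMillis = expiresAtMillis;
    }

    /** A deadline expiring at the given absolute epoch-millisecond timestamp. */
    static Deadline ofMillis(long epochMillis) {
        return new Deadline(epochMillis);
    }

    /** A deadline expiring the given number of milliseconds from now. */
    static Deadline afterMillis(long millisFromNow) {
        return new Deadline(System.currentTimeMillis() + millisFromNow);
    }

    boolean isExpired() {
        return System.currentTimeMillis() >= expiresAtMillis;
    }

    /** Milliseconds remaining, floored at zero; handy for driver-level timeouts. */
    long remainingMillis() {
        return Math.max(0, expiresAtMillis - System.currentTimeMillis());
    }
}
```

remainingMillis() is the useful part in practice: pass it down as the query timeout so even a non-interruptible driver gives up when the client already has.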
Additional Improvements
Beyond bounded queues, consider:
1. Thread Pool Size Optimization
// CPU-bound work
int cpuBoundThreads = Runtime.getRuntime().availableProcessors();

// IO-bound work (database calls, network)
int ioBoundThreads = Runtime.getRuntime().availableProcessors() * 2;
IO-bound tasks spend time waiting, so more threads are useful.
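A commonly cited refinement of the IO-bound rule of thumb (from Java Concurrency in Practice) sizes the pool by the ratio of wait time to compute time, rather than a flat multiplier; a sketch:

```java
public class PoolSizing {
    /**
     * threads = cores * (1 + waitTime / computeTime)
     * A task that waits 90ms on IO for every 10ms of CPU work
     * can share a core with roughly 9 siblings.
     */
    static int ioBoundPoolSize(int cores, double waitMillis, double computeMillis) {
        return (int) (cores * (1 + waitMillis / computeMillis));
    }

    public static void main(String[] args) {
        int cores = 8; // in real code: Runtime.getRuntime().availableProcessors()
        System.out.println(ioBoundPoolSize(cores, 90, 10)); // 80
        System.out.println(ioBoundPoolSize(cores, 0, 10));  // 8: pure CPU work gains nothing
    }
}
```

The wait/compute ratio has to come from your own measurements; the formula is a starting point to refine against production metrics, not an answer.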
2. Separate Thread Pools by Workload
Don't use one executor for everything. Separate concerns:
// Fast, non-blocking work
ExecutorService fastExecutor = new ThreadPoolExecutor(
    10, 10, 60, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(1000),
    new ThreadPoolExecutor.CallerRunsPolicy()
);

// Slow IO work (database queries)
ExecutorService slowExecutor = new ThreadPoolExecutor(
    20, 20, 60, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(5000),
    new ThreadPoolExecutor.CallerRunsPolicy()
);
One slow dependency doesn't starve the fast paths.
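A small latch-based sketch of the isolation this buys: the slow pool is fully wedged on a stuck dependency, yet fast work still completes because it has its own threads:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolIsolationDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService slowExecutor = Executors.newFixedThreadPool(2);
        ExecutorService fastExecutor = Executors.newFixedThreadPool(2);
        CountDownLatch dependencyDown = new CountDownLatch(1);

        // The "slow" pool is entirely consumed by a hung dependency...
        for (int i = 0; i < 2; i++) {
            slowExecutor.execute(() -> {
                try { dependencyDown.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }

        // ...but fast work still completes promptly on its own pool
        Future<String> fast = fastExecutor.submit(() -> "ok");
        System.out.println(fast.get(1, TimeUnit.SECONDS)); // "ok", no starvation

        dependencyDown.countDown();
        slowExecutor.shutdown();
        fastExecutor.shutdown();
    }
}
```

Run the same fast task through the wedged pool instead and the get(1, SECONDS) times out: that timeout is the starvation the split prevents.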
3. Spring Boot Configuration (If You Use Spring)
spring.task.execution.pool.core-size=10
spring.task.execution.pool.max-size=20
spring.task.execution.pool.queue-capacity=1000
spring.task.execution.pool.allow-core-thread-timeout=true
Add a custom executor bean for control:
@Configuration
public class ExecutorConfig {

    @Bean(name = "taskExecutor")
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(20);
        executor.setQueueCapacity(1000);
        // Note: the policy class lives on ThreadPoolExecutor, not ThreadPoolTaskExecutor
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}
4. Monitoring and Alerting
Track these metrics religiously:
@Component
public class ExecutorMetrics {

    private final ThreadPoolTaskExecutor executor;

    // @Scheduled methods can't take arguments, so inject the executor instead
    public ExecutorMetrics(@Qualifier("taskExecutor") ThreadPoolTaskExecutor executor) {
        this.executor = executor;
    }

    @Scheduled(fixedRate = 5000)
    public void logExecutorStats() {
        log.info(
            "Executor stats - Active: {}, Queue: {}, Completed: {}",
            executor.getActiveCount(),
            executor.getThreadPoolExecutor().getQueue().size(),
            executor.getThreadPoolExecutor().getCompletedTaskCount()
        );
    }
}
Alert if:
- Queue size consistently > 50% of capacity
- Active threads = maxPoolSize (means you're at capacity)
- Task rejection rate increases
The Mindset Shift
Here's what separates services that crash under load from those that degrade gracefully:
Dangerous mindset: "I'll use an unbounded queue as a buffer."
Safe mindset: "My queue has a limit. When I hit that limit, I stop accepting new work and return to the caller that I'm overloaded."
The second approach feels harsh—you're rejecting requests. But that's better than crashing. A rejection is honest; a crash is a lie.
One More Thing
Thread pool tuning is empirical, not theoretical. The "perfect" size for your executor depends on your latency profile, your hardware, and your workload.
Start with bounded queues and reasonable defaults. Deploy. Monitor. Adjust based on production behavior.
And if you see memory climbing during a latency spike, you already know what to look for: check your executor configuration. Odds are, somewhere there's an unbounded queue quietly queuing up requests until your JVM runs out of memory.
The fix is simple. The prevention is simpler. The cost of not doing it is expensive.