Stop Blocking Virtual Threads: Building Asynchronous Human-in-the-Loop AI Agents with Spring AI
In 2026, letting autonomous AI agents execute high-risk enterprise tools without human oversight is a production liability, but blocking platform threads—or even Project Loom’s virtual threads—for hours waiting for a manager's Slack approval is absolute architectural malpractice. We must transition from synchronous execution loops to stateless, event-driven agent hydration where the LLM's reasoning state is serialized and persisted during human-in-the-loop (HITL) interrupts.
Why Most Developers Get This Wrong
- Virtual Thread Abuse: Assuming virtual threads (for example via `Executors.newVirtualThreadPerTaskExecutor()`) solve the wait problem. They do not: a parked virtual thread is cheap, but everything it holds during a 4-hour human coffee break (the open request, pooled database connections, in-heap agent state) stays held, which hurts scalability and exhausts connection pools.
- State-in-Memory Antipattern: Storing the active ReAct loop state (like the active `ChatMemory` or agent context) in local heap memory, so a redeployment or node failure loses every in-flight agent.
- Polled-Waiting Loops: Blocking on a `CompletableFuture` or running busy-wait database polling loops to check whether a human has clicked "Approve" in an external UI.
The Right Way
The clean solution is to serialize the agent's execution state (the ReAct loop's message history, tool call IDs, and pending variables) to a persistent store, release the active thread immediately, and hydrate a brand-new agent instance when the approval webhook fires.
- Explicit Interrupt Exceptions: Throw a specialized `AgentSuspensionException` containing the serialized `stateId` and tool execution metadata when a high-risk tool is triggered.
- State Hydration: Use Spring AI's `ChatClient` with a custom Redis-backed `ChatMemory` implementation that supports snapshotting at specific message indices.
- Asynchronous Resumption: Expose a stateless REST endpoint `/api/v1/agent/resume` that accepts the human decision, merges it into the serialized history as a `ToolResponseMessage`, and triggers the next step of the ReAct loop.
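The suspend side of this flow can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Spring AI's API: `AgentMessage`, `AgentStateStore`, and `GuardedToolRunner` are hypothetical names, and the in-memory map stands in for Redis. Only `AgentSuspensionException` and `stateId` come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical message shape; a real system would persist Spring AI Message objects.
record AgentMessage(String role, String content, String toolCallId) {}

// Thrown by a guarded tool to unwind the ReAct loop instead of waiting for a human.
class AgentSuspensionException extends RuntimeException {
    final String stateId;
    final String toolCallId;
    final String toolName;

    AgentSuspensionException(String stateId, String toolCallId, String toolName) {
        super("Agent suspended pending approval of " + toolName);
        this.stateId = stateId;
        this.toolCallId = toolCallId;
        this.toolName = toolName;
    }
}

// In-memory stand-in (assumption) for the Redis or Postgres store.
class AgentStateStore {
    private final Map<String, List<AgentMessage>> snapshots = new ConcurrentHashMap<>();

    String save(List<AgentMessage> history) {
        String stateId = UUID.randomUUID().toString();
        snapshots.put(stateId, List.copyOf(history)); // immutable snapshot
        return stateId;
    }

    List<AgentMessage> load(String stateId) {
        List<AgentMessage> history = snapshots.get(stateId);
        if (history == null) {
            throw new IllegalArgumentException("Unknown stateId: " + stateId);
        }
        return new ArrayList<>(history); // mutable copy for the resumed run
    }
}

class GuardedToolRunner {
    private final AgentStateStore store;

    GuardedToolRunner(AgentStateStore store) {
        this.store = store;
    }

    // Called when the model requests a high-risk tool: persist first, then throw
    // so the calling thread is released rather than parked.
    void requestHighRiskTool(List<AgentMessage> history, String toolCallId, String toolName) {
        String stateId = store.save(history);
        throw new AgentSuspensionException(stateId, toolCallId, toolName);
    }
}
```

A caller that catches the exception can hand the `stateId` to the approver (for example in a 202 Accepted response or a Slack message) and let the thread finish; nothing about the paused agent needs to stay in heap memory.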
Show Me The Code
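```java
// Assumes a class-level @RequestMapping("/api/v1") and injected stateRepository and chatClient beans.
@PostMapping("/agent/resume")
public ResponseEntity<String> resumeAgent(@RequestBody ApprovalResponse approval) {
    // 1. Retrieve the serialized chat history (ReAct state) from Redis
    List<Message> history = stateRepository.findById(approval.stateId());

    // 2. Inject the human's decision as if it were the tool's output
    String toolOutput = approval.approved()
            ? "Approved: " + approval.notes()
            : "Rejected by human";
    // As of Spring AI 1.0, ToolResponseMessage wraps ToolResponse(id, name, responseData) entries;
    // approval.toolName() is an assumed extra field on the payload.
    history.add(new ToolResponseMessage(List.of(
            new ToolResponseMessage.ToolResponse(approval.toolCallId(), approval.toolName(), toolOutput))));

    // 3. Hydrate the agent and run the next ReAct step; the request thread is held
    //    only for the model call, never for the human wait
    ChatResponse response = chatClient.prompt()
            .messages(history)
            .call()
            .chatResponse();
    // getText() replaced getContent() on message types in Spring AI 1.0
    return ResponseEntity.ok(response.getResult().getOutput().getText());
}
```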
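Before the decision is appended, it is worth checking that the stored history really ends with the tool call being answered, since chat-completion APIs such as OpenAI's typically reject a tool result that does not follow a matching tool call. The following is a hedged plain-Java sketch of that merge step; `StoredMessage`, `Approval`, and `ApprovalMerger` are hypothetical stand-ins for the persisted messages and webhook payload, not Spring AI types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical persisted message: role is "user", "assistant", or "tool".
record StoredMessage(String role, String content, String toolCallId) {}

// Hypothetical webhook payload, mirroring the fields used by the handler above.
record Approval(String toolCallId, boolean approved, String notes) {}

class ApprovalMerger {
    // Returns a new history with the human decision appended as the result
    // of the pending tool call; the input list is left untouched.
    static List<StoredMessage> merge(List<StoredMessage> history, Approval approval) {
        if (history.isEmpty()) {
            throw new IllegalStateException("Empty history");
        }
        StoredMessage last = history.get(history.size() - 1);
        boolean pending = "assistant".equals(last.role())
                && last.toolCallId() != null
                && last.toolCallId().equals(approval.toolCallId());
        if (!pending) {
            throw new IllegalStateException(
                    "No pending tool call with id " + approval.toolCallId());
        }
        String output = approval.approved()
                ? "Approved: " + approval.notes()
                : "Rejected by human";
        List<StoredMessage> merged = new ArrayList<>(history);
        merged.add(new StoredMessage("tool", output, approval.toolCallId()));
        return merged;
    }
}
```

Failing fast here keeps a stale or mistyped `toolCallId` from reaching the model as a malformed conversation.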
Key Takeaways
- Never block on humans: Treat human approvals as asynchronous, event-driven inputs, not long-lived synchronous I/O operations.
- Serialize the prompt history: Store the exact LLM prompt/response state to Redis or Postgres to ensure your agents are completely stateless between tool calls.
- Leverage Spring AI's modularity: Use custom `ChatMemory` adapters to dynamically hydrate and dehydrate context windows on demand.
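As a rough illustration of hydrating and dehydrating a context window at a message index, here is a minimal in-memory sketch. `SnapshottingMemory` is a hypothetical class: it borrows the add/get vocabulary of a chat memory but does not implement Spring AI's `ChatMemory` interface, it stores plain strings instead of `Message` objects, and its maps stand in for Redis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshottingMemory {
    private final Map<String, List<String>> live = new HashMap<>();
    private final Map<String, List<String>> frozen = new HashMap<>();

    synchronized void add(String conversationId, String message) {
        live.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(message);
    }

    synchronized List<String> get(String conversationId) {
        return List.copyOf(live.getOrDefault(conversationId, List.of()));
    }

    // Dehydrate: freeze the first messageCount messages under a snapshot id
    // and drop the live conversation so nothing stays in the working set.
    synchronized String dehydrate(String conversationId, int messageCount) {
        List<String> messages = live.getOrDefault(conversationId, List.of());
        if (messageCount < 0 || messageCount > messages.size()) {
            throw new IndexOutOfBoundsException("messageCount " + messageCount + " out of range");
        }
        String snapshotId = conversationId + "@" + messageCount;
        frozen.put(snapshotId, List.copyOf(messages.subList(0, messageCount)));
        live.remove(conversationId);
        return snapshotId;
    }

    // Hydrate: rebuild a live conversation from a previously frozen snapshot.
    synchronized void hydrate(String snapshotId, String conversationId) {
        List<String> messages = frozen.get(snapshotId);
        if (messages == null) {
            throw new IllegalArgumentException("Unknown snapshot: " + snapshotId);
        }
        live.put(conversationId, new ArrayList<>(messages));
    }
}
```

Snapshotting at an index rather than at the tail makes it possible to resume from the point just before a tool call, discarding anything appended afterwards.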