BoxAgnts Runtime (1) — Runtime Engineering Defines the Future of AI Agents

Over the past year, AI agent projects have flooded the landscape—autonomous coding assistants, browser automation tools, multi-agent orchestration frameworks. Every week brings a new entrant. But an uncomfortable truth remains: most agents still fail frequently in production, and for the same reason—they lack a trustworthy runtime.

The industry has poured enormous effort into optimizing how models think, yet almost no one has focused on how agents execute. This imbalance is becoming increasingly dangerous.

Prompt Engineering Solved the Easy Part

Prompt engineering was indispensable during the early phase of LLM adoption. Techniques like chain-of-thought, ReAct loops, and planning agents dramatically improved reasoning quality.

But they also created a dangerous illusion:

If the model is smart enough, the system is reliable enough.

The moment an agent begins interacting with real systems—reading and writing files, executing commands, calling APIs, operating databases—this assumption collapses. Prompts can influence reasoning, but prompts cannot enforce security boundaries.

The Real Problem Starts at Tool Execution

Most modern AI agents converge on the same architecture:

LLM → Tool Selection → Python Function → Shell / Network / Filesystem

The model decides which tool to call, what arguments to pass, and when to stop. BoxAgnts does not hide this. The query loop, run_query_loop in boxagnts/query/src/query.rs, is the heart of the entire system:

// Core logic of run_query_loop:
// 1. Send conversation to the API
// 2. Process streaming responses
// 3. Detect tool-use requests and dispatch execution
// 4. Feed tool results back to the model
// 5. Handle auto-compaction, token limit recovery, budget overruns, etc.
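The five steps above can be made concrete with a small, self-contained sketch. Everything in it is hypothetical: MockModel stands in for the real API client, dispatch stands in for sandboxed tool execution, run_query_loop_sketch is not the real run_query_loop signature, and step 5 is reduced to a simple turn budget with no compaction or token-limit recovery.

```rust
use std::collections::VecDeque;

// Hypothetical conversation entries; not BoxAgnts types.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    ToolCall { name: String, input: String },
    ToolResult { name: String, output: String },
}

// What the (mock) model returns for one turn.
enum ModelReply {
    Text(String),                            // final answer: the loop ends
    UseTool { name: String, input: String }, // tool-use request: dispatch and continue
}

// Scripted stand-in for step 1 (sending the conversation to the API).
struct MockModel {
    script: VecDeque<ModelReply>,
}

impl MockModel {
    fn send(&mut self, _conversation: &[Message]) -> Option<ModelReply> {
        self.script.pop_front()
    }
}

// Hypothetical dispatcher; a real runtime would route this through the sandbox.
fn dispatch(name: &str, input: &str) -> String {
    match name {
        "echo" => input.to_string(),
        other => format!("error: unknown tool '{other}'"),
    }
}

/// Runs until the model answers in plain text or the turn budget is spent.
fn run_query_loop_sketch(
    model: &mut MockModel,
    prompt: &str,
    max_turns: usize,
) -> Result<Vec<Message>, String> {
    let mut conversation = vec![Message::User(prompt.to_string())];
    for _ in 0..max_turns {
        match model.send(&conversation) {
            Some(ModelReply::Text(text)) => {
                conversation.push(Message::Assistant(text));
                return Ok(conversation);
            }
            Some(ModelReply::UseTool { name, input }) => {
                // Steps 3 and 4: execute the requested tool, feed the result back.
                let output = dispatch(&name, &input);
                conversation.push(Message::ToolCall { name: name.clone(), input });
                conversation.push(Message::ToolResult { name, output });
            }
            None => return Err("model returned no reply".to_string()),
        }
    }
    // Simplified stand-in for budget-overrun handling.
    Err(format!("turn budget of {max_turns} exceeded"))
}

fn main() {
    let mut model = MockModel {
        script: VecDeque::from(vec![
            ModelReply::UseTool { name: "echo".into(), input: "hello".into() },
            ModelReply::Text("done".into()),
        ]),
    };
    let conversation = run_query_loop_sketch(&mut model, "say hello", 8).unwrap();
    println!("{} messages", conversation.len()); // prints "4 messages"
}
```

Note that whatever name and input the model produces flow straight into dispatch. That is exactly why the boundary has to live in the runtime rather than in the prompt.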

This loop exposes a fundamental problem: the model becomes the runtime decision engine, but that engine itself is untrustworthy. Prompt injection attacks have proven that LLMs can be manipulated through webpages, documents, tool responses, and retrieved context.

This means an agent cannot safely assume that its own reasoning, external content, or tool outputs are trustworthy. Yet many agent runtimes still allow unrestricted execution, which is an architectural contradiction at its core.

AI Agents Need Runtime Boundaries

Traditional software infrastructure has long assumed that applications will fail, processes will crash, and services may be compromised. That's why we built containers, hypervisors, process isolation, and capability systems.

Curiously, AI agents are moving in the opposite direction—many run with unrestricted shell access, unrestricted filesystem access, and unrestricted network access. This is effectively granting root privileges to an LLM. It is fundamentally incompatible with production-grade infrastructure.

BoxAgnts takes a clear stance on this. Look at the Tool trait in boxagnts/tools/src/tool.rs:

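pub trait Tool: Send + Sync {
    fn name(&self) -> &str;                        // Name the model uses to call the tool
    fn description(&self) -> &str;                 // Natural-language description shown to the model
    fn permission_level(&self) -> PermissionLevel; // Capability class checked by the runtime
    fn input_schema(&self) -> Value;               // JSON schema describing the tool's arguments
    // Runs the tool with model-supplied input inside the given context
    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult;
}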

Every tool has an explicit PermissionLevel—ReadOnly, Write, Execute, or None. These aren't decorative labels; they're runtime-enforced constraints. In filter_tools_for_agent, we dynamically trim the available tool set based on the agent's access level (full / read-only / search-only).
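A minimal sketch of the trimming idea follows. The enum variants mirror the paragraph above, but the mapping rules (read-only agents see None and ReadOnly tools; search-only agents see only ReadOnly tools on an allow-list), the SEARCH_TOOLS list, the hypothetical grep tool, and the permission level assigned to each tool are assumptions for illustration, not the actual filter_tools_for_agent.

```rust
// Hypothetical sketch: the mapping rules and SEARCH_TOOLS are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PermissionLevel {
    None,
    ReadOnly,
    Write,
    Execute,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentAccess {
    Full,
    ReadOnly,
    SearchOnly,
}

struct ToolInfo {
    name: &'static str,
    permission: PermissionLevel,
}

// Assumed allow-list of tools that count as "search"; "grep" is hypothetical.
const SEARCH_TOOLS: &[&str] = &["file-glob", "grep"];

fn is_allowed(access: AgentAccess, tool: &ToolInfo) -> bool {
    match access {
        AgentAccess::Full => true,
        AgentAccess::ReadOnly => {
            matches!(tool.permission, PermissionLevel::None | PermissionLevel::ReadOnly)
        }
        AgentAccess::SearchOnly => {
            tool.permission == PermissionLevel::ReadOnly && SEARCH_TOOLS.contains(&tool.name)
        }
    }
}

/// Returns the names of the tools an agent with `access` may see.
fn filter_tools_for_agent(tools: &[ToolInfo], access: AgentAccess) -> Vec<&'static str> {
    tools.iter().filter(|t| is_allowed(access, t)).map(|t| t.name).collect()
}

fn main() {
    // Permission levels assigned here are illustrative guesses.
    let tools = [
        ToolInfo { name: "file-read", permission: PermissionLevel::ReadOnly },
        ToolInfo { name: "file-glob", permission: PermissionLevel::ReadOnly },
        ToolInfo { name: "file-write", permission: PermissionLevel::Write },
        ToolInfo { name: "bash", permission: PermissionLevel::Execute },
    ];
    println!("{:?}", filter_tools_for_agent(&tools, AgentAccess::ReadOnly));
    // prints ["file-read", "file-glob"]
}
```

The design point is that a tool the agent never sees cannot be called, no matter what the model is persuaded to request.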

The correct approach isn't to make the model more trustworthy—it's to make the runtime resilient against untrusted models. This is a fundamental philosophical shift.

Runtime Engineering Changes the Design Philosophy

Most AI frameworks are workflow-centric: prompt chaining, agent planning, memory management, tool registration. These are useful but ignore runtime behavior. Runtime engineering introduces a completely different set of priorities:

Workflow-Centric Thinking	Runtime-Centric Thinking
How should the model reason?	What is the execution boundary?
Which prompts improve accuracy?	Which capabilities are allowed?
How should tools be chained?	How should tools be isolated?
How is memory persisted?	How is state governed safely?

In BoxAgnts, the runtime is responsible for capability isolation, permission governance, resource constraints, tool lifecycle management, state persistence, and fault containment. The model is just one component within a larger execution system—that's how reliable infrastructure is properly built.

Why Rust Matters for Agent Infrastructure

Most AI tooling is built with Python, a reasonable choice during the experimentation phase. But runtime infrastructure has different requirements: memory safety, deterministic behavior, low overhead, and single-binary deployment.

BoxAgnts chose Rust for straightforward engineering reasons. The entire project compiles into a single statically linked executable: no Python environment setup, no pip install, no virtual environments. Download it and run it.

# Start BoxAgnts
boxagnts --workspace-dir /path/to/workspace --port 30001

More importantly, Rust's ownership semantics align naturally with a capability-oriented security model. In boxagnts/wasm-sandbox/src/run.rs, the RunOption struct explicitly defines the boundaries for each WASM execution instance:

pub struct RunOption {
    pub work_dir: Option<String>,          // Filesystem boundary
    pub allowed_outbound_hosts: Option<Vec<String>>,  // Network boundary
    pub wasm_timeout: Option<u32>,         // Time boundary
    pub wasm_max_memory_size: Option<u32>, // Memory boundary
    pub wasm_fuel: Option<u32>,            // Instruction fuel boundary
}

Every resource is explicitly constrained. This is not a coincidence; it reflects a mindset that Rust's ownership model encourages at the language level, where every value has a single owner and a well-defined lifetime.
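The following sketch shows how two of these boundaries might be checked before a WASM guest touches the network or the filesystem. The fields are copied from RunOption above (with derives added); the allows_host and allows_path methods, the default-deny treatment of None, and the purely lexical path check are assumptions, not BoxAgnts behavior.

```rust
use std::path::{Component, Path};

// Fields copied from RunOption above; the derives and methods are added for this sketch.
#[derive(Debug, Clone, Default)]
pub struct RunOption {
    pub work_dir: Option<String>,                    // Filesystem boundary
    pub allowed_outbound_hosts: Option<Vec<String>>, // Network boundary
    pub wasm_timeout: Option<u32>,                   // Time boundary
    pub wasm_max_memory_size: Option<u32>,           // Memory boundary
    pub wasm_fuel: Option<u32>,                      // Instruction fuel boundary
}

impl RunOption {
    /// Assumed default-deny policy: `None` means no outbound host is reachable.
    pub fn allows_host(&self, host: &str) -> bool {
        match &self.allowed_outbound_hosts {
            Some(hosts) => hosts.iter().any(|h| h.eq_ignore_ascii_case(host)),
            None => false,
        }
    }

    /// Lexical check: the path must start with work_dir and contain no `..` component.
    pub fn allows_path(&self, path: &str) -> bool {
        let Some(root) = &self.work_dir else {
            return false; // no filesystem boundary configured: deny
        };
        let path = Path::new(path);
        let escapes = path.components().any(|c| matches!(c, Component::ParentDir));
        !escapes && path.starts_with(root)
    }
}

fn main() {
    let opts = RunOption {
        work_dir: Some("/path/to/workspace".to_string()),
        allowed_outbound_hosts: Some(vec!["api.example.com".to_string()]),
        ..Default::default()
    };
    println!("{}", opts.allows_host("api.example.com")); // true
    println!("{}", opts.allows_host("evil.example.net")); // false
    println!("{}", opts.allows_path("/path/to/workspace/src/main.rs")); // true
    println!("{}", opts.allows_path("/path/to/workspace/../etc/passwd")); // false
}
```

A production check would also resolve symlinks (for example with std::fs::canonicalize) before comparing paths; the lexical version only rejects `..` components and paths outside work_dir.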

WebAssembly Is More Than a Deployment Format

Most developers still think of WASM as a frontend technology—that view is outdated. WebAssembly is increasingly becoming a secure execution substrate. For AI agents, this is critical.

WASM tools in BoxAgnts aren't executed via Python subprocesses; they run inside sandboxes through the Wasmtime runtime. The wrapper in boxagnts/wasm-tools/src/wasm_tool.rs shows this pattern:

// WasmTool wraps a WASM module as a unified Tool interface
let result = boxagnts_wasm_sandbox::run::execute(
    wasm_file,     // WASM module path
    None,          // Function name to invoke
    Some(args),    // Arguments
    options,       // Runtime constraints (memory/network/timeout)
    None,
).await;

All WASM tools—file-read, file-write, file-edit, bash, web-fetch, file-glob—execute through the same sandbox runtime. The model cannot directly access the host system; the runtime is the sole gateway.
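Of the constraints passed in options, fuel is probably the least familiar. Wasmtime's fuel mechanism gives execution a budget that is consumed as guest instructions run and traps when it is exhausted, so even an infinite loop terminates. The toy interpreter below illustrates the principle with one unit of fuel per instruction; it is not Wasmtime's implementation, and the Op instruction set is invented for the example.

```rust
// Toy instruction set invented for this example (not WebAssembly).
#[derive(Debug, Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Jump(usize), // unconditional jump, so a guest can loop forever
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(Vec<i64>),
    OutOfFuel { executed: u32 },
    Trap(&'static str),
}

/// Executes `program`, charging one unit of fuel per instruction.
fn run_with_fuel(program: &[Op], mut fuel: u32) -> Outcome {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    let mut executed = 0;
    while pc < program.len() {
        // Check the budget before every instruction, like a metered runtime.
        if fuel == 0 {
            return Outcome::OutOfFuel { executed };
        }
        fuel -= 1;
        executed += 1;
        match program[pc] {
            Op::Push(v) => {
                stack.push(v);
                pc += 1;
            }
            Op::Add => {
                let (Some(b), Some(a)) = (stack.pop(), stack.pop()) else {
                    return Outcome::Trap("stack underflow");
                };
                match a.checked_add(b) {
                    Some(sum) => stack.push(sum),
                    None => return Outcome::Trap("integer overflow"),
                }
                pc += 1;
            }
            Op::Jump(target) => pc = target,
        }
    }
    Outcome::Finished(stack)
}

fn main() {
    // Terminates normally within its budget: prints Finished([5])
    println!("{:?}", run_with_fuel(&[Op::Push(2), Op::Push(3), Op::Add], 10));
    // An infinite loop is cut off: prints OutOfFuel { executed: 1000 }
    println!("{:?}", run_with_fuel(&[Op::Jump(0)], 1_000));
}
```

Because the host, not the guest, owns the counter, a misbehaving tool cannot opt out of the limit.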

Multi-Agent Systems Increase the Need for Runtime Isolation

BoxAgnts supports a Managed Agent mode—a Manager handles planning while multiple Executors run in parallel. In boxagnts/query/src/managed_orchestrator.rs, the system prompt explicitly describes this architecture:

You are the MANAGER, responsible for planning and reasoning
in the manager-executor architecture.
You delegate all implementation work to executor agents.
Each executor runs in its own isolated context.

Without runtime isolation, a compromised agent can poison others, context contamination spreads, and capability escalation becomes uncontrollable. That's why every WASM tool execution gets an independent RunOption instance, independent memory space, and independent capability boundaries.
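To illustrate per-execution isolation in Rust terms, the sketch below spawns one thread per executor and moves a freshly built RunOption and a private context into each. It is hypothetical: RunOption is trimmed to two fields, and ExecutorReport, run_executor, and run_executors are illustrative names. Thread-level ownership is a weaker guarantee than a separate WASM linear memory, but it shows the same principle of no implicit sharing.

```rust
use std::thread;

// Trimmed, hypothetical copy of RunOption: only two fields are kept.
#[derive(Debug, Clone)]
struct RunOption {
    work_dir: Option<String>,
    wasm_fuel: Option<u32>,
}

// Hypothetical summary returned by each executor.
#[derive(Debug)]
struct ExecutorReport {
    id: usize,
    work_dir: Option<String>,
    context_len: usize,
}

fn run_executor(id: usize, task: &str, opts: RunOption) -> ExecutorReport {
    // The context is owned by this executor alone; siblings cannot see it.
    let context = vec![
        format!("task: {task}"),
        format!("fuel budget: {:?}", opts.wasm_fuel),
    ];
    ExecutorReport { id, work_dir: opts.work_dir, context_len: context.len() }
}

fn run_executors(tasks: &[&str]) -> Vec<ExecutorReport> {
    thread::scope(|s| {
        let handles: Vec<_> = tasks
            .iter()
            .enumerate()
            .map(|(id, task)| {
                // A fresh RunOption per executor: boundaries are never shared or inherited.
                let opts = RunOption {
                    work_dir: Some(format!("/workspace/executor-{id}")),
                    wasm_fuel: Some(1_000_000),
                };
                // `move` transfers ownership of opts into the executor's thread.
                s.spawn(move || run_executor(id, task, opts))
            })
            .collect();
        // Join only after spawning all threads so the executors run in parallel.
        handles
            .into_iter()
            .map(|h| h.join().expect("executor panicked"))
            .collect()
    })
}

fn main() {
    for report in run_executors(&["write tests", "fix lint"]) {
        println!("{report:?}");
    }
}
```

Because each executor owns its options and context, the compiler rejects any attempt to mutate another executor's state across threads unless a synchronization type is introduced explicitly.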

The Next Layer of AI Infrastructure

The current AI stack is heavily focused on the model layer: Model APIs → Prompt Frameworks → Agent Workflows. One layer is still missing underneath: runtime infrastructure—sandboxed execution, capability management, resource governance, persistent state, orchestration control.

BoxAgnts' module architecture reflects the importance of this layer:

boxagnts/
├── api/          ← Model APIs (OpenAI/Anthropic/Google/...)
├── core/         ← Core types and constants
├── gateway/      ← API gateway and Cron scheduling
├── query/        ← Agent query loop and orchestration
├── tools/        ← Tool system and permission model
├── wasm-sandbox/ ← WASM sandbox runtime (Wasmtime)
├── wasm-tools/   ← WASM tool wrappers
├── mcp/          ← MCP protocol client
└── workspace/    ← Workspace and configuration

Note that wasm-sandbox/ sits at the bottom of the stack, beneath the tool layer: everything that executes must pass through it. This isn't a security layer bolted on after the fact; it was embedded into the architecture from the start.

Conclusion

The AI industry currently treats agents primarily as reasoning systems—this framing is incomplete. Agents are execution systems, and execution systems require runtime architecture.

BoxAgnts' open-source practice demonstrates that the future of AI agents isn't about making models smarter—it's about making execution safer. When your agent can read and write files, execute commands, and manipulate infrastructure, the runtime is no longer an implementation detail—it's the foundation of the entire system.

This is a proposition worth serious consideration by every AI infrastructure developer.

Resources

BoxAgnts: https://github.com/guyoung/boxagnts