BoxAgnts Tool System (4) — The Tool Trait and Concurrency Context Model

BoxAgnts' tool system can uniformly manage three completely different execution entities (Rust built-in functions, WASM sandbox components, and cron task triggers) because of two things: a six-method trait and a shared-context concurrency model. This article dissects the implementation and design considerations of both.

Why the Trait Method Signatures Are Written This Way

Let's review the Tool trait:

#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    fn source(&self) -> ToolSource;
    fn permission_level(&self) -> PermissionLevel;
    fn input_schema(&self) -> Value;
    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult;
}

The first noteworthy detail is the return type of name() and description(): &'static str. For Rust built-in tools, this is natural — string literals are placed in the binary's .rodata section at compile time, inherently possessing a 'static lifetime. But for WASM tools, name and description are Strings parsed from help text at runtime; they don't have a 'static lifetime.

The solution is Box::leak:

// wasm-tools/src/wasm_tool.rs
fn name(&self) -> &'static str {
    Box::leak(self.name.clone().into_boxed_str())
}

Box::leak consumes the Box<str> and returns a reference with a 'static lifetime; the allocation is deliberately never freed. For strings like tool names and descriptions that must stay accessible for the program's entire lifetime, this is a reasonable trade-off, provided each string is leaked only once. Note that the snippet above clones and leaks on every call to name(), so the leaked memory grows with the number of calls rather than the number of tools. Leaking once at construction time and storing the resulting &'static str keeps the total at a few hundred bytes for a handful of tools.

Of course, if BoxAgnts supported frequent addition and removal of WASM tools (rather than only at startup and during manual operations), Box::leak could accumulate non-negligible memory usage. The current design assumes tool registration is a low-frequency operation, making this trade-off reasonable.
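A minimal sketch of the leak-once variant, assuming the tool metadata can be leaked when the tool is constructed. WasmToolMeta and its fields are hypothetical names for illustration, not the actual BoxAgnts types:

```rust
/// Hypothetical stand-in for a WASM tool record; names are illustrative only.
struct WasmToolMeta {
    // Leaked exactly once in `new`, so later calls hand out the same pointer.
    name: &'static str,
    description: &'static str,
}

impl WasmToolMeta {
    fn new(name: String, description: String) -> Self {
        Self {
            // Box::leak consumes the Box<str> and yields a 'static reference.
            name: Box::leak(name.into_boxed_str()),
            description: Box::leak(description.into_boxed_str()),
        }
    }

    fn name(&self) -> &'static str {
        self.name
    }

    fn description(&self) -> &'static str {
        self.description
    }
}

fn main() {
    let tool = WasmToolMeta::new(
        "base64".to_string(),
        "Encode or decode base64".to_string(),
    );
    // Repeated calls return the same address: no allocation per call.
    assert!(std::ptr::eq(tool.name(), tool.name()));
    println!("{}: {}", tool.name(), tool.description());
}
```

With this shape the leaked memory is bounded by the number of registered tools, which matches the low-frequency registration assumption.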

permission_level() returns PermissionLevel as an enum rather than a bitmask. This is deliberate — permission levels are linearly increasing (None < ReadOnly < Write < Execute); there is no "simultaneously ReadOnly + Write + Execute" combinatorial semantics. If extension to a more complex permission model is needed (e.g., Capability-level fine-grained control), it could be changed to a HashSet<Capability>, but the current four-level linear model is sufficient for CLI tool permission descriptions.
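The linear ordering can be expressed directly with derived comparison traits. The sketch below assumes the four variants are declared in ascending order; is_allowed is a hypothetical helper, not a function from the BoxAgnts source:

```rust
// Derived PartialOrd/Ord compare variants by declaration order,
// so None < ReadOnly < Write < Execute holds automatically.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PermissionLevel {
    None,
    ReadOnly,
    Write,
    Execute,
}

/// Hypothetical check: a tool may run when the level it requires
/// does not exceed the level that has been granted.
fn is_allowed(required: PermissionLevel, granted: PermissionLevel) -> bool {
    required <= granted
}

fn main() {
    assert!(PermissionLevel::None < PermissionLevel::ReadOnly);
    assert!(is_allowed(PermissionLevel::ReadOnly, PermissionLevel::Write));
    assert!(!is_allowed(PermissionLevel::Execute, PermissionLevel::Write));
    println!("{:?}", PermissionLevel::Write.max(PermissionLevel::ReadOnly));
}
```

A bitmask would need extra rules to say which combinations are meaningful; with a linear enum a single comparison is enough.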

ToolContext Ownership Design

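execute()'s signature is async fn execute(&self, input: Value, ctx: &ToolContext); note that ctx is a shared (immutable) reference. A tool therefore cannot reassign or mutate the plain fields of the context during execution. The only state it can change is state that opts into interior mutability, such as the atomics behind cost_tracker and current_turn. This constraint is enforced by Rust's borrow rules at compile time, not by runtime checks.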

Let's look at what's inside ToolContext:

pub struct ToolContext {
    pub permission_mode: PermissionMode,
    pub cost_tracker: Arc<CostTracker>,
    pub session_id: Option<String>,
    pub current_turn: Arc<AtomicUsize>,
    pub non_interactive: bool,
    pub config: Config,
    pub managed_agent_config: Option<ManagedAgentConfig>,
    pub allowed_outbound_hosts: Vec<String>,
    pub block_url: Option<String>,
}

cost_tracker and current_turn are wrapped in Arc because they are mutable state that multiple concurrently executing tools need to share. Arc provides the shared ownership, and AtomicUsize lets current_turn be incremented without a lock: under tokio's multi-threaded scheduler, fetch_add compiles to a single atomic read-modify-write instruction (such as lock xadd on x86). That is typically cheaper than taking a Mutex, and the gap widens under contention because no thread ever blocks.
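The following sketch shows the pattern in isolation, using only the standard library. Ctx is a reduced, hypothetical stand-in for ToolContext, and scoped threads stand in for tokio tasks:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Reduced stand-in for ToolContext: one plain field, one shared counter.
struct Ctx {
    non_interactive: bool,
    current_turn: Arc<AtomicUsize>,
}

/// Takes `&Ctx`, like execute(). Assigning to `ctx.non_interactive` here
/// would not compile, but the atomic can be updated through `&`.
fn run_tool(ctx: &Ctx) -> usize {
    // fetch_add returns the value held before the increment.
    ctx.current_turn.fetch_add(1, Ordering::Relaxed)
}

/// Runs `workers` tools concurrently against the same context
/// and returns the counter value afterwards.
fn bump_concurrently(ctx: &Ctx, workers: usize) -> usize {
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                run_tool(ctx);
            });
        }
    });
    ctx.current_turn.load(Ordering::Relaxed)
}

fn main() {
    let ctx = Ctx {
        non_interactive: true,
        current_turn: Arc::new(AtomicUsize::new(0)),
    };
    assert_eq!(bump_concurrently(&ctx, 8), 8);
    println!("non_interactive = {}", ctx.non_interactive);
}
```

No increment is lost even though every worker only holds a shared reference to the context.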

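CostTracker follows the same pattern, internally using an atomic f64 to track cumulative costs. The standard library has no atomic floating-point type, so it comes from a third-party crate: the atomic crate provides the generic Atomic<T>, which gives Atomic<f64> (AtomicF64 in this article's shorthand).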
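To illustrate how a lock-free f64 accumulator can work, here is a standard-library-only sketch that stores the float's bit pattern in an AtomicU64 and updates it with a compare-and-swap loop. This CostTracker is a hypothetical reimplementation for illustration; the real one may be built differently:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical cost tracker: the f64 total is stored as raw bits.
struct CostTracker {
    total_bits: AtomicU64,
}

impl CostTracker {
    fn new() -> Self {
        Self { total_bits: AtomicU64::new(0.0_f64.to_bits()) }
    }

    /// Adds `amount`, retrying if another thread updated the total first.
    fn add(&self, amount: f64) {
        let mut current = self.total_bits.load(Ordering::Relaxed);
        loop {
            let next = (f64::from_bits(current) + amount).to_bits();
            match self.total_bits.compare_exchange_weak(
                current,
                next,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                // Someone else won the race: retry with the value they wrote.
                Err(observed) => current = observed,
            }
        }
    }

    fn total(&self) -> f64 {
        f64::from_bits(self.total_bits.load(Ordering::Relaxed))
    }
}

fn main() {
    let tracker = Arc::new(CostTracker::new());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let t = Arc::clone(&tracker);
            thread::spawn(move || {
                for _ in 0..1000 {
                    t.add(0.25);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // 4 threads x 1000 additions of 0.25, all exactly representable.
    assert_eq!(tracker.total(), 1000.0);
}
```

Crates that offer atomic floats generally wrap this same idea behind a typed API.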

The config field is a cloned copy of the full configuration object. Its data volume is small (a few KB) and it is not modified during tool execution, so a direct Clone is simpler than wrapping it in Arc and avoids one pointer indirection on access.

allowed_outbound_hosts is Vec<String> rather than Arc<Vec<String>> or &[String]. The reason is that WASM tools need to acquire full ownership copies during execution to construct RunOption (which itself needs to be passed to Wasmtime's WasiCtx internally), so there's no reason to keep a reference — directly Clone and move in.

ToolResult and Structured Output

pub struct ToolResult {
    pub content: String,
    pub is_error: bool,
    pub metadata: Option<Value>,
}

is_error is not Rust's Result: it marks success or failure at the AI level, not the Rust program level. A WASM tool may run successfully in the sandbox (Rust-level Ok) while its output reports a failed operation (e.g., file-read tried to read a non-existent file). The AI model needs to see is_error: true to decide whether to retry or report to the user. Without this field, a failed operation would be indistinguishable from a successful call whose content merely happens to contain error text.

metadata is an escape hatch that lets tools return rich structured information, such as Markdown tables, diff data, or chart configurations, for frontend rendering. Usage:

ToolResult::success("File contents:\n...")
    .with_metadata(json!({
        "lines": 42,
        "language": "rust",
        "diff_stats": {"added": 15, "removed": 3}
    }))

When the frontend receives this ToolResult, if metadata contains a language field, it renders the code block using CodeMirror's highlighting mode; if it contains diff_stats, it renders a diff view. Tool developers don't need to worry about rendering details — they only need to provide structured data.

WASM Tool execute Implementation

WasmTool's execute() has an additional conversion layer compared to built-in tools: converting AI-generated JSON parameters to CLI arguments:

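async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult {
    // {"mode":"encode","input":"hello"} → ["--mode","encode","--input","hello"]
    let args = value_to_cli_args(input);

    // Sandbox options are derived from the shared, read-only context.
    let mut options = RunOption::default();
    options.work_dir = Some(ctx.get_work_dir());
    options.allowed_outbound_hosts = Some(ctx.allowed_outbound_hosts.clone());
    options.block_url = ctx.block_url.clone();
    options.wasm_cache_dir = Some(ctx.get_app_cache_dir());

    let result = wasm_sandbox::run::execute(
        self.wasm_file.clone(), None, Some(args), options, None
    ).await;

    match result {
        Ok((stdout, _stderr)) => {
            let output = decode::decode_bytes(stdout);
            // Try JSON parsing: the tool may return {"error":false,"content":"..."}
            match serde_json::from_str::<Value>(&output) {
                // Map the object to ToolResult's is_error, content, metadata
                Ok(Value::Object(mut map)) => ToolResult {
                    // A missing "error" key is treated as success.
                    is_error: map.get("error").and_then(Value::as_bool).unwrap_or(false),
                    // Assumption: fall back to the raw output when "content" is not a string.
                    content: match map.get("content").and_then(Value::as_str) {
                        Some(text) => text.to_owned(),
                        None => output,
                    },
                    metadata: map.remove("metadata"),
                },
                // Non-JSON output: the entire text becomes the content
                _ => ToolResult::success(output),
            }
        }
        // The sandbox itself failed (Rust-level error)
        Err(e) => ToolResult::error(format!("{:?}", e)),
    }
}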

The JSON mapping follows a simple convention: if a WASM tool returns {"content": "some text", "metadata": {...}}, BoxAgnts automatically maps it to ToolResult { is_error: false, content: "some text", metadata: Some(...) }. If "error": true is included, is_error is set to true. This convention allows WASM developers to output plain text (simple scenarios) or structured JSON (when metadata is needed).
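The JSON-to-CLI conversion at the top of execute() can be sketched as follows. This is an illustration under stated assumptions, not the real value_to_cli_args: it handles only a flat object with scalar values, it uses a small Scalar enum as a stand-in for serde_json::Value so the example needs no external crate, and rendering booleans as the text true or false is a guess.

```rust
/// Stand-in for a JSON scalar; the real code receives serde_json::Value.
enum Scalar {
    Str(String),
    Num(f64),
    Bool(bool),
}

/// Hypothetical conversion: each key becomes `--key`, followed by its
/// value rendered as text. Field order is preserved.
fn to_cli_args(fields: &[(&str, Scalar)]) -> Vec<String> {
    let mut args = Vec::with_capacity(fields.len() * 2);
    for (key, value) in fields {
        args.push(format!("--{key}"));
        args.push(match value {
            Scalar::Str(s) => s.clone(),
            Scalar::Num(n) => n.to_string(),
            Scalar::Bool(b) => b.to_string(),
        });
    }
    args
}

fn main() {
    let args = to_cli_args(&[
        ("mode", Scalar::Str("encode".to_string())),
        ("input", Scalar::Str("hello".to_string())),
        ("width", Scalar::Num(76.0)),
        ("wrap", Scalar::Bool(true)),
    ]);
    println!("{}", args.join(" "));
}
```

Nested objects and arrays have no obvious CLI form, so a real implementation has to pick a policy for them (reject, flatten, or pass as a JSON string).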

Characteristics of Built-in Tools

For comparison, here's BriefTool — its implementation is under 50 lines:

#[async_trait]
impl Tool for BriefTool {
    // Return types must match the trait's &'static str exactly.
    fn name(&self) -> &'static str { "brief" }
    fn description(&self) -> &'static str { "Send a formatted message to the user" }
    fn source(&self) -> ToolSource { ToolSource::BuiltIn }
    fn permission_level(&self) -> PermissionLevel { PermissionLevel::None }

    fn input_schema(&self) -> Value {
        json!({
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "description": "The message to send"
                },
                "format": {
                    "type": "string",
                    "enum": ["text", "markdown"]
                }
            },
            "required": ["message"]
        })
    }

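    async fn execute(&self, input: Value, _ctx: &ToolContext) -> ToolResult {
        // execute() returns ToolResult, not Result, so `?` cannot be used here;
        // malformed input is reported as an AI-visible error instead.
        let params: BriefInput = match serde_json::from_value(input) {
            Ok(params) => params,
            Err(e) => return ToolResult::error(format!("invalid input: {e}")),
        };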
        let formatted = match params.format.as_deref() {
            Some("markdown") => render_markdown(&params.message),
            _ => params.message.clone(),
        };
        ToolResult::success(formatted)
    }
}

The core difference from WASM tools is in performance characteristics. Built-in tools have no sandbox startup overhead: execute() directly calls a Rust function, so the latency from entry to the first logic instruction is nanosecond-scale. WASM tools, even with .cwasm caching, pay for tokio task scheduling and Wasmtime component initialization on every call, which is in the microsecond range. For small text read/write operations of a few dozen KB, this difference is negligible; for high-frequency operations where per-call overhead adds up (e.g., the AI repeatedly calling the same tool in a loop for parameter scanning), built-in tools have a clear advantage.

The Unified Dispatch Entry Point

build_all_tools() in gateway/src/api/tool.rs (historically named build_tools_with_mcp()) merges all tools into a single Arc<Vec<Arc<dyn Tool>>>:

pub async fn build_all_tools() -> Arc<Vec<Arc<dyn Tool>>> {
    let mut v = boxagnts_tools_manager::all_tools().await;
    // Extension point: external tool protocols can be connected here in the future
    // if let Some(manager) = &mcp_manager { ... }
    Arc::new(v)
}

The function returns Arc<Vec<Arc<dyn Tool>>> rather than Vec<Arc<dyn Tool>> because the same tool list may be referenced by multiple concurrent Agent conversations. Each conversation needs access to the complete tool list (for permission checking and matching ToolUse requests) but doesn't need an independent copy (the list contents don't change during a conversation). The two layers of Arc (the outer one shares the list itself, the inner one shares each tool instance) avoid any data duplication.
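A small standard-library sketch of the two-layer sharing, with threads standing in for concurrent conversations. The reduced Tool trait, the two tool structs, build_tools, and find_tool are hypothetical and exist only for this example:

```rust
use std::sync::Arc;
use std::thread;

// Reduced version of the trait: only what the example needs.
trait Tool: Send + Sync {
    fn name(&self) -> &'static str;
}

struct Brief;
struct FileRead;

impl Tool for Brief {
    fn name(&self) -> &'static str { "brief" }
}
impl Tool for FileRead {
    fn name(&self) -> &'static str { "file-read" }
}

type ToolList = Arc<Vec<Arc<dyn Tool>>>;

fn build_tools() -> ToolList {
    let tools: Vec<Arc<dyn Tool>> = vec![Arc::new(Brief), Arc::new(FileRead)];
    Arc::new(tools)
}

/// Looks up a tool by name and returns a cheap handle to the shared instance.
fn find_tool(tools: &ToolList, name: &str) -> Option<Arc<dyn Tool>> {
    tools.iter().find(|t| t.name() == name).cloned()
}

fn main() {
    let tools = build_tools();
    let handles: Vec<_> = (0..3)
        .map(|_| {
            // Bumps a reference count; no tool and no Vec is copied.
            let tools = Arc::clone(&tools);
            thread::spawn(move || find_tool(&tools, "brief").map(|t| t.name()))
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), Some("brief"));
    }
    // Every worker has finished, so only `main` still holds the list.
    assert_eq!(Arc::strong_count(&tools), 1);
}
```

Cloning the outer Arc hands a conversation the whole list, and cloning an inner Arc hands it one tool. Both are reference-count increments.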

How to Add a New Tool

From a developer's perspective, the steps for adding a tool are remarkably concise:

Rust built-in tool:

1. Create a new module under tools/src/ and implement the Tool trait.
2. Add one line, Arc::new(MyTool), to bundled_tools() in tools-manager/src/lib.rs.
3. Compile the project.

WASM extension tool:

1. Write a CLI program in any language, ensuring its --help output follows the convention format.
2. Compile it to the wasm32-wasip2 target.
3. Place the .wasm file in the extensions/tools/ directory.

Done. No source code changes to BoxAgnts are required.
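As a rough illustration of the WASM route, the sketch below is a tiny CLI that reads --mode and --input and prints a JSON object with error and content keys on stdout. The flags and the upper/lower behavior are invented for this example, and the --help convention is not shown because its format is defined by BoxAgnts. JSON is assembled by hand only to keep the example dependency-free.

```rust
use std::env;

/// Escapes the characters that would break a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Returns the argument that follows `flag`, if there is one.
fn flag_value<'a>(args: &'a [String], flag: &str) -> Option<&'a str> {
    let index = args.iter().position(|a| a.as_str() == flag)?;
    args.get(index + 1).map(String::as_str)
}

/// Hypothetical tool logic: `--mode upper|lower --input TEXT`.
fn run(args: &[String]) -> String {
    let mode = flag_value(args, "--mode");
    let input = flag_value(args, "--input");
    let (error, content) = match (mode, input) {
        (Some("upper"), Some(text)) => (false, text.to_uppercase()),
        (Some("lower"), Some(text)) => (false, text.to_lowercase()),
        _ => (true, "usage: --mode upper|lower --input TEXT".to_string()),
    };
    format!("{{\"error\":{},\"content\":\"{}\"}}", error, json_escape(&content))
}

fn main() {
    // Skip argv[0]; the host passes the generated CLI arguments after it.
    let args: Vec<String> = env::args().skip(1).collect();
    println!("{}", run(&args));
}
```

A program like this would be compiled with cargo build --target wasm32-wasip2 --release, assuming that target is installed, and the resulting .wasm file copied into extensions/tools/.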

This "source-level" and "file-level" dual registration channel design ensures both tight integration for built-in core tools (performance, type safety) and openness for the extension ecosystem (any language, zero-configuration deployment).

Summary

The Tool trait's six methods form the unified abstraction layer of BoxAgnts' tool system, solving the core engineering problem of "how to hide three fundamentally different execution entities — Rust functions, WASM components, and cron tasks — behind a single interface."

Key design decisions:

name() and description() return &'static str, using Box::leak to convert runtime-parsed WASM tool metadata to a static lifetime. For a tool system with low-frequency registration, leaking a few hundred bytes is an acceptable trade-off, as long as each string is leaked once rather than on every call.
ToolContext uses Arc<AtomicUsize> and Arc<CostTracker> for lock-free shared mutable state; AtomicUsize's fetch_add is a single atomic instruction on x86 (such as lock xadd), typically cheaper than a Mutex, especially under contention. The &ToolContext shared borrow guarantees that tools cannot modify the context's plain fields, so only the interior-mutable atomics can change; this guarantee comes from the compiler, not from runtime checks.
Two-layer Arc (Arc<Vec<Arc<dyn Tool>>>) shares the tool list at the outer layer and tool instances at the inner layer, avoiding data duplication in multi-Agent concurrency scenarios.
ToolResult.metadata provides a structured channel for frontend rendering — tool developers only need to supply JSON metadata; the frontend renders the corresponding view components by convention.

References

BoxAgnts source code: https://github.com/guyoung/boxagnts
Rust async-trait documentation: https://docs.rs/async-trait
atomic crate (AtomicF64): https://docs.rs/atomic
tokio RwLock documentation: https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html
Box::leak documentation: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.leak