BoxAgnts Tool System (3) — The Complete Chain of Tool Registration and Hot Reloading

Tool registration sounds like a lightweight module — scan directories, read files, fill a hash table. But doing it right and doing it reliably requires handling encoding detection, text parsing, race conditions, and startup performance — problems that aren't obvious at first glance. This article traces the complete chain from a .wasm file to an AI-callable tool, breaking down each step.

The Problems Registration Must Solve

Let's be clear about what this module needs to accomplish. Once a .wasm file is placed in the extensions directory, the system needs to know:

What its name is
What parameters it has, their types, and whether each is required
What permission level it belongs to
What its functional description is (for AI model call decisions)
What its keywords are (for AI model search)

The traditional approach is to have the developer provide a JSON Schema file alongside the .wasm. This approach has synchronization problems: the Schema says a parameter is a string but the code treats it as a number; the tool gained new parameters but the Schema was never updated; the Schema contains errors, yet the tool registers successfully and then fails on every execution. On top of that, the developer has to learn BoxAgnts' Schema format just to write the file.

BoxAgnts' approach changes the Schema source from "manually written" to "tool self-described" — directly execute the WASM tool, pass --help, and parse the help text it prints. This means tool developers only need to follow standard CLI program conventions, using any language's CLI argument parsing library (Rust's clap, Go's cobra, Python's argparse) to define parameters, and BoxAgnts extracts everything automatically.

Encoding Detection

The first technical detail comes after reading stdout. A WASM tool's --help output is a byte stream, not a string, so the encoding must be detected before decoding. If you blindly assume UTF-8, help text emitted in GBK or Shift-JIS will be garbled and fail to parse.

BoxAgnts uses chardetng for encoding detection:

// wasm-tools/src/decode.rs
pub fn decode_bytes(bytes: Bytes) -> (String, &'static str, bool) {
    let mut detector = chardetng::EncodingDetector::new(
        chardetng::Iso2022JpDetection::Allow
    );
    detector.feed(&bytes, true);
    let encoding = detector.guess(None, chardetng::Utf8Detection::Allow);
    // decode() sniffs a BOM first, so report the encoding it actually used.
    let (cow, used, had_errors) = encoding.decode(&bytes);
    (cow.into_owned(), used.name(), had_errors)
}

chardetng is the encoding detection library Mozilla developed for Firefox, where it guesses the encoding of unlabeled web pages. In practice it is accurate on short texts such as --help output, which is typically no more than a few KB. Iso2022JpDetection::Allow lets the detector return ISO-2022-JP, which matters for WASM tools built in Japanese environments. Utf8Detection::Allow lets guess return UTF-8 when the input is valid UTF-8; with Deny, the detector never returns UTF-8 and would pick a legacy encoding even for UTF-8 help text.

After decoding, three items are returned: the string, the encoding name, and whether there were decoding errors. The subsequent parser receives clean UTF-8 text.
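To see what the blind UTF-8 assumption costs, here is a minimal std-only sketch of the naive approach that detection replaces. The function decode_assuming_utf8 is hypothetical and not part of BoxAgnts; it only mirrors the three-part return shape of decode_bytes.

```rust
/// Naive decoder that assumes UTF-8 (hypothetical, for illustration only).
/// Returns (text, encoding name, had_errors), mirroring decode_bytes.
pub fn decode_assuming_utf8(bytes: &[u8]) -> (String, &'static str, bool) {
    match std::str::from_utf8(bytes) {
        Ok(text) => (text.to_owned(), "UTF-8", false),
        // Invalid sequences are replaced with U+FFFD, so the text is corrupted.
        Err(_) => (String::from_utf8_lossy(bytes).into_owned(), "UTF-8", true),
    }
}
```

Fed the GBK bytes D6 D0 CE C4 (the word 中文), this reports had_errors = true and returns replacement characters, whereas decoding the same bytes as GBK yields the intended text.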

The Help Text Parser

Parsing --help output is not straightforward. Different CLI libraries produce output in different formats: clap's --help and -h differ in detail level (the former includes long_about, the latter only about); some libraries have inconsistent indentation between Options: and Arguments: blocks; subcommands may appear under either Commands: or Subcommands: headings.

BoxAgnts' parser, located in wasm-tools/src/registry/parser.rs, follows this flow:

1. Fetch Two Help Texts

pub async fn fetch_help_texts(program: &str) -> Result<HelpTextPair> {
    let short_candidates = vec![vec!["-h"], vec!["--help"]];
    let long_candidates = vec![vec!["--help"], vec!["-h"]];
    let short_help = run_first_help_candidate(program, &short_candidates).await?;
    let long_help = run_first_help_candidate(program, &long_candidates).await?;
    Ok(HelpTextPair { short_help, long_help })
}

Why two copies? Because many CLI programs produce different output for -h (short help) and --help (long help). -h may only list parameter names with one-line descriptions, while --help includes more detailed long descriptions (long_about). BoxAgnts merges both:

Tool name and version extracted from short help (most compact and reliable format)
Long description (long_about), keywords (Keywords:), and permission level (PermissionLevel:) taken preferentially from long help
Parameter list (properties) and required items (required) merged from both — long help as primary, short help as supplementary

2. Validate Output Legitimacy

Not every WASM program qualifies as a tool. run_first_help_candidate performs legitimacy checks after receiving output:

pub fn looks_like_help_output(text: &str) -> bool {
    let has_usage = text.lines().any(|l| l.trim_start().starts_with("Usage:"));
    let has_options = text.lines().any(|l| l.trim() == "Options:");
    let has_arguments = text.lines().any(|l| l.trim() == "Arguments:");
    let has_commands = text.lines().any(|l| {
        let t = l.trim();
        t == "Commands:" || t == "Subcommands:"
    });
    has_usage || has_options || has_arguments || has_commands
}

The output must contain at least one block header: Usage:, Options:, Arguments:, or Commands: (Subcommands: is accepted as an alias). If a WASM program's --help output has none of these, for example because it is an HTTP server rather than a CLI tool, the parser rejects registration and logs an error.
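The candidate fallback combined with this check can be sketched as follows. This is an assumption about how run_first_help_candidate might be structured, not its actual code: first_valid_help is a hypothetical name, and the sandbox runner and validity check are passed in as closures so the sketch stays dependency-free.

```rust
/// Try each argument list in order and return the first output that passes
/// the validity check. `run` and `is_help` are injected stand-ins for the
/// sandbox execution and the header check (both hypothetical here).
pub fn first_valid_help<R, V>(candidates: &[Vec<&str>], run: R, is_help: V) -> Option<String>
where
    R: Fn(&[&str]) -> Option<String>,
    V: Fn(&str) -> bool,
{
    candidates
        .iter()
        // Iterators are lazy, so later candidates only run if earlier ones fail.
        .filter_map(|args| run(args.as_slice()))
        .find(|output| is_help(output.as_str()))
}
```

With the short-help candidates [-h, --help], a tool that rejects -h but answers --help is still registered, because the first output fails the check and the second passes.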

3. Field-by-Field Extraction

fn parse_help_text(help: &str) -> Result<ParsedHelp> {
    let lines: Vec<&str> = help.lines().collect();

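    // First line format: "base64 1.0.0" → name="base64", version="1.0.0"
    // Guard against empty input instead of indexing lines[0], which would panic.
    let first_line = lines.first().ok_or("empty help output")?;
    let (name, version) = parse_name_version(first_line)?;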

    let about = lines.iter().skip(1)
        .find(|l| !l.trim().is_empty())
        .ok_or("missing about line")?
        .trim().to_string();

    let keywords = extract_single_line_field(help, "Keywords:");
    let permission_level = extract_single_line_field(help, "PermissionLevel:");

    let properties = parse_options_section(&lines)?;    // Options: block
    let (arg_props, arg_required) = parse_arguments_section(&lines)?;  // Arguments: block
    let commands = parse_commands_section(&lines)?;     // Commands: block
    // ...
}
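The two small helpers used above are not shown in the article. A plausible std-only sketch follows; the names split_name_version and single_line_field are hypothetical, and the real parse_name_version and extract_single_line_field may differ in signature and behavior.

```rust
/// Split a first line such as "base64 1.0.0" into (name, version).
/// Assumption: the version is the last whitespace-separated token.
pub fn split_name_version(first_line: &str) -> Option<(String, String)> {
    let (name, version) = first_line.trim().rsplit_once(char::is_whitespace)?;
    let name = name.trim();
    if name.is_empty() || version.is_empty() {
        return None;
    }
    Some((name.to_string(), version.to_string()))
}

/// Return the text after `label` on the first line that starts with it.
pub fn single_line_field(help: &str, label: &str) -> Option<String> {
    help.lines()
        .filter_map(|line| line.trim().strip_prefix(label))
        .map(|rest| rest.trim().to_string())
        .find(|value| !value.is_empty())
}
```

For the first line base64 1.0.0 this yields the pair ("base64", "1.0.0"); a line such as Keywords: encode, decode yields the string after the label.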

The core of parameter parsing lies in two functions:

parse_options_section: Locate the Options: line; each subsequent line is an option definition (in --mode <MODE> or -m, --mode <MODE> format). Extract parameter name, type (from <TYPE>), and description (free text at end of line).
parse_arguments_section: Locate the Arguments: line; positional parameters in <NAME> format, with square brackets indicating optional.

Both functions use regex matching. The former's pattern is --([a-zA-Z][a-zA-Z0-9_-]*) with optional <TYPE> angle brackets; the latter matches <([a-zA-Z][a-zA-Z0-9_-]*)> and determines optionality from the presence of [ around it.
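As an illustration of what the option pattern has to extract, here is a sketch that parses one option line with plain string operations instead of a regex, so it needs no dependencies. OptionSpec and parse_option_line are hypothetical names, and the sketch ignores cases a real parser must handle, such as wrapped description lines that mention another flag.

```rust
#[derive(Debug, PartialEq)]
pub struct OptionSpec {
    pub name: String,               // long name without the leading "--"
    pub value_name: Option<String>, // text inside <...>, None for boolean flags
    pub description: String,
}

/// Parse one line of an Options: block, e.g. "  -m, --mode <MODE>  Output mode".
pub fn parse_option_line(line: &str) -> Option<OptionSpec> {
    let start = line.find("--")?;
    let rest = &line[start + 2..];
    // The name runs until the first character outside [A-Za-z0-9_-].
    let name_len = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '-'))
        .unwrap_or(rest.len());
    let name = &rest[..name_len];
    // Names must start with a letter, as in the article's pattern.
    if !name.starts_with(|c: char| c.is_ascii_alphabetic()) {
        return None;
    }
    let mut tail = rest[name_len..].trim_start();
    let mut value_name = None;
    if let Some(after_open) = tail.strip_prefix('<') {
        let close = after_open.find('>')?;
        value_name = Some(after_open[..close].to_string());
        tail = &after_open[close + 1..];
    }
    Some(OptionSpec {
        name: name.to_string(),
        value_name,
        description: tail.trim().to_string(),
    })
}
```

A line without a <TYPE> placeholder, such as -h, --help, comes back with value_name set to None, which a caller could map to a boolean parameter.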

4. Merge and Deduplicate

merge_required combines the required parameter lists extracted from -h and --help:

fn merge_required(short: &[String], long: &[String]) -> Vec<String> {
    let mut merged = Vec::new();
    for item in short.iter().chain(long.iter()) {
        if !merged.contains(item) {
            merged.push(item.clone());
        }
    }
    merged
}

Similarly, properties from both sources are merged — long help's entries override short help's same-named entries (since long help descriptions are more detailed).
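The override rule for properties can be sketched in a few lines. This is a simplified illustration, not the BoxAgnts implementation: merge_properties is a hypothetical name, and each property is reduced to a name-to-description map entry.

```rust
use std::collections::BTreeMap;

/// Merge parameter descriptions from both help texts. Entries from the long
/// help replace same-named entries from the short help.
pub fn merge_properties(
    short: BTreeMap<String, String>,
    long: BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged = short;
    merged.extend(long); // inserting an existing key overwrites its value
    merged
}
```

Parameters that appear only in the short help survive the merge, which is what makes the short help useful as a supplement.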

The final product is ToolSpec:

pub struct ToolSpec {
    pub name: String,
    pub wasm_file: String,
    pub about: String,
    pub long_about: String,
    pub keywords: String,
    pub permission_level: String,
    pub version: String,
    pub input_schema: InputSchema,    // type: "object" + properties + required
    pub commands: Vec<CommandSpec>,
}

Hot Reloading and Concurrency Safety

Tool registration isn't a one-time thing. Users may add, overwrite, or delete .wasm files in the extensions directory at any time. BoxAgnts uses the notify crate for filesystem monitoring:

let _ = start_watcher(workspace_extensions_dir.join("tools")).await;
let _ = start_watcher(app_extensions_dir.join("tools")).await;

start_watcher internally creates a tokio task that loops, receiving filesystem events. The handling logic for arriving events looks like this:

notify::EventKind::Create(_) | EventKind::Modify(_), for each path in event.paths
  │ path ends with .wasm?
  ├── Yes → execute wasm-sandbox::run::execute(path, ["--help"]) → parse → update HashMap
  └── No  → ignore

notify::EventKind::Remove(_), for each path in event.paths
  │ path ends with .wasm?
  ├── Yes → HashMap.remove(tool_name)
  └── No  → ignore
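The dispatch logic in this diagram can be sketched without the notify crate. Everything here is hypothetical: FsChange stands in for a filesystem event, Action for the watcher's next step, and the registry is reduced to a map from tool name to the .wasm path that provided it. Removing by recorded path is an assumption, made because the tool name is parsed from the help text rather than taken from the file name.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Simplified stand-in for a filesystem event (hypothetical type).
pub enum FsChange {
    Upserted(PathBuf), // file created or modified
    Removed(PathBuf),
}

/// What the watcher should do next (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Action {
    Reparse(PathBuf), // run --help on this file and update the registry
    Dropped(usize),   // number of registry entries removed
    Ignore,
}

fn is_wasm(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("wasm")
}

/// `registry` maps tool name -> path of the .wasm file that provided it.
pub fn handle_change(change: FsChange, registry: &mut HashMap<String, PathBuf>) -> Action {
    match change {
        FsChange::Upserted(path) if is_wasm(&path) => Action::Reparse(path),
        FsChange::Removed(path) if is_wasm(&path) => {
            let before = registry.len();
            // Assumption: entries are looked up by the path they came from.
            registry.retain(|_, wasm_file| *wasm_file != path);
            Action::Dropped(before - registry.len())
        }
        _ => Action::Ignore,
    }
}
```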

The HashMap itself is protected by tokio::sync::RwLock:

static WASM_TOOLS: Lazy<RwLock<HashMap<String, ToolSpec>>> =
    Lazy::new(|| RwLock::new(HashMap::new()));

RwLock allows multiple concurrent reads (tool invocations) and one exclusive write (hot-reload updates). Since tool list update frequency is very low (writes are almost exclusively triggered by manual user operations), read-write lock contention costs are negligible.

An edge case: what happens if, while the file watcher is parsing a new tool, an AI conversation happens to request the tool list? The answer is that no special handling is needed — all_tools() holds an RwLock read lock, the parser needs a write lock, and the write lock waits for the read lock to release. From the user's perspective, the delay is imperceptible — all_tools()'s read lock hold time is merely the duration of one HashMap traversal (microsecond scale), causing no noticeable blocking.
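The locking pattern can be sketched with the standard library. Note the substitution: std::sync::RwLock stands in for tokio::sync::RwLock so the example needs no async runtime, and the Registry type with its upsert method is hypothetical. The sketch also assumes that slow work such as running --help happens before the write lock is taken.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Registry guarded by a read-write lock (hypothetical type).
pub struct Registry {
    tools: RwLock<HashMap<String, String>>, // tool name -> description
}

impl Registry {
    pub fn new() -> Self {
        Registry { tools: RwLock::new(HashMap::new()) }
    }

    /// Readers copy out what they need; the read guard is dropped on return.
    pub fn all_tools(&self) -> Vec<String> {
        let guard = self.tools.read().unwrap();
        let mut names: Vec<String> = guard.keys().cloned().collect();
        names.sort();
        names
    }

    /// Called after parsing is finished, so the write lock is held only
    /// for the duration of a single insert.
    pub fn upsert(&self, name: String, description: String) {
        self.tools.write().unwrap().insert(name, description);
    }
}
```

Keeping both critical sections this short is what makes contention between tool listing and hot reloading negligible.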

Compilation Caching

There's an implicit performance optimization during registration. The first time a .wasm file is encountered, parse_wasm_tool() not only executes it in the sandbox to capture --help output, but also triggers Wasmtime precompilation:

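// compiler.rs (abridged: dir, cache_file_name, and code are set up in omitted lines)
pub fn process(wasm_file: &str, cache_dir: &str) -> Result<PathBuf> {
    // cache_file_name is derived from a hash of the .wasm content
    let cache_file = dir.join(cache_file_name);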
    if cache_file.exists() {
        return Ok(cache_file);  // cache hit
    }
    // Wasmtime CodeBuilder compilation, outputs .cwasm
    let output_bytes = code.compile_component_serialized()?;
    std::fs::write(&cache_file, output_bytes)?;
    Ok(cache_file)
}

.cwasm is Wasmtime's precompiled (serialized) module format. Subsequent tool invocations load it directly and skip parsing and compilation. For larger WASM tools (e.g., sqlite-component.wasm, which bundles a SQLite engine and can produce a .cwasm file several MB in size), this cache can cut first-invocation latency from hundreds of milliseconds to a few milliseconds.

The cache key is based on a hash of the WASM file's content, not the filename. This means updating .wasm file content automatically triggers recompilation — no stale cache issues.
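A content-derived cache file name can be sketched as below. The article does not say which hash BoxAgnts uses, so this is an assumption throughout: cache_file_name is a hypothetical helper, and DefaultHasher is chosen only to keep the sketch dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a cache file name from the module bytes alone.
/// DefaultHasher's output is not guaranteed stable across Rust releases,
/// so an on-disk cache would normally use a fixed algorithm such as
/// SHA-256 instead.
pub fn cache_file_name(wasm_bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    wasm_bytes.hash(&mut hasher);
    format!("{:016x}.cwasm", hasher.finish())
}
```

Because the file name is not an input, renaming a tool keeps its cache entry while changing a single byte of its content produces a new one.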

Tool Search

As the registry grows, the AI model needs a way to discover the tools it needs. You can't shove every tool's Schema into the system prompt (token costs are too high), so ToolSearchTool provides keyword-based retrieval:

struct ToolEntry {
    name: String,
    description: String,
    keywords: Vec<String>,
}

Search supports exact lookup ("select:ToolName") and fuzzy matching (relevance scoring by name and keywords). The scoring algorithm is straightforward: exact name match has the highest weight, keyword inclusion next, description inclusion lowest. This is sufficient for scenarios with dozens to hundreds of tools. If larger-scale support is needed (thousands of tools), vector search can be substituted — the interface remains unchanged, only the scoring implementation changes.
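A minimal sketch of such a scorer follows. The weights 100, 10, and 1 are illustrative and only their ordering follows the description above; the score and search functions are hypothetical, not the ToolSearchTool implementation.

```rust
pub struct ToolEntry {
    pub name: String,
    pub description: String,
    pub keywords: Vec<String>,
}

/// Score one entry against a query (case-insensitive).
pub fn score(entry: &ToolEntry, query: &str) -> u32 {
    let q = query.to_lowercase();
    if q.is_empty() {
        return 0;
    }
    let mut total = 0;
    if entry.name.to_lowercase() == q {
        total += 100; // exact name match
    }
    if entry.keywords.iter().any(|k| k.to_lowercase().contains(&q)) {
        total += 10; // keyword contains the query
    }
    if entry.description.to_lowercase().contains(&q) {
        total += 1; // description contains the query
    }
    total
}

/// "select:Name" returns the exact tool; anything else is ranked by score.
pub fn search<'a>(tools: &'a [ToolEntry], query: &str) -> Vec<&'a ToolEntry> {
    if let Some(name) = query.strip_prefix("select:") {
        return tools.iter().filter(|t| t.name == name).collect();
    }
    let mut hits: Vec<(u32, &ToolEntry)> = tools
        .iter()
        .map(|t| (score(t, query), t))
        .filter(|(s, _)| *s > 0)
        .collect();
    hits.sort_by(|a, b| b.0.cmp(&a.0)); // highest score first, stable for ties
    hits.into_iter().map(|(_, t)| t).collect()
}
```

Swapping in vector search would mean replacing score while keeping the search signature unchanged.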

Differences from Peer Approaches

Many Agent frameworks require pre-registering all tools in code (in Python, for example: tool = Tool(name=..., func=..., description=...)). BoxAgnts' model eliminates this step. An additional benefit is a minimal deployment workflow: the developer writes and compiles the tool, copies it (for example with scp) to the server's extensions directory, and is done. No configuration file modifications, no service restarts, no API registration calls.

This design is especially friendly for CI/CD scenarios — you can put the tool compilation step in GitHub Actions, with build artifacts automatically deployed to the server running BoxAgnts. The moment deployment completes, the AI can call the new tool.

Summary

BoxAgnts' tool registration mechanism solves the core problem of Schema-code inconsistency inherent in traditional approaches through three components:

Encoding detection (chardetng) eliminates the parser's hardcoded UTF-8 assumption, enabling correct registration of WASM tools produced in any language environment.
Dual help text merging (-h and --help) compensates for differences in output detail across CLI libraries. -h provides reliable name/version, --help provides detailed long_about and parameter descriptions; merging both yields the complete ToolSpec.
Content-based compilation caching precompiles WASM tools to .cwasm at registration time; subsequent calls skip the compilation phase, reducing latency from hundreds of milliseconds to single-digit milliseconds. The cache key is a content hash, not the filename, so updating tool content automatically triggers recompilation.

The hot-reload RwLock design finds an appropriate balance between concurrency safety (many reads, single write) and implementation complexity. The complete chain of notify event monitoring → HashMap update → compilation caching forms the technical foundation of BoxAgnts' "zero-configuration deployment" — after a developer copies a .wasm file to the extensions directory, the system automatically completes all steps from registration to availability.

References

BoxAgnts source code: https://github.com/guyoung/boxagnts
chardetng encoding detection library: https://github.com/hsivonen/chardetng
notify (Rust filesystem watcher): https://github.com/notify-rs/notify
Wasmtime precompilation cache documentation: https://docs.wasmtime.dev/cli-cache.html
clap (Rust CLI argument parser): https://github.com/clap-rs/clap
cobra (Go CLI argument parser): https://github.com/spf13/cobra