Capman: Stop Routing Every Intent Through an LLM


Most AI apps send every user message to a language model — even when the intent is completely predictable. capman intercepts the 80% you already know how to answer, resolves them in under 2 ms at zero token cost, and lets the LLM handle only the queries that genuinely need it.


The Problem

When a user types "what time is it?" into your AI app, this is what happens:

User input
  → serialize message
  → HTTP request to LLM API     ← 800–1500 ms
  → token cost (~$0.001–$0.01)
  → parse function call JSON
  → invoke time handler

You paid an LLM to decode "what time is it" into a clock call. Your own handler knew the answer in 1 ms. You waited 1,200 ms anyway.

Standard function calling routes every query through the model — by design. For ambiguous, open-ended input, that's the right call. For well-defined intents like "check availability", "go to settings", or "get my orders", it's pure overhead.


What is capman?

capman is a TypeScript library that sits in front of your LLM. It reads a capability manifest — a machine-readable list of everything your app can do — and matches user queries against it using weighted keyword scoring.

If it finds a confident match: it resolves the query directly. No LLM. No network. No tokens.
If it doesn't: it escalates to your LLM automatically.

import { CapmanEngine, readManifest } from 'capman'

const engine = new CapmanEngine({
  manifest: readManifest(),
  mode: 'balanced',          // keyword-first, LLM only when needed
  llm: async (prompt) => callYourLLM(prompt),
})

const result = await engine.ask('Check availability for blue jacket')

console.log(result.match.capability?.id)  // 'check_product_availability'
console.log(result.resolvedVia)           // 'keyword' | 'llm' | 'cache'
console.log(result.trace.totalMs)         // 1.6

How It Works

Every engine.ask() call passes through six steps:

1. Cache check — normalized query key lookup. Hit → return in <1 ms.

2. Keyword scorer — every capability in your manifest is scored 0–100 across three weighted sources:

| Source | Weight | Signal |
| --- | --- | --- |
| Example sentences | 60 pts | Strongest — direct intent overlap |
| Capability description | 30 pts | Secondary — synonym coverage |
| Capability name | 10 pts | Weakest — tiebreaker only |

3. Branch decision — confidence ≥ 50%? Direct resolve. Below threshold? LLM escalation (rate-limited + circuit-breaker protected).

4. Privacy enforcement — public, user_owned, or admin checked before any API call fires.

5. Resolver — api (fetch + retry + timeout), nav (URL template), or hybrid (both in parallel).

6. Learning + trace — full ExecutionTrace returned with step-by-step timing. Pre-boost result recorded to prevent feedback loops.
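The weighted scoring in step 2 can be illustrated with a toy scorer. The 60/30/10 weights come from the table above; the token-overlap heuristic and all names here are assumptions for illustration, not capman's actual implementation:

```typescript
// Toy re-creation of the three-source weighted scorer.
// The 60/30/10 weights are from the article; the overlap heuristic is assumed.
interface Capability {
  id: string
  name: string
  description: string
  examples: string[]
}

function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? [])
}

// Fraction of the query's tokens that also appear in `text`.
function overlap(query: Set<string>, text: string): number {
  const t = tokens(text)
  let hits = 0
  for (const word of query) if (t.has(word)) hits++
  return hits / Math.max(query.size, 1)
}

// Score 0–100: best example match (60 pts) + description (30 pts) + name (10 pts).
function score(query: string, cap: Capability): number {
  const q = tokens(query)
  const bestExample = Math.max(0, ...cap.examples.map(e => overlap(q, e)))
  return Math.round(bestExample * 60 + overlap(q, cap.description) * 30 + overlap(q, cap.name) * 10)
}

const availability: Capability = {
  id: 'check_product_availability',
  name: 'Check product availability',
  description: 'Check stock and pricing for a product by name or ID.',
  examples: [
    'Is the blue jacket in stock?',
    'Check availability for product 42',
    'Do you have size M?',
  ],
}

console.log(score('Check availability for blue jacket', availability))
```

In this sketch the query crosses the 50% threshold on the strength of its example-sentence overlap alone, which is exactly the behavior the weighting is designed to produce: example sentences dominate, the description catches synonyms, and the name only breaks ties.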


Three Modes, One Decision

cheap      → keyword only, <2 ms, $0 — for high-volume known intents
balanced   → keyword first, LLM if <50% confidence — default
accurate   → LLM first, keyword fallback — for ambiguous open input

The mode is set per-engine instance. Switch by swapping the constructor option.
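The routing consequence of each mode can be sketched as a first-choice decision. This is hypothetical logic inferred from the mode table, not capman's source:

```typescript
// Hypothetical: which resolver each mode tries first, per the table above.
type Mode = 'cheap' | 'balanced' | 'accurate'

function firstResolver(mode: Mode, keywordConfidence: number): 'keyword' | 'llm' {
  switch (mode) {
    case 'cheap':
      return 'keyword' // never escalates to the LLM
    case 'accurate':
      return 'llm' // LLM first, keyword as fallback
    case 'balanced':
      return keywordConfidence >= 50 ? 'keyword' : 'llm'
  }
}

console.log(firstResolver('balanced', 52)) // confident match resolves directly
```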


capman vs Standard Function Calling

The fundamental difference is when the LLM is involved:

| | LLM Function Calling | capman |
| --- | --- | --- |
| Latency | 800–1,500 ms | <2 ms (direct) |
| Token cost | ~$0.001–$0.01/call | $0 on direct match |
| Failure rate | ~25% (hallucinations, timeouts) | 0% on direct match |
| Privacy enforcement | Manual | Built-in per capability |
| Caching | External setup | Memory / File / Combo built-in |
| Tracing | Custom logging needed | Automatic on every call |

capman doesn't replace your LLM — it protects it. The model gets fewer calls, better-framed prompts, and none of the deterministic work it was doing before.


Demo: The Weather Query

The demo runs a battle test: the same query submitted three times in a row, both engines running in parallel. Here's what came back.

The Query

"what is the weather"

Results

Round 1 — cold start:

  • LLM: 1,151.17 ms ✓ Success → correct tool
  • capman: 1.61 ms — matched "weather" exactly — 715× faster

Round 2 — learning kicks in (+1 pt boost):

  • LLM: 851.51 ms
  • capman: 0.59 ms — boost applied from prior hit — 1,443× faster

Round 3 — boost grows (+2 pts):

  • LLM: 931.02 ms
  • capman: 0.78 ms — 1,193× faster

Why Does the Latency Drop on Rounds 2 and 3?

The learning store tracks keyword → capability → hitCount. On each successful match, it builds a boost index. On the next call:

boost = min(15, log(hitCount + 1) × 2)

More hits → logarithmically larger boost → capability scores higher faster → match resolves sooner in the scorer loop. The cap is +15 points total. The pre-boost result is what gets recorded (not the post-boost result) — this prevents a feedback loop where boosted winners accumulate hits and permanently displace keyword matches.
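The boost curve can be reconstructed in a few lines. Assuming a natural log and flooring to whole points (an assumption, but one that reproduces the +1 and +2 boosts observed in rounds 2 and 3):

```typescript
// Reconstruction of the boost formula: min(15, log(hitCount + 1) × 2),
// assuming natural log and flooring to whole points.
function boost(hitCount: number): number {
  return Math.min(15, Math.floor(Math.log(hitCount + 1) * 2))
}

console.log(boost(1)) // → 1  (after round 1's hit)
console.log(boost(2)) // → 2  (after round 2's hit)
```

The logarithm makes early hits count for a lot and later hits for almost nothing, so a popular capability rises quickly without ever exceeding the +15 cap.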


A Capability Definition

Everything capman does flows from a structured manifest. Here's a full capability:

// capman.config.js
module.exports = {
  app: 'my-store',
  baseUrl: 'https://api.my-store.com',
  capabilities: [
    {
      id: 'check_product_availability',
      name: 'Check product availability',
      description: 'Check stock and pricing for a product by name or ID.',
      examples: [
        'Is the blue jacket in stock?',
        'Check availability for product 42',
        'Do you have size M?',
      ],
      params: [
        {
          name: 'product',
          description: 'Product name or ID',
          required: true,
          source: 'user_query',   // extracted from the query
        },
      ],
      returns: ['stock', 'price', 'variants'],
      resolver: {
        type: 'api',
        endpoints: [{ method: 'GET', path: '/products/{product}/availability' }],
      },
      privacy: { level: 'public' },
    },
  ],
}
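The api resolver's path template gets its {product} placeholder filled from the extracted param. A minimal sketch of that expansion (fillPath is a hypothetical helper for illustration, not part of capman's public API, and it omits the retry/timeout handling the real resolver performs):

```typescript
// Minimal sketch of {param} expansion for an api resolver path template.
// fillPath is hypothetical; capman's real resolver also retries and times out.
function fillPath(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match, key: string) => {
    const value: string | undefined = params[key]
    if (value === undefined) throw new Error(`missing required param: ${key}`)
    return encodeURIComponent(value)
  })
}

console.log(fillPath('/products/{product}/availability', { product: '42' }))
// → /products/42/availability
```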

Then:

npx capman generate   # → manifest.json
npx capman validate   # → check all capabilities
npx capman demo       # → live test with sample queries

What capman Does Not Replace

capman handles the routing layer. It doesn't:

  • Generate creative responses
  • Summarize or reason over content
  • Handle open-ended questions with no predictable shape
  • Replace conversational context management

For any of those, the LLM still runs — capman just makes sure it runs only when it needs to.


Install

npm install capman
npx capman init       # create capman.config.js
npx capman generate   # generate manifest.json
npx capman run "your query" --debug

GitHub: github.com/Hobbydefiningdoctory/capman


Summary

If your app has any predictable intents — commands, retrievals, navigations — you are currently paying LLM latency and token cost for work that a keyword scorer can do in 1 ms. capman is the layer that captures that saving without changing your LLM integration.

The LLM becomes a fallback, not a default. That's the shift.


capman v0.4.5 — TypeScript, MIT licence, dual CJS/ESM, zero runtime dependencies beyond zod.

Source: dev.to
