Most AI apps send every user message to a language model — even when the intent is completely predictable. capman intercepts the 80% you already know how to answer, resolves them in under 2 ms at zero token cost, and lets the LLM handle only the queries that genuinely need it.
The Problem
When a user types "what time is it?" into your AI app, this is what happens:
```
User input
  → serialize message
  → HTTP request to LLM API   ← 800–1500 ms
  → token cost (~$0.001–$0.01)
  → parse function call JSON
  → invoke time handler
```
You paid an LLM to decode "what time is it" into a clock call. Your own code knew the answer in 1 ms. You waited 1,200 ms anyway.
Standard function calling routes every query through the model — by design. For ambiguous, open-ended input, that's the right call. For well-defined intents like "check availability", "go to settings", or "get my orders", it's pure overhead.
What is capman?
capman is a TypeScript library that sits in front of your LLM. It reads a capability manifest — a machine-readable list of everything your app can do — and matches user queries against it using weighted keyword scoring.
If it finds a confident match: it resolves the query directly. No LLM. No network. No tokens.
If it doesn't: it escalates to your LLM automatically.
```ts
import { CapmanEngine, readManifest } from 'capman'

const engine = new CapmanEngine({
  manifest: readManifest(),
  mode: 'balanced', // keyword-first, LLM only when needed
  llm: async (prompt) => callYourLLM(prompt),
})

const result = await engine.ask('Check availability for blue jacket')

console.log(result.match.capability?.id) // 'check_product_availability'
console.log(result.resolvedVia)          // 'keyword' | 'llm' | 'cache'
console.log(result.trace.totalMs)        // 1.6
```
How It Works
Every engine.ask() call passes through six steps:
1. Cache check — normalized query key lookup. Hit → return in <1 ms.
2. Keyword scorer — every capability in your manifest is scored 0–100 across three weighted sources:
| Source | Weight | Signal |
|---|---|---|
| Example sentences | 60 pts | Strongest — direct intent overlap |
| Capability description | 30 pts | Secondary — synonym coverage |
| Capability name | 10 pts | Weakest — tiebreaker only |
3. Branch decision — confidence ≥ 50%? Direct resolve. Below threshold? LLM escalation (rate-limited + circuit-breaker protected).
4. Privacy enforcement — the capability's privacy level (public, user_owned, or admin) is checked before any API call fires.
5. Resolver — api (fetch + retry + timeout), nav (URL template), or hybrid (both in parallel).
6. Learning + trace — full ExecutionTrace returned with step-by-step timing. Pre-boost result recorded to prevent feedback loops.
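The keyword scorer in step 2 can be approximated with a plain overlap function. This is an illustrative sketch of the 60/30/10 weighting, not capman's internal implementation; the tokenization and overlap metric here are assumptions:

```typescript
// Illustrative sketch of weighted keyword scoring (not capman's real internals).
type Capability = {
  id: string
  name: string
  description: string
  examples: string[]
}

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/\W+/).filter(Boolean)

// Fraction of query tokens that appear in `text`, in [0, 1].
function overlap(queryTokens: string[], text: string): number {
  const words = new Set(tokenize(text))
  const hits = queryTokens.filter((t) => words.has(t)).length
  return queryTokens.length ? hits / queryTokens.length : 0
}

// Score 0–100: best example overlap worth 60 pts, description 30, name 10.
function scoreCapability(query: string, cap: Capability): number {
  const q = tokenize(query)
  const bestExample = Math.max(0, ...cap.examples.map((e) => overlap(q, e)))
  return bestExample * 60 + overlap(q, cap.description) * 30 + overlap(q, cap.name) * 10
}
```

With a capability whose example sentence is "what is the weather", the query "what is the weather" clears the 50-point threshold on example overlap alone, while an unrelated query scores near zero.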
Three Modes, One Decision
- `cheap` → keyword only, <2 ms, $0 — for high-volume known intents
- `balanced` → keyword first, LLM if <50% confidence — default
- `accurate` → LLM first, keyword fallback — for ambiguous open input
The mode is set per-engine instance. Switch by swapping the constructor option.
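The routing decision the three modes imply can be sketched as a single branch. This is a hypothetical model of the behavior described above, not capman's source; in particular, what `cheap` does below threshold, and how `accurate` falls back, are assumptions:

```typescript
// Hypothetical sketch of the three-mode routing decision (not capman's source).
type Mode = 'cheap' | 'balanced' | 'accurate'
type Route = 'keyword' | 'llm' | 'none'

// confidence: keyword-scorer confidence in [0, 100]; threshold is the 50% cutoff.
function route(mode: Mode, confidence: number, threshold = 50): Route {
  switch (mode) {
    case 'cheap':
      // Keyword only: below threshold means "no match"; never an LLM call.
      return confidence >= threshold ? 'keyword' : 'none'
    case 'balanced':
      // Keyword first; escalate low-confidence queries to the LLM.
      return confidence >= threshold ? 'keyword' : 'llm'
    case 'accurate':
      // LLM first; keyword fallback (on LLM failure) is not modeled here.
      return 'llm'
  }
}
```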
capman vs Standard Function Calling
The fundamental difference is when the LLM is involved:
| | LLM Function Calling | capman |
|---|---|---|
| Latency | 800–1,500 ms | <2 ms (direct) |
| Token cost | ~$0.001–$0.01/call | $0 on direct match |
| Failure rate | ~25% (hallucinations, timeouts) | 0% on direct match |
| Privacy enforcement | Manual | Built-in per capability |
| Caching | External setup | Memory / File / Combo built-in |
| Tracing | Custom logging needed | Automatic on every call |
capman doesn't replace your LLM — it protects it. The model gets fewer calls, better-framed prompts, and none of the deterministic work it was doing before.
Demo: The Weather Query
The demo runs a battle test: the same query submitted three times in a row, both engines running in parallel. Here's what came back.
The Query
"what is the weather"
Results
Round 1 — cold start:
- LLM: 1,151.17 ms — ✓ success, correct tool
- capman: 1.61 ms — matched "weather" exactly — 715× faster

Round 2 — learning kicks in (+1 pt boost):
- LLM: 851.51 ms
- capman: 0.59 ms — boost applied from prior hit — 1,443× faster

Round 3 — boost grows (+2 pts):
- LLM: 931.02 ms
- capman: 0.78 ms — 1,193× faster
Why Does the Latency Drop on Rounds 2 and 3?
The learning store tracks keyword → capability → hitCount. On each successful match, it builds a boost index. On the next call:
boost = min(15, log₂(hitCount + 1) × 2)
More hits → logarithmically larger boost → capability scores higher faster → match resolves sooner in the scorer loop. The cap is +15 points total. The pre-boost result is what gets recorded (not the post-boost result) — this prevents a feedback loop where boosted winners accumulate hits and permanently displace keyword matches.
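The formula transcribes directly. A few sample values show the shape: fast early growth, then the +15 cap, which the formula reaches at roughly 180 hits (2^7.5 ≈ 181):

```typescript
// The learning boost formula: logarithmic growth, capped at +15 points.
function boost(hitCount: number): number {
  return Math.min(15, Math.log2(hitCount + 1) * 2)
}

console.log(boost(0))   // → 0
console.log(boost(1))   // → 2
console.log(boost(3))   // → 4
console.log(boost(500)) // → 15 (cap)
```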
A Capability Definition
Everything capman does flows from a structured manifest. Here's a full capability:
```js
// capman.config.js
module.exports = {
  app: 'my-store',
  baseUrl: 'https://api.my-store.com',
  capabilities: [
    {
      id: 'check_product_availability',
      name: 'Check product availability',
      description: 'Check stock and pricing for a product by name or ID.',
      examples: [
        'Is the blue jacket in stock?',
        'Check availability for product 42',
        'Do you have size M?',
      ],
      params: [
        {
          name: 'product',
          description: 'Product name or ID',
          required: true,
          source: 'user_query', // extracted from the query
        },
      ],
      returns: ['stock', 'price', 'variants'],
      resolver: {
        type: 'api',
        endpoints: [{ method: 'GET', path: '/products/{product}/availability' }],
      },
      privacy: { level: 'public' },
    },
  ],
}
```
Then:
```sh
npx capman generate   # → manifest.json
npx capman validate   # → check all capabilities
npx capman demo       # → live test with sample queries
```
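The `api` resolver's `path` in the config above implies placeholder substitution into a URL template. A minimal sketch of how that substitution might work (hypothetical; `fillPath` is not a capman function):

```typescript
// Hypothetical sketch of filling a path template like
// '/products/{product}/availability' with extracted params.
function fillPath(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_: string, key: string) => {
    const value = params[key]
    if (value === undefined) throw new Error(`missing param: ${key}`)
    return encodeURIComponent(value) // keep extracted text URL-safe
  })
}
```

For example, a `product` param of `blue jacket` would yield `/products/blue%20jacket/availability`.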
What capman Does Not Replace
capman handles the routing layer. It doesn't:
- Generate creative responses
- Summarize or reason over content
- Handle open-ended questions with no predictable shape
- Replace conversational context management
For any of those, the LLM still runs — capman just makes sure it runs only when it needs to.
Install
```sh
npm install capman
npx capman init       # create capman.config.js
npx capman generate   # generate manifest.json
npx capman run "your query" --debug
```
GitHub: github.com/Hobbydefiningdoctory/capman
Summary
If your app has any predictable intents — commands, retrievals, navigations — you are currently paying LLM latency and token cost for work that a keyword scorer can do in 1 ms. capman is the layer that captures that saving without changing your LLM integration.
The LLM becomes a fallback, not a default. That's the shift.
capman v0.4.5 — TypeScript, MIT licence, dual CJS/ESM, zero runtime dependencies beyond zod.