Experience Engine: AI Memory That Shrinks As Your Agent Learns

Every AI coding session, my agent made the same mistakes.

DbContext as singleton — state corruption, 15 minutes debugging. Again. ILogger instead of IMLog — lost tenant context. Again. Wrong project reference path — build fails. Again.

I had 500 memory notes. My agent was still a junior with a bigger notebook.

So I built something different.

The Problem With AI Memory

Every AI memory tool — Mem0, Letta, Zep — stores facts. More sessions = more entries = more tokens = more cost. They're giving your agent a bigger notebook.

But here's the thing: a notebook doesn't make you experienced.

A junior developer with 500 notes is still a junior. A mid-level developer with 15 principles understands why things work. The difference isn't how much you remember — it's whether you can generalize.

Junior (500 notes):
  "DbContext singleton caused bug"
  "HttpClient singleton caused leak"  
  "SmtpClient singleton caused corruption"
  → Encounters RedisConnection singleton → NO MATCH → makes the mistake

Mid-level (1 principle):
  "Stateful objects must be scoped, never singleton"
  → Encounters RedisConnection singleton → MATCHES → avoids the mistake

That's what Experience Engine does. It doesn't store more facts. It evolves facts into principles, then deletes the facts.
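The gap between the two can be sketched in a few lines of Python. This is a toy illustration, not the engine's code: the `notes` dict and `stateful` set stand in for what the real system does with embedding similarity, and `RedisConnection`'s membership in `stateful` is exactly the part the engine derives semantically rather than by lookup.

```python
# Toy illustration, not engine code: the lookups below stand in for
# embedding-based matching.

notes = {            # junior: one raw note per concrete class
    "DbContext": "singleton caused state corruption",
    "HttpClient": "singleton caused a connection leak",
    "SmtpClient": "singleton caused corruption",
}

# one principle: stateful objects must be scoped, never singleton
stateful = {"DbContext", "HttpClient", "SmtpClient", "RedisConnection"}

def junior_warns(cls):
    return cls in notes          # exact recall only

def midlevel_warns(cls):
    return cls in stateful       # the principle covers unseen cases

print(junior_warns("RedisConnection"))    # False: no matching note
print(midlevel_warns("RedisConnection"))  # True: the principle generalizes
```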

How It Works

When you code with any AI agent (Claude Code, Gemini CLI, Codex CLI), Experience Engine runs silently in the background:

Before every Edit/Write/Bash:
A hook queries the experience store: "Have I seen this mistake before?" If yes, it injects a warning directly into the agent's context:

⚠️ [Experience - High Confidence (0.85)]: Stateful objects must be 
scoped, never singleton. Last time SingleInstance caused state 
corruption in DbContext.

The agent reads this warning and avoids the mistake. No human intervention needed.

After every session:
An extractor scans the session transcript for mistake patterns:

  • Retry loops (same tool call 3+ times)
  • User corrections ("no, not that", "wrong", "undo")
  • Test fail → fix cycles
  • Git reverts

Each detected mistake gets extracted into a structured Q&A entry and stored in a vector database.
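Two of those patterns, retry loops and user corrections, can be sketched in a few lines. The event shapes and marker phrases here are my assumptions for illustration, not the extractor's actual data model:

```python
from collections import Counter

# Illustrative sketch of two detection patterns: retry loops and
# user corrections. Event shapes and marker phrases are assumptions.

CORRECTIONS = ("no, not that", "wrong", "undo")

def detect_mistakes(transcript):
    mistakes = []
    # Retry loop: the same tool call repeated 3+ times
    calls = Counter(
        (e["tool"], e["args"]) for e in transcript if e["type"] == "tool_call"
    )
    for (tool, args), n in calls.items():
        if n >= 3:
            mistakes.append({"pattern": "retry_loop", "tool": tool, "args": args})
    # User correction: a human message containing a correction phrase
    for e in transcript:
        if e["type"] == "user" and any(m in e["text"].lower() for m in CORRECTIONS):
            mistakes.append({"pattern": "user_correction", "text": e["text"]})
    return mistakes

session = [
    {"type": "tool_call", "tool": "Bash", "args": "dotnet build"},
    {"type": "tool_call", "tool": "Bash", "args": "dotnet build"},
    {"type": "tool_call", "tool": "Bash", "args": "dotnet build"},
    {"type": "user", "text": "No, not that file. Undo."},
]
print([m["pattern"] for m in detect_mistakes(session)])
# ['retry_loop', 'user_correction']
```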

Weekly (automatic):
The evolution engine runs:

  1. Promote: entries confirmed 3+ times move from cache → behavioral rules
  2. Abstract: clusters of 3+ similar entries → one general principle
  3. Demote: entries ignored 3+ times get deprioritized
  4. Archive: entries unused for 90 days get cleaned up

The result: memory shrinks as capability grows.
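A toy version of that weekly pass looks like this. Field names, thresholds as code, and the simple topic-based clustering are all illustrative; the real engine clusters by vector similarity:

```python
# Toy sketch of the four weekly rules. Field names and topic-based
# clustering are illustrative; the engine clusters by vector similarity.

def evolve(entries):
    report = {"promoted": 0, "abstracted": 0, "demoted": 0, "archived": 0}
    clusters = {}
    for e in entries:
        if e["idle_days"] > 90:                        # 4. archive stale entries
            e["archived"] = True
            report["archived"] += 1
            continue
        if e["ignores"] >= 3:                          # 3. demote ignored entries
            e["priority"] = "low"
            report["demoted"] += 1
        if e["tier"] == "T2" and e["confirms"] >= 3:   # 1. promote confirmed entries
            e["tier"] = "T1"
            report["promoted"] += 1
        clusters.setdefault(e["topic"], []).append(e)
    for group in clusters.values():                    # 2. abstract clusters of 3+
        if len(group) >= 3:
            report["abstracted"] += 1
    return report

entries = [
    {"tier": "T2", "confirms": 4, "ignores": 0, "idle_days": 5,   "topic": "di"},
    {"tier": "T2", "confirms": 1, "ignores": 0, "idle_days": 3,   "topic": "di"},
    {"tier": "T2", "confirms": 0, "ignores": 3, "idle_days": 2,   "topic": "di"},
    {"tier": "T2", "confirms": 0, "ignores": 0, "idle_days": 200, "topic": "http"},
]
print(evolve(entries))
# {'promoted': 1, 'abstracted': 1, 'demoted': 1, 'archived': 1}
```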

The 4-Tier Architecture

T0 Principles  (~400 tokens)  — generalized rules, always loaded
    "Stateful objects must be scoped, never singleton"

T1 Behavioral  (~600 tokens)  — specific reflexes, always loaded
    "WHEN DbContext + DI → MUST check lifetime FIRST"

T2 QA Cache    (semantic)     — detailed Q&A, retrieved on match
    Q: "Why not singleton?" → A: "State corruption across requests"

T3 Raw         (staging)      — unprocessed mistakes, TTL 30 days

Lifecycle: T3 → extract → T2 → promote (3x confirmed) → T1 
           → generalize (cluster 3+) → T0
           Memory SHRINKS as capability GROWS

What Makes This Different

                  Mem0                  Letta                 Zep                   Experience Engine
Over time         Entries grow forever  Entries grow forever  Entries grow forever  Entries shrink into principles
Novel cases       Only exact matches    Only exact matches    Only exact matches    Principles generalize
Mistake learning  No                    No                    No                    5 detection patterns
Dependencies      Python + SDK          PostgreSQL            PostgreSQL            Zero (Node.js built-in)
Local-first       Optional              Optional              Partial               Default
Data ownership    Cloud vendor          SaaS terms            Cloud vendor          You own everything

Experience Graph

Experiences aren't isolated entries — they're linked with typed edges:

DbContext singleton ──generalizes──→ "Stateful objects: always scoped"
                    ──relates-to───→ HttpClient singleton  
                    ──supersedes───→ [old] "Use transient for DbContext"
                    ──contradicts──→ [demoted] "Singleton is fine for DbContext"

When one experience matches your current action, the engine follows edges to surface related experiences too. This is how it catches RedisConnection singleton — not because it's seen Redis before, but because it's connected to the principle about stateful objects.
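A sketch of that edge walk, with made-up ids and a hardcoded adjacency table standing in for the graph that lives alongside the vector store:

```python
# Sketch of edge-following on a match. Ids and adjacency are made up;
# the real graph is persisted with the experiences themselves.

EDGES = {
    "dbcontext-singleton": [
        ("generalizes", "principle:stateful-scoped"),
        ("relates-to", "httpclient-singleton"),
    ],
    "httpclient-singleton": [
        ("generalizes", "principle:stateful-scoped"),
    ],
}

def surface_related(matched_id, depth=2):
    """Breadth-first walk over typed edges from the matched experience."""
    seen = {matched_id}
    frontier = [matched_id]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for _edge_type, target in EDGES.get(node, []):
                if target not in seen:
                    seen.add(target)
                    nxt.append(target)
        frontier = nxt
    return seen - {matched_id}

print(sorted(surface_related("dbcontext-singleton")))
# ['httpclient-singleton', 'principle:stateful-scoped']
```

A novel action only needs to match one node; the walk then pulls in the principle, which is what fires for cases the store has never seen verbatim.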

Temporal Reasoning

Knowledge evolves. What was true in January might be wrong in March:

January:  "Use singleton for HttpClient" (confirmed 5x)
March:    "Use IHttpClientFactory instead" (contradicts January)
          → January entry superseded, not deleted
          → March entry ranks higher (recent confirmation)
          → GET /api/timeline?topic=httpclient shows the evolution

The engine tracks confirmedAt[] arrays — not just "what was learned" but "when it was last confirmed." Stale knowledge gets penalized. Recent confirmations get boosted.
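That weighting can be sketched as a small scoring function. The multipliers below are invented; only the boost/penalty directions come from the engine's rules:

```python
from datetime import date

# Sketch of recency weighting over a confirmedAt[] history.
# Multipliers are invented; only the directions follow the engine's rules.

def temporal_score(base, confirmed_at, today, superseded=False):
    score = base
    if confirmed_at:
        age = (today - max(confirmed_at)).days    # days since last confirmation
        if age <= 7:
            score *= 1.25                         # confirmed this week: boost
        elif age > 60:
            score *= 0.5                          # stale: penalize
    if superseded:
        score *= 0.3                              # replaced knowledge ranks lower
    return score

today = date(2025, 3, 15)
january = temporal_score(0.8, [date(2025, 1, 10)], today, superseded=True)
march = temporal_score(0.8, [date(2025, 3, 12)], today)
print(january < march)  # True: the March entry outranks the one it superseded
```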

REST API

Everything is accessible via HTTP — not just CLI hooks:

# Start the server
node server.js
# Experience Engine API running on http://localhost:8082

# Check health
curl localhost:8082/health

# Query experience before a tool call
curl -X POST localhost:8082/api/intercept \
  -H "Content-Type: application/json" \
  -d '{"toolName":"Write","toolInput":{"file_path":"src/Startup.cs"}}'

# Response:
{
  "suggestions": "⚠️ [High Confidence (0.85)]: Stateful objects must be scoped",
  "hasSuggestions": true
}

# Trigger evolution
curl -X POST localhost:8082/api/evolve
# {"promoted":2,"abstracted":1,"demoted":0,"archived":3,"success":true}

# View stats
curl "localhost:8082/api/stats?since=30d"

# Knowledge timeline
curl "localhost:8082/api/timeline?topic=dependency+injection"

# Experience graph
curl "localhost:8082/api/graph?id=abc-123"

10 endpoints total. Zero dependencies — uses Node.js built-in http module. CORS enabled for browser extensions.

Python SDK

from muonroi_experience import Client

client = Client("http://localhost:8082")

# Query experience
result = client.intercept("Write", {"file_path": "app.py"})
if result["hasSuggestions"]:
    print(result["suggestions"])

# Extract lessons
client.extract("Agent tried singleton for DbContext, caused corruption...")

# Trigger evolution  
evolution = client.evolve()
print(f"Promoted: {evolution['promoted']}, Abstracted: {evolution['abstracted']}")

# Check stats
stats = client.stats(since="7d")
print(f"Mistakes avoided: {stats['suggestions']}")

# View knowledge timeline
timeline = client.timeline("dependency injection")
for entry in timeline["timeline"]:
    status = "[superseded]" if entry["superseded"] else ""
    print(f"{status}{entry['solution']}")

Zero dependencies — uses Python stdlib urllib. Python 3.8+.
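For the curious, a stdlib-only client really does fit in a few lines. This is a minimal sketch, not the published SDK's actual internals:

```python
import json
import urllib.request

# Minimal sketch of a stdlib-only client. The published SDK's internals
# may differ, but nothing beyond urllib and json is required.

class MiniClient:
    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def _build_request(self, path, payload):
        # Build a POST request with a JSON body, no third-party deps
        return urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def intercept(self, tool_name, tool_input):
        req = self._build_request(
            "/api/intercept",
            {"toolName": tool_name, "toolInput": tool_input},
        )
        with urllib.request.urlopen(req) as resp:  # needs the server running
            return json.loads(resp.read().decode("utf-8"))

client = MiniClient("http://localhost:8082/")
req = client._build_request("/api/intercept", {"toolName": "Write"})
print(req.full_url)  # http://localhost:8082/api/intercept
```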

Multi-User Support

Multiple developers on the same machine get isolated stores:

EXP_USER=alice node server.js    # Alice's experiences
EXP_USER=bob node server.js      # Bob's (completely isolated)

Share principles without sharing personal data:

# Alice shares a principle she evolved
curl -X POST localhost:8082/api/principles/share \
  -d '{"principleId": "abc-123"}'
# Returns portable JSON — no personal data included

# Bob imports it
curl -X POST localhost:8082/api/principles/import \
  -d '{"principle":"Stateful objects must be scoped","solution":"...","confidence":0.85}'
# Bob's evolution engine manages it independently from here

Quick Start (5 minutes)

git clone https://github.com/muonroi/experience-engine.git
cd experience-engine
bash .experience/setup.sh --local   # Docker Qdrant + Ollama (100% free)

The setup wizard handles everything. After setup, your agent starts learning automatically through hooks.

Supported providers

You're not locked to Ollama. The engine supports:

Embedding: Ollama, OpenAI, Gemini, VoyageAI, SiliconFlow, or any OpenAI-compatible API

Brain (extraction): Ollama, OpenAI, Gemini, Claude, DeepSeek, SiliconFlow, or any OpenAI-compatible API

Mix and match — e.g., SiliconFlow for cheap embeddings + Ollama for free extraction.

Anti-Noise Scoring

Not all experiences are equal. Results are ranked by:

  • Hit frequency — confirmed experiences rank higher
  • Recency — recently confirmed > stale (60+ days penalty)
  • Confidence aging — new entries start lower, climb with confirmation
  • Ignore tracking — suggestions ignored 3x get demoted
  • Domain match — editing .ts file → TypeScript experiences rank higher
  • Temporal boost — confirmed in last 7 days → score boost
  • Superseded penalty — replaced knowledge ranks lower

This means your agent gets the most relevant, most trusted experience for the current context — not just the most similar vector match.
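Put together, the ranking might look like this toy scorer. Every weight here is invented; only the direction of each signal follows the list above:

```python
# Toy composite scorer for the anti-noise signals. All weights are
# invented; only each adjustment's direction follows the design.

def rank(candidates, current_domain):
    def score(e):
        s = e["similarity"]                       # base: vector similarity
        s *= 1 + 0.05 * e["confirms"]             # hit frequency
        if e["days_since_confirmed"] <= 7:
            s *= 1.2                              # temporal boost
        elif e["days_since_confirmed"] > 60:
            s *= 0.6                              # staleness penalty
        if e["ignores"] >= 3:
            s *= 0.4                              # ignored 3x: demoted
        if e["domain"] == current_domain:
            s *= 1.3                              # domain match
        if e.get("superseded"):
            s *= 0.5                              # superseded penalty
        return s
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"id": "stale", "similarity": 0.9, "confirms": 0,
     "days_since_confirmed": 120, "ignores": 0, "domain": "cs"},
    {"id": "trusted", "similarity": 0.7, "confirms": 5,
     "days_since_confirmed": 2, "ignores": 0, "domain": "ts"},
]
print([e["id"] for e in rank(candidates, current_domain="ts")])
# ['trusted', 'stale']
```

Note the second entry wins despite a lower raw similarity: repeated confirmation, recency, and domain fit outweigh the stale vector match.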

The Philosophy

Every AI memory company stores your data on their cloud and charges you to access it. Mem0 stores your memories. Letta stores your agent state. You pay monthly to access your own knowledge.

Experience Engine is different:

  • Your data never leaves your machine (unless you choose cloud sync)
  • Zero vendor lock-in — standard formats, portable profiles
  • Zero dependencies — Node.js built-in modules only
  • The engine is open source — you pay for convenience, never for capability

"Enterprise AI replaces you. Personal AI empowers you. Same technology. Different owner."

What's Next

The engine is live and working. I'm dogfooding it on my own projects right now. After 2 weeks:

  • 47 suggestions fired
  • 12 mistakes avoided
  • 5 principles evolved from ~50 raw entries
  • Memory footprint decreased (entries compressed into principles)

Next up: dashboard for visualizing the "Saves" feed (mistakes the agent didn't make), and a browser extension that injects experience into ChatGPT/Claude/Gemini web interfaces.

Links

If you're running local LLMs with Ollama, I'd love to hear how the engine works with your setup. And if you have ideas for new mistake detection patterns — PRs welcome.


Experience Engine is MIT licensed and free forever. The core engine will never be paywalled.
