I got tired of re-explaining my codebase to every coding agent — so I made critical memory live in the repo next to code

I switch coding agents constantly.

Opus is ahead one month, some GPT model the next. Gemini gets better, Cursor changes, and local setups become good enough for some tasks. On top of that, usage limits and token policies keep forcing me to jump tools mid-project just to keep shipping.

Every switch has the same annoying failure mode:

the new agent is looking at the same repo, but it has lost the project's working memory.

It re-scans the tree to orient itself.

It re-asks things we already settled.

It misses local conventions.

It repeats footguns we already hit.

And I paste the same thing into the prompt for the Nth time:

here's the architecture, we decided X, don't do Y

All the context I had built up lived inside one tool's session and died the moment I moved.

So I built a fix for myself, then cleaned it up and open-sourced it.

The project is called agent-memory.

The problem

Every serious repo has context that is not obvious from the files alone:

why a module is shaped a certain way
which convention is local to this project
which bug we already hit
what this branch is currently trying to finish
which approach we explicitly decided not to use
which API looks tempting but should not be used
which test command is the one that actually matters

Some of that context belongs in docs. Some belongs in code comments. Some belongs in AGENTS.md, CLAUDE.md, or .cursor/rules.

But a lot of it is living project memory.

It changes as the branch evolves.

It gets discovered while working.

It is useful to the next agent, but not always worth turning into permanent documentation.

And if it only lives in a chat session, it is basically disposable.

The idea: boring on purpose

Keep the project's living context (current task state, decisions, conventions, known footguns) as plain Markdown committed to the repo, and expose it to the agent over MCP.

That's it.

The agent pulls relevant context in one MCP call instead of re-reading the tree every session, and I stop hand-feeding decisions into the prompt because they already live in memory and surface when relevant.

The short version:

memory lives next to the code
Markdown is the source of truth
agents fetch relevant context over MCP
durable updates stage for human review
the local index is rebuildable
no cloud memory layer
no vendor lock-in
no opaque database as the only source of truth

The goal is simple:

stop re-teaching the same repo to every coding agent.

What the agent gets

Agents access memory through three MCP tools:

| Tool | Purpose |
| --- | --- |
| `memory.fetch_context` | Fetch relevant project context for the current task |
| `memory.propose_update` | Propose a durable memory update for human review |
| `memory.status` | Inspect memory and index status |

A typical workflow looks like this:

The agent starts a task.
It calls memory.fetch_context.
It gets a compact pack of relevant decisions, conventions, pitfalls, and module notes.
During the task, it discovers something durable.
It calls memory.propose_update.
The update is staged.
A human reviews the diff and applies or rejects it.

The agent can propose.

The human decides.

That boundary matters.
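
For readers who have not looked at MCP on the wire: a tool call is a JSON-RPC 2.0 request with the method `tools/call`, a tool `name`, and an `arguments` object. The sketch below builds two such requests in Python. The tool names come from the table above; the argument names (`task`, `kind`, `text`) are assumptions made up for illustration and are not agent-memory's actual schema.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` request (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are hypothetical, not agent-memory's real schema.
fetch = tool_call(1, "memory.fetch_context", {"task": "fix flaky login test"})
propose = tool_call(2, "memory.propose_update", {
    "kind": "pitfall",
    "text": "Login tests need the fake clock; real time makes them flaky.",
})

print(json.dumps(fetch, indent=2))
```

In practice the MCP client inside the coding agent builds these requests; the point is only that both reading and proposing are ordinary, inspectable calls.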

Try it

agent-memory is a small Go tool: CGo-free, Apache-2.0 licensed, and designed to run locally as an MCP server.

Add it to .mcp.json:

{"mcpServers":{"agent-memory":{"command":"npx","args":["-y","@xchucx/agent-memory","mcp","--root","."]}}}

Or run it directly:

npx -y @xchucx/agent-memory mcp --root .

Initialize memory in a repo:

npx -y @xchucx/agent-memory init

After that, your coding agent can ask the repo for memory instead of starting cold every time.

The payoff: portability

The non-obvious payoff is portability.

Because the memory is files in my repo — not a vendor's cloud, not one IDE's private state, not one chat session — Claude Code, Cursor, Gemini, and MCP-capable local agents can all read the same project memory.

Switching tools no longer resets the project context.

The context travels with the code.

That matters more as agent workflows become less tied to one model or one IDE.

I do not want my repo's working memory to depend on whichever agent happens to be best this month.

Why not just AGENTS.md or CLAUDE.md?

I use those too.

But for me, files like AGENTS.md, CLAUDE.md, and .cursor/rules work best as static instructions:

how to run tests
coding style
repo rules
preferred libraries
commands the agent should use
commands the agent should avoid
"always do X"
"never do Y"

They tell the agent how to behave.

What I was missing was living project memory:

what this branch is currently trying to finish
why we chose one implementation over another
which module owns a particular flow
which bug or footgun we discovered yesterday
which convention exists but is not obvious from one file
what the next agent should know before touching this part of the repo

That kind of information changes more often.

It can become stale.

It needs review.

It needs to be searchable.

And it should not turn static instruction files into a junk drawer of rules, TODOs, old session notes, warnings, decisions, and random reminders.

So I see them as different layers:

| Layer | Best for | Lifecycle |
| --- | --- | --- |
| `AGENTS.md` / `CLAUDE.md` / `.cursor/rules` | Stable instructions and repo rules | Edited rarely, read as policy |
| Vendor or IDE memory | Personal preferences and tool-specific state | Useful, but usually trapped in one environment |
| `agent-memory` | Decisions, conventions, pitfalls, task state, module notes | Changes over time, reviewed like project context |

The goal is not to replace instruction files.

The goal is to stop overloading them.

Static instructions and living project memory deserve different lifecycles.

Three decisions that shaped it

1. Markdown in your repo, not a database

I wanted the memory to be boring and inspectable.

So the source of truth is plain Markdown inside the repo.

I can open it. Read it. Edit it. git diff it.

There is a SQLite index, but it is only a local shadow index for retrieval. It is fully regenerable from the Markdown files.

The important part is that the memory itself is not trapped inside an opaque database.

It lives next to the code.
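
To make "regenerable from the Markdown files" concrete, here is a minimal Python sketch of the idea: split each file into sections at its headings and derive the whole index from that, so deleting the index loses nothing. This is an illustration only. The file path `memory/decisions.md` and the `(path, heading, body)` record shape are assumptions, not agent-memory's actual layout or parser.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(path, text):
    """Split one Markdown file into (path, heading, body) records.

    Simplification: a '#' line inside a fenced code block would also be
    treated as a heading, and headings with empty bodies are skipped.
    """
    sections, heading, body = [], "(preamble)", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if any(l.strip() for l in body):
                sections.append((path, heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if any(l.strip() for l in body):
        sections.append((path, heading, "\n".join(body).strip()))
    return sections


def rebuild_index(files):
    """Derive the whole index from Markdown; nothing else is authoritative."""
    index = []
    for path in sorted(files):
        index.extend(split_sections(path, files[path]))
    return index


# Hypothetical memory file, held in a dict to keep the example self-contained.
files = {
    "memory/decisions.md": (
        "# Decisions\n\n"
        "## Use FTS5\nNo embeddings; BM25 is enough at repo scale.\n\n"
        "## Stage writes\nAgents propose, humans apply.\n"
    )
}
index = rebuild_index(files)
```

Because the index is a pure function of the files, rebuilding it twice from the same Markdown gives the same result, which is what makes it safe to treat as disposable.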

2. Human-in-the-loop writes

I did not want the agent silently rewriting shared project memory.

That felt wrong.

If a coding agent learns something durable (a decision, a convention, a module fact, a recurring footgun), it should be able to propose an update.

But that update should not automatically become shared truth.

So durable changes stage for review:

agent-memory review --diff
agent-memory apply

The agent proposes.

I approve.

That keeps memory useful without turning it into a pile of unreviewed agent guesses.

Shared memory should be treated like shared project knowledge.
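
The propose, review, apply boundary can be sketched in a few lines of Python. This is not how `agent-memory review --diff` or `agent-memory apply` are implemented; the function names and the single-file `memory.md` are assumptions for illustration. The sketch only shows the shape of the flow: a proposal produces staged text and a diff, and the committed text changes only on explicit approval.

```python
import difflib


def stage(current, proposed_addition):
    """Return the proposed new text without touching the committed memory."""
    return current.rstrip("\n") + "\n" + proposed_addition.rstrip("\n") + "\n"


def review_diff(current, staged, name="memory.md"):
    """Render the staged change as a unified diff for a human to read."""
    return "".join(difflib.unified_diff(
        current.splitlines(keepends=True),
        staged.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


def apply_update(current, staged, approved):
    """Only an explicit human approval promotes the staged text."""
    return staged if approved else current


current = "## Pitfalls\n- Do not call the legacy billing client.\n"
staged = stage(current, "- Run integration tests with the fake clock.")
diff = review_diff(current, staged)
print(diff)
```

Rejecting the proposal leaves the committed text byte-for-byte unchanged, which is the property that matters for shared memory.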

3. No vector databases or embeddings

When building AI memory tools, the industry reflex is to reach for vector databases and embedding models.

I get why.

Embeddings are powerful, and for large fuzzy knowledge bases they can be the right tool.

But for this project I wanted something simpler.

Project memory is not the whole internet. It is usually dozens or hundreds of small, human-written sections:

decisions
conventions
pitfalls
module notes
task state
local project facts

For that scale, a vector database felt like the wrong default.

Embeddings add more moving parts: API keys, model choice, regeneration, drift, or a heavier local setup.

More importantly, you cannot easily git commit a vector database and share it with your team as project knowledge.

So I skipped all of that.

agent-memory uses a local SQLite shadow index with standard full-text search via FTS5/BM25.

The source of truth is still Markdown.

The index is entirely regenerable from the Markdown files.

That gives fast, budgeted retrieval with zero external services:

no embedding API
no vector database
no cloud dependency
no model-specific memory format
works offline
works with local agents
works on airplanes
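
For a feel of how little machinery FTS5/BM25 needs, here is a minimal Python sketch using the standard-library `sqlite3` module. It assumes the bundled SQLite was compiled with FTS5, which is common but not guaranteed. The table name, columns, and sample sections are made up for illustration and are not agent-memory's actual schema.

```python
import sqlite3


def build_index(sections):
    """Create a throwaway in-memory FTS5 index from (heading, body) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE sections USING fts5(heading, body)")
    db.executemany("INSERT INTO sections VALUES (?, ?)", sections)
    return db


def search(db, query, limit=5):
    """Return headings ranked by BM25; a lower bm25() value is a better match.

    The query is passed straight to FTS5, so plain words are ANDed together.
    Punctuation-heavy input would need quoting in a real tool.
    """
    rows = db.execute(
        "SELECT heading FROM sections WHERE sections MATCH ? "
        "ORDER BY bm25(sections) LIMIT ?",
        (query, limit),
    )
    return [heading for (heading,) in rows]


# Hypothetical memory sections.
db = build_index([
    ("Use the fake clock in tests",
     "Login tests are flaky with real time; inject the fake clock."),
    ("Legacy billing client",
     "Do not call the legacy billing client; use the payments module."),
    ("Stage memory writes",
     "Agents propose updates; a human applies them."),
])

print(search(db, "billing"))
print(search(db, "fake clock"))
```

Everything here runs in-process against a local file or memory, which is where the "works offline" items in the list above come from.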

In the current benchmark, retrieval gets the right section into the top 5 for 98% of labeled queries.

That is a recall@5 of 0.98.

Is FTS5/BM25 the fanciest possible retrieval method?

No.

But it is boring, inspectable, portable, and good enough for repo-scale project memory.

That tradeoff is exactly the point.

What belongs in memory?

Not everything.

agent-memory works best for compact, durable project knowledge.

Good candidates:

architecture decisions
project conventions
known pitfalls
module ownership notes
current branch or task state
integration quirks
test or build gotchas
things that future agents should not rediscover

Bad candidates:

huge logs
temporary scratchpad noise
secrets
raw chat transcripts
anything you would not want in the repo

My rule of thumb:

if a memory entry would make no sense in code review, it probably does not belong there.

The hard part: does it actually work?

I did not want to ship a vanity number.

So I tried to measure it.

This turned into the most interesting part of the project.

There are three different questions:

Does retrieval return the right memory sections?
Does a lesson recorded in one session survive into the next?
Does the agent actually behave differently because of memory?

The first two are relatively easy to measure honestly.

The third one is where things get messy.

Retrieval

Retrieval is deterministic.

On a labeled benchmark, the right section lands in the top 5 for 98% of queries.

That is a recall@5 of 0.98.

It runs in CI.

That does not prove the whole product works, but it does prove that the local retrieval layer is not just vibes.
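
For anyone unfamiliar with the metric: recall@k is the fraction of labeled queries whose expected section shows up anywhere in the top k results. A minimal Python sketch follows. The queries and section names are invented toy data, not the project's benchmark.

```python
def recall_at_k(results, expected, k=5):
    """Fraction of queries whose expected section appears in the top k results."""
    if not expected:
        raise ValueError("no labeled queries")
    hits = sum(
        1
        for query, target in expected.items()
        if target in results.get(query, [])[:k]
    )
    return hits / len(expected)


# Toy labels: query -> the section that should be retrieved.
expected = {
    "flaky login test": "fake-clock",
    "charge a card": "billing",
    "who writes memory": "stage-writes",
    "search backend": "fts5",
}

# Toy ranked results per query; "billing" only appears at rank 6.
results = {
    "flaky login test": ["fake-clock", "fts5"],
    "charge a card": ["a", "b", "c", "d", "e", "billing"],
    "who writes memory": ["stage-writes"],
    "search backend": ["fts5"],
}

print(recall_at_k(results, expected, k=5))  # 0.75
```

Three of the four toy queries have their target in the top 5, so the score is 0.75; the same calculation over the labeled benchmark is what yields the 0.98 figure.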

Continuity

Continuity asks a different question:

does a lesson recorded in session 1 survive into session 2?

Through the real record → persist → retrieve loop:

5/5 scenarios with memory
0/5 without memory

Also deterministic.

This is the core thing I wanted: the next agent should not rediscover what the previous agent already learned.
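
The shape of a continuity scenario can be sketched in Python as follows. This is a simplified stand-in, not the project's test harness: the `memory.md` file name, the keyword match, and the sample lesson are assumptions. Session 1 records a lesson to disk, session 2 starts with no in-process state and can only see what was persisted.

```python
import tempfile
from pathlib import Path


def record(root, heading, lesson):
    """Session 1: persist a lesson as a Markdown section on disk."""
    path = Path(root) / "memory.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {heading}\n{lesson}\n\n")


def retrieve(root, keyword):
    """Session 2: start cold and read only what is on disk."""
    path = Path(root) / "memory.md"
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8")
    return [
        block.strip()
        for block in text.split("## ")
        if keyword.lower() in block.lower()
    ]


def scenario(with_memory):
    """Return True if the lesson from session 1 is visible in session 2."""
    with tempfile.TemporaryDirectory() as root:
        if with_memory:
            record(root, "Fake clock", "Login tests need the fake clock.")
        return bool(retrieve(root, "fake clock"))


print(scenario(with_memory=True), scenario(with_memory=False))
```

With memory the lesson survives the session boundary; without it there is nothing to find, which is the with/without contrast reported above.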

Behavioural impact

The hardest question is not retrieval.

It is behavioural impact:

does the agent actually make better decisions because memory exists?

My early usage and test runs are positive: agents do reuse remembered decisions, avoid repeated footguns, and recover project context faster when memory is available.

But I do not want to publish a clean-looking number until I can stand behind the benchmark.

A proper behavioural benchmark is surprisingly hard to isolate:

modern coding agents already use many sources of context
some tools have their own cross-session memory
user-scoped MCP servers can leak into supposedly isolated runs
model behaviour varies across runs
"success" is harder to score than retrieval

So for now, I treat behavioural results as promising but not fully benchmarked.

What this is not

agent-memory is not a replacement for reading the code.

The agent still needs to inspect files, run tests, and understand the actual implementation.

It is not a vector database for everything you have ever done.

It is not a raw transcript store.

It is not meant to let agents silently rewrite project truth.

The point is narrower:

keep durable project context close to the repo, make it retrievable by agents, and make changes reviewable by humans.

That is it.

Boring on purpose.

Try it on a real repo

The project is here:

github.com/xChuCx/agent-memory

The best test is not a toy demo.

The best test is an annoying real project where you keep repeating the same context to agents again and again.

Try it there.

I'm especially interested in feedback from people using multiple coding agents across the same codebase:

does the memory structure fit your workflow?
are the MCP tools the right primitives?
what should be easier in the review/apply flow?
which clients should be supported better?
what kind of memory would you trust agents to propose?

Try agent-memory on GitHub

If you want portable, reviewable project memory to become a normal layer for coding agents, a star helps other people find the project.