I got tired of re-explaining my codebase to every coding agent — so I made critical memory live in the repo next to code


I switch coding agents constantly.

Opus is ahead one month, some GPT model the next. Gemini gets better, Cursor changes, and local setups become good enough for some tasks. On top of that, usage limits and token policies keep forcing me to jump tools mid-project just to keep shipping.

Every switch has the same annoying failure mode:

the new agent is looking at the same repo, but it has lost the project's working memory.

  • It re-scans the tree to orient itself.
  • It re-asks things we already settled.
  • It misses local conventions.
  • It repeats footguns we already hit.

And I paste the same thing into the prompt for the Nth time:

here's the architecture, we decided X, don't do Y

All the context I had built up lived inside one tool's session and died the moment I moved.

So I built a fix for myself, then cleaned it up and open-sourced it.

The project is called agent-memory.

The problem

Every serious repo has context that is not obvious from the files alone:

  • why a module is shaped a certain way
  • which convention is local to this project
  • which bug we already hit
  • what this branch is currently trying to finish
  • which approach we explicitly decided not to use
  • which API looks tempting but should not be used
  • which test command is the one that actually matters

Some of that context belongs in docs. Some belongs in code comments. Some belongs in AGENTS.md, CLAUDE.md, or .cursor/rules.

But a lot of it is living project memory.

It changes as the branch evolves.

It gets discovered while working.

It is useful to the next agent, but not always worth turning into permanent documentation.

And if it only lives in a chat session, it is basically disposable.

The idea: boring on purpose

Keep the project's living context (current task state, decisions, conventions, known footguns) as plain Markdown committed to the repo, and expose it to the agent over MCP.

That's it.

The agent pulls relevant context in one MCP call instead of re-reading the tree every session, and I stop hand-feeding decisions into the prompt because they already live in memory and surface when relevant.

The short version:

  • memory lives next to the code
  • Markdown is the source of truth
  • agents fetch relevant context over MCP
  • durable updates stage for human review
  • the local index is rebuildable
  • no cloud memory layer
  • no vendor lock-in
  • no opaque database as the only source of truth

The goal is simple:

stop re-teaching the same repo to every coding agent.

What the agent gets

Agents access memory through three MCP tools:

| Tool | Purpose |
| --- | --- |
| memory.fetch_context | Fetch relevant project context for the current task |
| memory.propose_update | Propose a durable memory update for human review |
| memory.status | Inspect memory and index status |

A typical workflow looks like this:

  1. The agent starts a task.
  2. It calls memory.fetch_context.
  3. It gets a compact pack of relevant decisions, conventions, pitfalls, and module notes.
  4. During the task, it discovers something durable.
  5. It calls memory.propose_update.
  6. The update is staged.
  7. A human reviews the diff and applies or rejects it.

The agent can propose.

The human decides.

That boundary matters.
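
To make step 2 of the workflow above concrete, here is a minimal sketch of the message an MCP client sends when it calls a tool. The `tools/call` method and the `name`/`arguments` shape come from the MCP specification (JSON-RPC 2.0). The `task` argument is a hypothetical placeholder: the real input schema is whatever the server advertises, so treat this as an illustration rather than agent-memory's documented API.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical arguments: "task" is an assumed field name, not a documented one.
message = tool_call("memory.fetch_context", {"task": "fix flaky auth test"})
print(message)
```

In practice the agent's MCP client builds this message for you; the point is only that fetching context is one small request, not a re-scan of the tree.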

Try it

agent-memory is a small Go tool: CGo-free, Apache-2.0 licensed, and designed to run locally as an MCP server.

Add it to .mcp.json:

{"mcpServers":{"agent-memory":{"command":"npx","args":["-y","@xchucx/agent-memory","mcp","--root","."]}}}
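
If the repo already has a .mcp.json with other servers in it, pasting the one-liner would overwrite them. A small sketch of merging the entry instead, using only the values from the config above; the `other-server` entry in the demo is a made-up placeholder, and the demo writes to a temporary directory rather than your repo.

```python
import json
import tempfile
from pathlib import Path

# Server entry copied from the one-line config above.
AGENT_MEMORY = {
    "command": "npx",
    "args": ["-y", "@xchucx/agent-memory", "mcp", "--root", "."],
}


def add_server(config_path: Path) -> dict:
    """Add the agent-memory entry to a .mcp.json file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["agent-memory"] = AGENT_MEMORY
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo in a temp dir; "other-server" stands in for whatever you already have.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other-server": {"command": "echo"}}}))
    merged = add_server(path)
    print(sorted(merged["mcpServers"]))  # ['agent-memory', 'other-server']
```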

Or run it directly:

npx -y @xchucx/agent-memory mcp --root .

Initialize memory in a repo:

npx -y @xchucx/agent-memory init

After that, your coding agent can ask the repo for memory instead of starting cold every time.

The payoff: portability

The non-obvious payoff is portability.

Because the memory is files in my repo — not a vendor's cloud, not one IDE's private state, not one chat session — Claude Code, Cursor, Gemini, and MCP-capable local agents can all read the same project memory.

Switching tools no longer resets the project context.

The context travels with the code.

That matters more as agent workflows become less tied to one model or one IDE.

I do not want my repo's working memory to depend on whichever agent happens to be best this month.

Why not just AGENTS.md or CLAUDE.md?

I use those too.

But for me, files like AGENTS.md, CLAUDE.md, and .cursor/rules work best as static instructions:

  • how to run tests
  • coding style
  • repo rules
  • preferred libraries
  • commands the agent should use
  • commands the agent should avoid
  • "always do X"
  • "never do Y"

They tell the agent how to behave.

What I was missing was living project memory:

  • what this branch is currently trying to finish
  • why we chose one implementation over another
  • which module owns a particular flow
  • which bug or footgun we discovered yesterday
  • which convention exists but is not obvious from one file
  • what the next agent should know before touching this part of the repo

That kind of information changes more often.

It can become stale.

It needs review.

It needs to be searchable.

And it should not turn static instruction files into a junk drawer of rules, TODOs, old session notes, warnings, decisions, and random reminders.

So I see them as different layers:

| Layer | Best for | Lifecycle |
| --- | --- | --- |
| AGENTS.md / CLAUDE.md / .cursor/rules | Stable instructions and repo rules | Edited rarely, read as policy |
| Vendor or IDE memory | Personal preferences and tool-specific state | Useful, but usually trapped in one environment |
| agent-memory | Decisions, conventions, pitfalls, task state, module notes | Changes over time, reviewed like project context |

The goal is not to replace instruction files.

The goal is to stop overloading them.

Static instructions and living project memory deserve different lifecycles.

Three decisions that shaped it

1. Markdown in your repo, not a database

I wanted the memory to be boring and inspectable.

So the source of truth is plain Markdown inside the repo.

I can open it. Read it. Edit it. git diff it.

There is a SQLite index, but it is only a local shadow index for retrieval. It is fully regenerable from the Markdown files.

The important part is that the memory itself is not trapped inside an opaque database.

It lives next to the code.
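
What "regenerable" means in practice can be sketched in a few lines: split each Markdown file into sections by heading, drop the index table, and refill it. This is an illustration of the idea, not agent-memory's actual schema. The table layout, the `memory/decisions.md` path, and the heading-based splitting are all assumptions, and the splitter is naive (a `#` line inside a code fence would be treated as a heading).

```python
import re
import sqlite3

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(path: str, text: str) -> list[tuple[str, str, str]]:
    """Split Markdown into (path, heading, body) rows, skipping empty bodies."""
    rows, heading, body = [], "(preamble)", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if any(part.strip() for part in body):
                rows.append((path, heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        rows.append((path, heading, "\n".join(body).strip()))
    return rows


def rebuild_index(db: sqlite3.Connection, files: dict[str, str]) -> int:
    """Drop and refill the index; the Markdown stays the only source of truth."""
    db.execute("DROP TABLE IF EXISTS sections")
    db.execute("CREATE TABLE sections (path TEXT, heading TEXT, body TEXT)")
    rows = [row for path, text in sorted(files.items())
            for row in split_sections(path, text)]
    db.executemany("INSERT INTO sections VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)


# Hypothetical memory file; the real layout is whatever `init` creates.
files = {
    "memory/decisions.md": (
        "# Decisions\n\n## Use SQLite\nNo server needed.\n\n"
        "## Skip embeddings\nFTS is enough.\n"
    )
}
db = sqlite3.connect(":memory:")
print(rebuild_index(db, files))  # 2
print(rebuild_index(db, files))  # 2 again: rebuilding is repeatable
```

Because the index can always be thrown away and rebuilt like this, it never needs to be committed or trusted.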

2. Human-in-the-loop writes

I did not want the agent silently rewriting shared project memory.

That felt wrong.

If a coding agent learns something durable (a decision, a convention, a module fact, a recurring footgun), it should be able to propose an update.

But that update should not automatically become shared truth.

So durable changes stage for review:

agent-memory review --diff
agent-memory apply

The agent proposes.

I approve.

That keeps memory useful without turning it into a pile of unreviewed agent guesses.

Shared memory should be treated like shared project knowledge.
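
The propose, review, and apply boundary can be modelled as a toy in-memory store. This is a sketch of the pattern only: the class, its method names, and the `pitfalls.md` file are hypothetical and say nothing about how agent-memory stores staged updates internally.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy stand-in for committed Markdown files plus a staging area."""
    files: dict[str, str] = field(default_factory=dict)
    staged: dict[str, str] = field(default_factory=dict)

    def propose(self, path: str, new_text: str) -> None:
        """Agent side: stage a change without touching committed memory."""
        self.staged[path] = new_text

    def review(self) -> str:
        """Human side: render every staged change as a unified diff."""
        chunks = []
        for path, new_text in sorted(self.staged.items()):
            old = self.files.get(path, "").splitlines(keepends=True)
            new = new_text.splitlines(keepends=True)
            chunks.extend(difflib.unified_diff(old, new, f"a/{path}", f"b/{path}"))
        return "".join(chunks)

    def apply(self) -> None:
        """Human side: accept everything that is staged."""
        self.files.update(self.staged)
        self.staged.clear()

    def reject(self) -> None:
        """Human side: discard everything that is staged."""
        self.staged.clear()


store = MemoryStore(files={"pitfalls.md": "# Pitfalls\n"})
store.propose("pitfalls.md", "# Pitfalls\n\n- Do not mock the clock in auth tests.\n")
unchanged = store.files["pitfalls.md"]  # still the old text until a human applies
diff = store.review()
print(diff)
store.apply()
```

The property worth noticing is that `propose` never writes to `files`. Only `apply`, which a human triggers, does.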

3. No vector databases or embeddings

When building AI memory tools, the industry reflex is to reach for vector databases and embedding models.

I get why.

Embeddings are powerful, and for large fuzzy knowledge bases they can be the right tool.

But for this project I wanted something simpler.

Project memory is not the whole internet. It is usually dozens or hundreds of small, human-written sections:

  • decisions
  • conventions
  • pitfalls
  • module notes
  • task state
  • local project facts

For that scale, a vector database felt like the wrong default.

Embeddings add another moving part: API keys, model choice, regeneration, drift, or heavier local setup.

More importantly, you cannot easily git commit a vector database and share it with your team as project knowledge.

So I skipped all of that.

agent-memory uses a local SQLite shadow index with standard full-text search via FTS5/BM25.

The source of truth is still Markdown.

The index is entirely regenerable from the Markdown files.

That gives fast, budgeted retrieval with zero external services:

  • no embedding API
  • no vector database
  • no cloud dependency
  • no model-specific memory format
  • works offline
  • works with local agents
  • works on airplanes
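
For a sense of how little machinery this takes, here is a minimal sketch of BM25-ranked, budgeted retrieval using Python's built-in sqlite3 module. It assumes your SQLite build includes FTS5 (most do). The table layout, the character budget standing in for a token budget, and the sample sections are illustrative assumptions, not the tool's real schema or data.

```python
import sqlite3

# Hypothetical memory sections as (heading, body) pairs.
SECTIONS = [
    ("Auth tokens", "Refresh tokens rotate on every use; never cache them in tests."),
    ("Test command", "Run make test-integration before merging; unit tests alone miss the auth flow."),
    ("Logging", "Use structured logging everywhere."),
]


def build_index(sections: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over the given sections."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE memory USING fts5(heading, body)")
    db.executemany("INSERT INTO memory (heading, body) VALUES (?, ?)", sections)
    return db


def fetch_context(db: sqlite3.Connection, query: str,
                  budget_chars: int = 400, limit: int = 5) -> list[str]:
    """Return the best-ranked sections that fit within a character budget."""
    terms = query.split()
    if not terms:
        return []
    # Quote each term so punctuation is not parsed as FTS5 query syntax.
    match = " OR ".join('"' + term.replace('"', '""') + '"' for term in terms)
    rows = db.execute(
        "SELECT heading, body FROM memory WHERE memory MATCH ? "
        "ORDER BY bm25(memory) LIMIT ?",  # lower bm25() score means a better match
        (match, limit),
    ).fetchall()
    picked, used = [], 0
    for heading, body in rows:
        entry = f"{heading}: {body}"
        if used + len(entry) > budget_chars:
            continue  # skip sections that would blow the budget
        picked.append(entry)
        used += len(entry)
    return picked


db = build_index(SECTIONS)
for entry in fetch_context(db, "auth tests"):
    print(entry)
```

The query matches the two auth-related sections and leaves the logging note out. There is no model, API key, or network call involved.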

In the current benchmark, retrieval gets the right section into the top 5 for 98% of labeled queries.

That is a recall@5 of 0.98.

Is FTS5/BM25 the fanciest possible retrieval method?

No.

But it is boring, inspectable, portable, and good enough for repo-scale project memory.

That tradeoff is exactly the point.

What belongs in memory?

Not everything.

agent-memory works best for compact, durable project knowledge.

Good candidates:

  • architecture decisions
  • project conventions
  • known pitfalls
  • module ownership notes
  • current branch or task state
  • integration quirks
  • test or build gotchas
  • things that future agents should not rediscover

Bad candidates:

  • huge logs
  • temporary scratchpad noise
  • secrets
  • raw chat transcripts
  • anything you would not want in the repo

My rule of thumb:

if a memory entry would make no sense in code review, it probably does not belong there.

The hard part: does it actually work?

I did not want to ship a vanity number.

So I tried to measure it.

This turned into the most interesting part of the project.

There are three different questions:

  1. Does retrieval return the right memory sections?
  2. Does a lesson recorded in one session survive into the next?
  3. Does the agent actually behave differently because of memory?

The first two are relatively easy to measure honestly.

The third one is where things get messy.

Retrieval

Retrieval is deterministic.

On a labeled benchmark, the right section lands in the top 5 for 98% of queries.

That is a recall@5 of 0.98.

It runs in CI.

That does not prove the whole product works, but it does prove that the local retrieval layer is not just vibes.
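
For readers unfamiliar with the metric, recall@k is just the fraction of labeled queries whose expected section shows up in the first k results. A minimal sketch follows; the query and section names are invented toy data, not the project's benchmark.

```python
def recall_at_k(results: dict[str, list[str]], labels: dict[str, str], k: int = 5) -> float:
    """Fraction of labeled queries whose expected section is in the top k."""
    if not labels:
        raise ValueError("no labeled queries")
    hits = sum(
        1 for query, expected in labels.items()
        if expected in results.get(query, [])[:k]
    )
    return hits / len(labels)


# Toy data: each query maps to the one section that should be retrieved.
labels = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
results = {
    "q1": ["a", "x"],
    "q2": ["x", "y", "z", "w", "v", "b"],  # right answer at rank 6: a miss at k=5
    "q3": ["c"],
    "q4": ["x", "d"],
}
print(recall_at_k(results, labels))        # 0.75
print(recall_at_k(results, labels, k=6))   # 1.0
```

Because both the index and the ranking are deterministic, a check like this can run in CI and fail loudly when retrieval regresses.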

Continuity

Continuity asks a different question:

does a lesson recorded in session 1 survive into session 2?

Measured through the real record → persist → retrieve loop, the lesson carried over in:

  • 5/5 scenarios with memory
  • 0/5 without memory

Also deterministic.

This is the core thing I wanted: the next agent should not rediscover what the previous agent already learned.
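
The shape of such a continuity check can be sketched as two "sessions" that share nothing except files on disk. Everything here is a simplified assumption: the scenarios are invented, retrieval is a plain substring match, and the real benchmark goes through the actual tool rather than these helper functions.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical (lesson, keyword) pairs standing in for real scenarios.
SCENARIOS = [
    ("Do not mock the clock in auth tests.", "clock"),
    ("The billing module owns invoice retries.", "retries"),
    ("Use the integration test target before merging.", "integration"),
]


def session_one(memory_dir: Optional[Path], lesson: str) -> None:
    """Record: persist the lesson only if a memory directory exists."""
    if memory_dir is not None:
        (memory_dir / "pitfalls.md").write_text(f"- {lesson}\n")


def session_two(memory_dir: Optional[Path], keyword: str) -> bool:
    """Retrieve: a fresh session knows only what it can read from disk."""
    if memory_dir is None:
        return False
    return any(keyword in path.read_text() for path in memory_dir.glob("*.md"))


def run_scenario(lesson: str, keyword: str, with_memory: bool) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        memory_dir = Path(tmp) if with_memory else None
        session_one(memory_dir, lesson)
        return session_two(memory_dir, keyword)


with_memory = sum(run_scenario(l, k, True) for l, k in SCENARIOS)
without_memory = sum(run_scenario(l, k, False) for l, k in SCENARIOS)
print(f"{with_memory}/{len(SCENARIOS)} with memory")        # 3/3 with memory
print(f"{without_memory}/{len(SCENARIOS)} without memory")  # 0/3 without memory
```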

Behavioural impact

The hardest question is not retrieval.

It is behavioural impact:

does the agent actually make better decisions because memory exists?

My early usage and test runs are positive: agents do reuse remembered decisions, avoid repeated footguns, and recover project context faster when memory is available.

But I do not want to publish a clean-looking number until I can stand behind the benchmark.

A proper behavioural benchmark is surprisingly hard to isolate:

  • modern coding agents already use many sources of context
  • some tools have their own cross-session memory
  • user-scoped MCP servers can leak into supposedly isolated runs
  • model behaviour varies across runs
  • "success" is harder to score than retrieval

So for now, I treat behavioural results as promising but not fully benchmarked.

What this is not

agent-memory is not a replacement for reading the code.

The agent still needs to inspect files, run tests, and understand the actual implementation.

It is not a vector database for everything you have ever done.

It is not a raw transcript store.

It is not meant to let agents silently rewrite project truth.

The point is narrower:

keep durable project context close to the repo, make it retrievable by agents, and make changes reviewable by humans.

That is it.

Boring on purpose.

Try it on a real repo

The project is here:

github.com/xChuCx/agent-memory

The best test is not a toy demo.

The best test is an annoying real project where you keep repeating the same context to agents again and again.

Try it there.

I'm especially interested in feedback from people using multiple coding agents across the same codebase:

  • does the memory structure fit your workflow?
  • are the MCP tools the right primitives?
  • what should be easier in the review/apply flow?
  • which clients should be supported better?
  • what kind of memory would you trust agents to propose?

Try agent-memory on GitHub

If you want portable, reviewable project memory to become a normal layer for coding agents, a star helps other people find the project.

Source: dev.to
