At 1 a.m., my phone vibrated like crazy: users were complaining that the AI assistant had “amnesia”. One user had told it in the morning that they were on PostgreSQL 16, yet by the afternoon it was recommending MySQL 8.0 again. I groggily opened the monitoring dashboard: 99% test coverage, all green. A chill ran down my spine: our so-called “memory tests” were nothing but a mock charade.
Why Mocks Can’t Uncover Real Memory Problems
The usual thinking goes: an AI agent’s memory is just an API call, memory.add() and memory.search(). Write a test, use unittest.mock.patch to return a pre‑baked list, and assert that the agent “called the memory interface”. The tests pass, the logic flows, and coverage looks beautiful.
But in a real production memory layer, scenarios that Mocks can never reproduce keep popping up:
- Memory duplication: the same fact gets stored five times, and the retrieval returns five identical messages. The agent takes them as gospel and starts babbling nonsense.
- Imprecise semantic recall: you add “the user likes lightweight frameworks”, but when you query “backend tech stack” that memory never makes it into the top‑k results.
- Multi‑turn context bleed: user A’s memories leak into user B’s session because user_id gets swapped. A mock would never check that.
- Boundary conditions for updates/deletes: in a mock, a dict.pop is all you need. In reality, is the delete operation idempotent? Will stale content still be retrieved after an update?
The root cause boils down to this: you’re only testing “the contract of the interface”, not “the actual behavior”. Unit tests prove the code is correctly written, but the memory store is a stateful, semantic, distributed component — you simply can’t claim it’s verified without running the real pipeline.
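To make that gap concrete, here is a minimal sketch of the kind of mock-based test described above. ToyAgent and its recall method are hypothetical stand-ins for illustration, not the real agent. The test passes even though nothing was ever stored, because it only checks that search was called with the expected arguments.

```python
from unittest.mock import MagicMock


class ToyAgent:
    """Hypothetical agent wrapper, used only for illustration."""

    def __init__(self, memory, user_id):
        self.memory = memory
        self.user_id = user_id

    def recall(self, query):
        # Delegates to the memory layer and returns whatever comes back.
        return self.memory.search(query, user_id=self.user_id)


def test_recall_with_mock():
    memory = MagicMock()
    # Pre-baked answer: the "memory" returns this no matter what was stored.
    memory.search.return_value = [{"memory": "User runs PostgreSQL 16"}]

    agent = ToyAgent(memory=memory, user_id="user_42")
    result = agent.recall("which database?")

    # Both assertions check the interface contract only.
    memory.search.assert_called_once_with("which database?", user_id="user_42")
    assert result == [{"memory": "User runs PostgreSQL 16"}]
```

Deduplication, ranking, and per-user filtering all live behind search, so none of them is exercised here.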
A week ago, I migrated the agent’s memory layer from a home‑grown Redis mashup to Mem0, and at the same time decided to drag the test suite out of Mock hell.
Design: real Mem0 inside pytest, run verification against actual storage
First, the options:
- Continue mocking the Mem0 SDK: that just swaps one mock library for another. The nature of the problem stays the same, so pass.
- Spin up a Mem0 service in CI: deploying a remote service for every test run is too heavy for the development loop.
- Use Mem0’s local in‑memory mode / SQLite backend: starts fast, fully isolated, and enables end‑to‑end verification at the unit‑test level. ✅
Mem0 is designed with pluggable storage backends: in-memory, SQLite, Qdrant, etc. We chose SQLite + a local embedding model (all-MiniLM-L6-v2) — in pytest, a fresh temporary database and embedder instance are created per test and automatically destroyed afterwards, solving test isolation completely.
Why not in‑memory? Because with SQLite, I can directly open the .db file and inspect the underlying table schema, which makes it much easier to debug memory deduplication, metadata storage, and similar issues.
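As a rough illustration of that kind of inspection, the sketch below lists the tables and columns in a SQLite file using only the standard library. The helper name list_tables is made up for this example, and the tables you actually find depend on the backend and version in use.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()
```

Calling it on the temporary database right after a failing test shows whether a duplicate row or a missing metadata column is behind the failure.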
Architecture is minimal: each test case → pytest fixture initializes a MemoryClient → the Agent function calls real memory → the results are asserted semantically or exactly → fixture teardown wipes the database.
Core implementation: one fixture to rule the memory‑test environment
What does this code solve? It gives every test a fully isolated Mem0 context, preventing memory pollution.
    # conftest.py
    import shutil
    import tempfile

    import pytest
    from mem0 import Memory


    @pytest.fixture
    def memory_client():
        # One temporary storage location per test, deleted when the test ends.
        # Qdrant's local mode expects a directory path, not a single file.
        db_path = tempfile.mkdtemp(prefix="mem0_test_")

        config = {
            "vector_store": {
                # Qdrant in local on-disk mode. Its in-memory mode is lighter,
                # but files on disk can be inspected after a failure.
                # You could also switch to "chroma" or another provider.
                "provider": "qdrant",
                "config": {
                    "path": db_path,
                    "collection_name": "test_memories",
                    # all-MiniLM-L6-v2 produces 384-dimensional vectors
                    "embedding_model_dims": 384,
                },
            },
            "llm": {
                # Placeholder config pointing at a local model; these tests
                # are meant to exercise the embedder, not the LLM.
                "provider": "ollama",
                "config": {"model": "llama3"},
            },
            "embedder": {
                # Local embedding model: no OpenAI calls, works offline, fast.
                # Assumption: sentence-transformers models are loaded through
                # the "huggingface" provider; check your installed version.
                "provider": "huggingface",
                "config": {"model": "all-MiniLM-L6-v2"},
            },
        }
        client = Memory.from_config(config_dict=config)
        yield client
        # Clean up thoroughly after the test
        client.reset()  # clear all memories
        shutil.rmtree(db_path, ignore_errors=True)  # delete the temporary directory
⚠️ Note: Mem0 is iterating fast, so some APIs may have changed. I’m using mem0==0.1.40, with the import path from mem0 import Memory.
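Because the return shape of search can differ between releases, a small normalizing helper keeps the assertions stable. The sketch below is an illustration under assumptions: it assumes results arrive either as a list of dicts or as a dict with a "results" list, and that each item keeps its text under the "memory" key. The helper names memory_texts and find_duplicates are made up for this example.

```python
from collections import Counter


def memory_texts(search_result):
    """Extract memory strings from a search result.

    Assumption: results are either a list of dicts or a dict with a
    "results" list, and each item stores its text under "memory".
    """
    if isinstance(search_result, dict):
        items = search_result.get("results", [])
    else:
        items = search_result
    return [item["memory"] for item in items]


def find_duplicates(texts):
    """Return texts that appear more than once, ignoring case and spacing."""
    counts = Counter(" ".join(t.lower().split()) for t in texts)
    return sorted(text for text, n in counts.items() if n > 1)
```

In a test, something like find_duplicates(memory_texts(memory_client.search("database", user_id="user_42"))) == [] would then catch the "same fact stored five times" failure directly.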
This piece of code verifies that “memory add & retrieve” actually work, instead of just checking call counts.
    # test_memory_crud.py
    import pytest
    from my_agent import Agent  # your own Agent wrapper


    def test_agent_remembers_user_preference(memory_client):
        agent = Agent(memory=memory_client, user_id="user_42")
        # The agent records a user preference
        agent.remember("The user prefers Django over FastAPI")
        # Simulate the next conversation: the agent recalls
        context = agent.recall("Python web framework")
        # Semantic retrieval should include Django-related information,
        # not a mock's empty list or fake data
        ass