Vibe-Memory Part 4: Three Months Building an AI Semantic Memory — What I Learned Building This Side Project From Scratch

Honestly, I can't believe it's already been three months since I started building Vibe-Memory. If you've been following along, you know this project started from a simple frustration: every time I talk to ChatGPT, it forgets who I am, what I've been working on, and all the little things that make our conversations actually meaningful.

Three months, 73 commits, and three Dev.to articles later — it's working. But not without a lot of surprises along the way. So here's what I learned building a complete AI semantic memory system from scratch, what I'd do differently if I started over, and where this project goes from here.

The Quick Recap (For New Readers)

If you're just joining us, Vibe-Memory is a simple service that adds semantic memory to AI chatbots. It solves the classic "ChatGPT amnesia" problem:

Every time you have a conversation, Vibe-Memory stores the important bits
When you start a new conversation, it automatically searches for relevant memories
It injects those memories into your prompt so the AI remembers context
All your data stays in your own database — no sending everything to OpenAI
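To make the "inject memories into your prompt" step concrete, here is a minimal sketch in Go. It is not the code from the repo: the `BuildPrompt` name and the wording of the context header are assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildPrompt prepends retrieved memories to the user's message.
// The header wording is a hypothetical choice, not Vibe-Memory's actual format.
func BuildPrompt(memories []string, userMessage string) string {
	if len(memories) == 0 {
		// Nothing relevant was found, so send the message unchanged.
		return userMessage
	}
	var b strings.Builder
	b.WriteString("Relevant things you remember about the user:\n")
	for _, m := range memories {
		b.WriteString("- ")
		b.WriteString(m)
		b.WriteString("\n")
	}
	b.WriteString("\nUser message:\n")
	b.WriteString(userMessage)
	return b.String()
}

func main() {
	memories := []string{
		"User prefers Go over Python for backend development",
		"Working on a semantic memory project called Vibe-Memory",
	}
	fmt.Println(BuildPrompt(memories, "How should I structure my next API handler?"))
}
```

Only the handful of memories returned by the search end up in the prompt, which keeps both the token count and the amount of data leaving the database small.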

We've already covered:

Part 1: The basic architecture and why I built it
Part 2: Comparing 8 different embedding models to find the best one for personal use
Part 3: Optimizing pgvector for 10x faster queries with 5 simple tricks

Today, I want to zoom out and share the big-picture lessons I didn't expect to learn.

The Project by the Numbers

Let me start with some hard data because I love looking back at what actually happened versus what I planned:

Metric	Number
Total Time	3 months
Commits	73
Lines of Code	~1,200 (Go + PostgreSQL)
Total Cost So Far	$2.17
Stored Memories	1,247
Average Query Time	42ms
GitHub Stars	12 (thank you! 🎉)

Wait, $2.17 total? Yeah. That's not a typo. Embedding 1,247 memories with OpenAI text-embedding-3-small costs about $0.00002 per embedding. That's 2 cents per 1,000 memories, or roughly 2.5 cents for everything I've stored so far. My PostgreSQL hosting is $0 a month on Fly.io's free tier. This project is basically free to run for personal use.
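The embedding arithmetic is easy to check with a few lines of Go. This sketch takes the per-embedding figure quoted above as given (it is the post's estimate, not an official price list, and real cost depends on how many tokens each memory contains). The `EmbeddingCostUSD` name is made up for this example.

```go
package main

import "fmt"

// costPerEmbeddingUSD is the per-embedding figure quoted in the post.
// It is an estimate; actual cost depends on the token count of each memory.
const costPerEmbeddingUSD = 0.00002

// EmbeddingCostUSD estimates the one-time cost of embedding n memories.
func EmbeddingCostUSD(n int) float64 {
	return float64(n) * costPerEmbeddingUSD
}

func main() {
	for _, n := range []int{1000, 1247, 10000} {
		fmt.Printf("%6d memories: $%.4f\n", n, EmbeddingCostUSD(n))
	}
}
```

Note that this covers embeddings only. Any LLM calls made on top of the search are billed separately.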

I didn't expect that when I started. I thought I'd need a fancy vector database subscription or something. Turns out PostgreSQL + pgvector is more than enough for personal use, and it's basically free if you're not dealing with millions of vectors.

What Surprised Me Most

So here's the thing — I went into this project thinking the hard part would be the AI/vector search stuff. I was wrong. The hard parts were things I completely didn't expect.

1. The Hard Part Isn't the AI — It's Deciding What to Remember

Honestly, this hit me like a truck. If you just save everything the user says, your search results get cluttered with junk. If you save too little, you miss important context. Getting this balance right is trickier than I thought.

My first approach was: save every message. That was terrible. Search results would pull in random offhand comments from three months ago that had nothing to do with the current conversation.

My current approach: I let the AI decide what's worth remembering. After each message, I ask it: "Extract the important facts, preferences, and events from this message that would be useful to remember in future conversations."

Here's the actual prompt I use (you can steal this):

const extractionPrompt = `Extract key information from the conversation message that would be useful for future AI to remember.

Focus on:
- Facts about the user (preferences, skills, background)
- Current projects and their status
- Decisions made and conclusions reached
- Important events or experiences
- Questions the user asked that might be relevant later

Ignore:
- Casual chit-chat that doesn't need remembering
- One-time instructions that don't affect future conversations
- Sensitive information (don't extract passwords, API keys, etc.)

Return as a JSON array of strings, each string is one concise point.
Example output: ["User prefers Go over Python for backend development", "Working on a semantic memory project called Vibe-Memory"]`

This works way better than saving everything. But it's not perfect. Sometimes it extracts things that don't really need to be remembered, and sometimes it misses subtle things that actually would be useful.
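Since the prompt asks for a JSON array of strings, the reply still has to be parsed and cleaned before anything is stored. A minimal sketch of that step, assuming the model sometimes wraps its answer in a Markdown code fence; `ParseExtractedFacts` is a hypothetical helper, not a function from the repo:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ParseExtractedFacts decodes the model's reply into a clean list of facts.
// It tolerates a reply wrapped in a Markdown code fence, which some models
// add even when asked for raw JSON.
func ParseExtractedFacts(reply string) ([]string, error) {
	fence := strings.Repeat("`", 3)
	s := strings.TrimSpace(reply)
	s = strings.TrimPrefix(s, fence+"json")
	s = strings.TrimPrefix(s, fence)
	s = strings.TrimSuffix(s, fence)

	var raw []string
	if err := json.Unmarshal([]byte(strings.TrimSpace(s)), &raw); err != nil {
		return nil, fmt.Errorf("extraction reply is not a JSON array of strings: %w", err)
	}

	seen := make(map[string]bool)
	facts := make([]string, 0, len(raw))
	for _, f := range raw {
		f = strings.TrimSpace(f)
		if f == "" || seen[f] {
			continue // skip blanks and exact duplicates
		}
		seen[f] = true
		facts = append(facts, f)
	}
	return facts, nil
}

func main() {
	reply := "[\"User prefers Go over Python for backend development\", \"\", \"Working on a semantic memory project called Vibe-Memory\"]"
	facts, err := ParseExtractedFacts(reply)
	if err != nil {
		fmt.Println("could not parse reply:", err)
		return
	}
	for _, f := range facts {
		fmt.Println("remember:", f)
	}
}
```

Returning an error on malformed output (instead of storing the raw reply) keeps junk out of the database, which matters because whatever gets stored here is what search will surface later.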

The lesson: AI memory isn't just about search — it's about curation. Garbage in, garbage out still applies, even with fancy embeddings.

2. Embedding Quality Matters More Than I Thought, But Not In The Way You Think

In Part 2, we compared 8 different embedding models. I landed on text-embedding-3-small at 512 dimensions as the sweet spot for most people. But here's what I didn't expect: even with "good enough" embeddings, you still get surprising false positives.

For example, I was talking the other day about my motorcycle trip last weekend. The search pulled up a conversation from months earlier where I mentioned "motorcycle maintenance", which is technically related but not actually relevant to what I was talking about now.

Does this break everything? No, because the AI is pretty good at ignoring irrelevant context. But it does waste tokens and sometimes leads the AI astray.

The fix I'm experimenting with now: After getting the initial search results, I have the AI re-rank them to pick the truly relevant ones. It adds a tiny bit of latency (one extra LLM call), but improves quality noticeably.

Here's the code sketch in Go:

type MemoryReRanker struct {
    client openai.Client
}

func (r *MemoryReRanker) RankRelevant(query string, candidates []Memory, topN int) ([]Memory, error) {
    prompt := fmt.Sprintf(`Given the current user query:
"%s"

And these candidate memory chunks:
%v

Select the top %d candidates that are actually relevant to the current query.
Return your answer as a JSON array of integers (the indices, starting from 0).
`, query, formatCandidates(candidates), topN)

    var result []int
    err := r.client.CreateChatCompletion(... get JSON result ...)
    if err != nil {
        return candidates[:min(topN, len(candidates))], err
    }

    var ranked []Memory
    for _, idx := range result {
        if idx >= 0 && idx < len(candidates) {
            ranked = append(ranked, candidates[idx])
        }
    }
    return ranked, nil
}

Is this overkill? Maybe. But the quality improvement is noticeable enough that I think it's worth it for personal use. The extra cost is like a penny a week. Worth it.
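The part of the re-ranker that can be pinned down without any API call is turning the model's reply into a list of memories. A minimal sketch, assuming the reply is a JSON array of indices as the prompt requests. `SelectByIndices` and the one-field `Memory` type are stand-ins invented for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Memory is a minimal stand-in for the post's Memory type.
type Memory struct {
	Content string
}

// SelectByIndices parses the re-ranker's JSON reply (for example "[2, 0]")
// and returns the matching candidates in the model's order. Out-of-range and
// repeated indices are skipped, and at most topN results are returned. If the
// reply cannot be parsed, it falls back to the first topN candidates.
func SelectByIndices(reply string, candidates []Memory, topN int) []Memory {
	if topN > len(candidates) {
		topN = len(candidates)
	}
	if topN < 0 {
		topN = 0
	}

	var indices []int
	if err := json.Unmarshal([]byte(reply), &indices); err != nil {
		return candidates[:topN] // keep vector-search order as the fallback
	}

	seen := make(map[int]bool)
	ranked := make([]Memory, 0, topN)
	for _, idx := range indices {
		if len(ranked) == topN {
			break
		}
		if idx < 0 || idx >= len(candidates) || seen[idx] {
			continue
		}
		seen[idx] = true
		ranked = append(ranked, candidates[idx])
	}
	return ranked
}

func main() {
	candidates := []Memory{
		{Content: "motorcycle maintenance notes"},
		{Content: "prefers Go over Python"},
		{Content: "planned a weekend motorcycle trip"},
	}
	for _, m := range SelectByIndices("[2, 0]", candidates, 2) {
		fmt.Println(m.Content)
	}
}
```

Capping the result at topN and dropping repeated indices guards against a model that returns more (or sloppier) output than it was asked for.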

3. Privacy Actually Changes the Architecture

I built this project because I wanted my memory to stay private. All embeddings and memories live in my own PostgreSQL database; only the memories relevant to the current query get sent to OpenAI. That was the plan from day one.

But what I didn't expect: privacy changes how you think about everything. When all your data is local, you don't need to worry about:

OpenAI training on your private notes
Third parties getting access to your personal conversations
API outages taking away your memories (you still have your PostgreSQL dump)
Monthly subscription fees for a vector database

The downside? You have to host it yourself. But honestly, for a personal project, that's really not that hard these days. Fly.io, Render, even a cheap VPS works. And it's basically free for low usage.

The unexpected benefit: Because everything is local, I can actually do things that would be creepy if it was a third-party service. Like, saving every conversation automatically doesn't feel creepy when you control the database.

4. Simpler Beats Complexer — I Can't Stress This Enough

When I started, I was tempted to go all-in: use Pinecone, add a fancy frontend, build a whole ecosystem, support multiple databases, yada yada.

I'm glad I didn't. The current version is:

~800 lines of Go code for the backend API
~200 lines for the core memory logic
PostgreSQL + pgvector for storage and search
That's it. No fancy frontend (I use it through the MCP server now)
One docker-compose file to run the whole thing

Every time I thought about adding something complex, I asked myself: "Do I actually need this today?" Nine times out of ten, the answer was no.

Look at this core search function. It's honestly nothing fancy:

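// SearchSimilar returns the `limit` stored memories closest to the query embedding.
// pgvector's <-> operator is Euclidean (L2) distance, so smaller means more similar.
func (s *MemoryStore) SearchSimilar(ctx context.Context, embedding []float32, limit int) ([]MemorySearchResult, error) {
    query := `
        SELECT id, content, created_at, embedding <-> $1 AS distance
        FROM memories
        ORDER BY distance
        LIMIT $2
    `

    rows, err := s.db.Query(ctx, query, embedding, limit)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []MemorySearchResult
    for rows.Next() {
        var res MemorySearchResult
        // Surface scan errors instead of silently dropping the row.
        if err := rows.Scan(&res.ID, &res.Content, &res.CreatedAt, &res.Distance); err != nil {
            return nil, err
        }
        results = append(results, res)
    }
    // rows.Next() also returns false on failure, so check for an iteration error.
    if err := rows.Err(); err != nil {
        return nil, err
    }
    return results, nil
}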

That's the whole search function. A couple dozen lines of code. Postgres + pgvector does all the heavy lifting. Why would I complicate this?
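For intuition about what the database is doing, here is a brute-force Go equivalent of the `ORDER BY distance LIMIT n` query. It is only a sketch: the `memory` and `result` types and the toy two-dimensional vectors are made up for illustration, and pgvector does the same job with indexes instead of scanning every row.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// memory is a hypothetical in-memory stand-in for one row of the memories table.
type memory struct {
	Content   string
	Embedding []float32
}

type result struct {
	Content  string
	Distance float64
}

// l2Distance is the Euclidean distance between two vectors, which is what
// pgvector's <-> operator computes. It assumes both vectors have equal length.
func l2Distance(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// searchSimilar scores every row, sorts by ascending distance, and keeps the
// closest `limit` rows.
func searchSimilar(query []float32, rows []memory, limit int) []result {
	results := make([]result, 0, len(rows))
	for _, m := range rows {
		results = append(results, result{Content: m.Content, Distance: l2Distance(query, m.Embedding)})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].Distance < results[j].Distance })
	if limit >= 0 && limit < len(results) {
		results = results[:limit]
	}
	return results
}

func main() {
	rows := []memory{
		{Content: "motorcycle maintenance", Embedding: []float32{0.9, 0.1}},
		{Content: "weekend motorcycle trip", Embedding: []float32{0.8, 0.6}},
		{Content: "prefers Go over Python", Embedding: []float32{0.0, 1.0}},
	}
	for _, r := range searchSimilar([]float32{0.7, 0.7}, rows, 2) {
		fmt.Printf("%.3f  %s\n", r.Distance, r.Content)
	}
}
```

A full scan like this is fine for a thousand or so vectors. The point of keeping the work in Postgres is that the same query keeps working when an index is added later.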

The lesson: For side projects especially, simple scales better than complex. You can always add complexity later if you actually need it. Most of the time, you won't need it.

Pros and Cons: Honest Assessment After Three Months

Let me cut through the marketing bullshit and tell you straight: is this project actually useful? Should you use it?

✅ What Works Really Well

It actually solves the amnesia problem — This is the big one. When I start a new conversation with ChatGPT now and it says "Welcome back! I see you've been working on Vibe-Memory recently..." that's magical. It actually remembers things. No more repeating yourself every single chat.
It's stupid cheap — As I said, $2.17 total for three months. You can't beat that. Even if you have 10,000 memories, it's still like 25 cents a month.
It's dead simple to self-host — docker-compose up and you're done. If you can run PostgreSQL, you can run this.
Privacy is actually real — Your data never leaves your database unless you explicitly send it. That's the whole point, and it delivers on that.
It's surprisingly fast — 42ms average query time on my free Fly instance. That's faster than most API calls to OpenAI. You don't even notice it's happening.

❌ What Still Needs Work

Setting it up still requires some technical skill — If you're not comfortable with Docker, PostgreSQL, and API keys, you're gonna have a bad time. I haven't built a nice one-click setup yet. That's on my todo list, but it's not there yet.
The extraction step isn't perfect — Sometimes it saves too much, sometimes too little. I'm still tweaking the prompt. There's probably a smarter way to do this.
No collaboration yet — Right now it's single-user only. That's fine for my use case (personal memory), but if you want to share memories with a team, this doesn't do that. Could be added, but it's not there.
Embeddings drift over time — Okay, this is a subtle one. OpenAI changes their embedding models sometimes, and if you change models, all your old embeddings are incompatible. You have to re-embed everything. Not a huge problem for personal use, but something to be aware of.
No mobile app — I use it through the Claude desktop app with MCP, which works great. But there's no mobile app yet. That's fine for me, but I know some people want that.
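On the embedding-model point in the list above: one way to make a model switch less painful is to record which model produced each vector, so stale rows can be found and re-embedded in a batch. This is a sketch under that assumption. The `EmbeddingModel` field and `NeedsReembedding` function are hypothetical, and the post does not say whether Vibe-Memory tracks this.

```go
package main

import "fmt"

// StoredMemory carries the name of the model that produced its embedding.
// The EmbeddingModel field is a hypothetical addition for this example.
type StoredMemory struct {
	ID             int
	Content        string
	EmbeddingModel string
}

// NeedsReembedding returns the memories whose vectors came from a different
// model than the one currently in use. Vectors from different models are not
// comparable, so these rows must be re-embedded before they are searched.
func NeedsReembedding(memories []StoredMemory, currentModel string) []StoredMemory {
	var stale []StoredMemory
	for _, m := range memories {
		if m.EmbeddingModel != currentModel {
			stale = append(stale, m)
		}
	}
	return stale
}

func main() {
	memories := []StoredMemory{
		{ID: 1, Content: "prefers Go over Python", EmbeddingModel: "text-embedding-3-small"},
		{ID: 2, Content: "working on Vibe-Memory", EmbeddingModel: "nomic-embed-text"},
	}
	for _, m := range NeedsReembedding(memories, "nomic-embed-text") {
		fmt.Printf("re-embed memory %d: %s\n", m.ID, m.Content)
	}
}
```

Because the original text of each memory is stored alongside its vector, re-embedding is a batch job over existing rows, not a data loss event.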

Who Should Actually Use This?

After three months, I think this project is perfect for:

Developers who get frustrated with AI amnesia — If you're comfortable spinning up a Docker container, this will make your AI chats way better immediately.
People who care about privacy — If you don't want all your personal conversations going to a third-party, this keeps everything in your control.
Anyone on a budget — Free basically forever for personal use. Can't beat that.

It's probably not for you if:

You don't want to host anything yourself
You need a polished UI with one-click install
You need team collaboration

Where Do We Go From Here?

So what's next for Vibe-Memory? I have a few things I want to tackle:

1. MCP Server Integration (Actually Already Done!)

If you follow the MCP (Model Context Protocol) world, I already built an MCP server for Vibe-Memory. That means you can connect it directly to Claude Desktop and it automatically adds your memories to every conversation. It's awesome. I use it every single day.

The code is already in the repo if you want to check it out.

2. Better Documentation for Non-Developers

A lot of people have starred the repo who aren't full-time developers. I want to make it easier for them to get set up. Probably a proper README with step-by-step instructions, maybe even a one-click Fly.io deploy button.

3. Maybe a Simple Web UI

Right now I use it entirely through MCP, but a simple web UI for browsing/editing/deleting memories would be nice. Nothing fancy, just basic CRUD.

4. Experimenting with Local Embeddings

I've been testing nomic-embed-text-v1.5 running locally with Ollama. It works surprisingly well! The quality is almost as good as OpenAI's, and it's completely free. I want to make this a first-class option for people who want completely local memory.
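Making local embeddings a first-class option mostly comes down to hiding the provider behind an interface. Here is a rough sketch of what that could look like, assuming Ollama's `/api/embeddings` HTTP endpoint on its default port. The `Embedder` interface, the `OllamaEmbedder` type, and the `nomic-embed-text` model tag are assumptions for illustration, not code from the repo.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Embedder is a hypothetical interface that lets the rest of the code ignore
// where vectors come from (OpenAI, Ollama, or anything else).
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// OllamaEmbedder calls a local Ollama server.
type OllamaEmbedder struct {
	BaseURL string // assumed default: "http://localhost:11434"
	Model   string // assumed model tag: "nomic-embed-text"
}

type ollamaRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

// decodeEmbedding extracts the vector from an /api/embeddings response body.
func decodeEmbedding(body []byte) ([]float32, error) {
	var out struct {
		Embedding []float32 `json:"embedding"`
	}
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	if len(out.Embedding) == 0 {
		return nil, fmt.Errorf("ollama returned an empty embedding")
	}
	return out.Embedding, nil
}

// Embed sends the text to Ollama and returns the resulting vector.
func (o OllamaEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
	payload, err := json.Marshal(ollamaRequest{Model: o.Model, Prompt: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, o.BaseURL+"/api/embeddings", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeEmbedding(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	var e Embedder = OllamaEmbedder{BaseURL: "http://localhost:11434", Model: "nomic-embed-text"}
	vec, err := e.Embed(ctx, "User prefers Go over Python for backend development")
	if err != nil {
		fmt.Println("embedding failed (is Ollama running?):", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```

One caveat with swapping providers: vectors from different models usually differ in dimension and are never comparable, so a switch means re-embedding existing memories and matching the pgvector column size to the new model.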

5. What Do You Want?

If you're using Vibe-Memory or thinking about using it, what features are missing? What's hard about setting it up? Drop a comment below — I read every single one.

Final Thoughts: Was This Project Worth It?

Let's do the ROI calculation like I do with all my side projects:

Time invested: ~80 hours over three months
Money invested: $2.17
Utility gained: I use this every single day. It makes my AI chats actually useful. No more repeating myself. It remembers my preferences, my projects, my past decisions. That's priceless.
Learned: So much about embeddings, pgvector optimization, Go web dev, MCP, privacy architecture.

Was it worth it? Absolutely.

The funny thing is, I started this project because I was frustrated. I looked around for existing solutions and they were either too expensive, too complex, or sent all your data to someone else. Three months later, I have exactly what I wanted, and it cost me basically nothing.

That's the beauty of side projects these days. With all the open-source tools we have available, you can build something really useful in a weekend (or a few months of evenings) that solves a problem you actually have.

If you're dealing with AI amnesia too, give Vibe-Memory a try. It's on GitHub at https://github.com/kevinten10/vibe-memory — star it if you like it, open an issue if you have problems, and let me know how it goes.

Questions for You

I'm curious — have you built any AI memory projects? Do you struggle with AI amnesia like I did? What's the biggest pain point you've found with existing solutions? Drop a comment below — I'd love to hear your experiences.

And if you use Vibe-Memory, let me know what features you want to see next!