Vibe-Memory Part 3: How I Optimized pgvector for AI Semantic Memory (10x Faster Queries With 5 Simple Tricks)
Honestly, after two articles building Vibe-Memory — one introducing the project and one comparing embedding models — I thought I was done with the hard parts. I had working code, decent performance, and it solved my actual problem: fixing ChatGPT's amnesia.
Then I added 5,000 memories.
And suddenly, queries that used to take 100ms were taking 1.5 seconds.
Oops.
So here's the thing: I'm not a database optimization expert. I'm just a guy who wanted AI to remember me. But after three weeks of digging, breaking things, and eventually getting 10x faster queries, I learned a lot that I wish someone had told me upfront.
If you're building anything with pgvector and AI semantic memory, this one's for you.
The Problem: Why "It Just Works" Stops Working At 5,000 Vectors
Let me catch you up quick: Vibe-Memory is a semantic memory layer for AI. Every time you have a conversation, it stores the memory as a vector embedding in PostgreSQL with pgvector. When you ask a question, it searches for the most semantically similar memories and sends them to the AI.
With 100 memories, everything was fine. Even 1,000 was okay. But at 5,000? Exact nearest neighbor search (the default) has to check every single vector every time. That's O(n) search — and it shows.
Here's what I started with:
// Original naive search — works great until it doesn't
func (s *Storage) SearchSimilarVectors(queryVector []float32, limit int) ([]Memory, error) {
var memories []Memory
err := s.db.Select(&memories, `
SELECT id, content, embedding <-> $1 AS distance
FROM memories
ORDER BY distance
LIMIT $2
`, queryVector, limit)
return memories, err
}
This is the code you'll see in most tutorials. And it works! For a while. But as you add more data, it gets slower and slower.
I'm using this for personal use, so I don't have millions of vectors. But even 5,000 was getting painful. And honestly, I got lazy — I didn't want to set up a separate vector database just for my side project. PostgreSQL is already there, I pay for the VPS, why add more moving parts?
Turns out, with a few simple tricks, you can get huge speedups just by optimizing what you already have. No need for PostgreSQL extensions you have to compile from source (though I did try one — more on that later).
Let me show you what worked.
Trick 1: Use Indexes — Duh, But Which One?
Okay, okay, I know what you're thinking: "Just add an index, dummy." I've heard you. But here's the thing about pgvector indexes — it's not as simple as "add an index and you're done."
pgvector supports three main index types:
- IVFFlat - Inverted file index. Fast queries, slower builds. Good for most cases.
- HNSW - Hierarchical navigable small world. Faster queries, much slower builds, higher memory usage. Better for larger datasets.
- No index - Exact search. Slow for large N, but 100% accurate.
I went with IVFFlat because I wanted something that didn't take forever to build and didn't eat all my RAM. Here's how I created it:
-- First, check how many lists you need. Rule of thumb: rows / 1000
-- I have ~5000 rows, so 5 lists is enough? Wait no — actually the recommendation is sqrt(rows) * 2
-- 5000 rows: sqrt(5000) ≈ 70, so 100-200 lists. Okay, let's do 100.
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Wait, I got the number of lists wrong at first. I started with 5 lists because 5000/1000 = 5. That gave me terrible recall — many relevant vectors weren't even found. The actual heuristic is number_of_lists = rows / 1000 OR number_of_lists = 4 * sqrt(rows), whichever is bigger. For 5000 rows, that's ~90, so 100 is good.
Before index (5000 vectors): ~1200ms per query
After IVFFlat index: ~150ms per query
That's an 8x improvement right there! Just adding an index. I can't believe I put that off for so long.
But wait — there's a catch. IVFFlat is an approximate index. It doesn't check every vector. It's possible to miss some relevant results. For my use case — personal semantic memory — approximate is totally fine. I just need the AI to get the gist, not every single memory ever. If you're doing this for production where 100% recall matters, you might think differently. But for me, 95% recall at 8x speed is a no-brainer.
Trick 2: Partition Your Data By Time (Because Old Memories Matter Less)
Here's an insight I didn't expect: not all memories are created equal.
When you're having a conversation with AI, do you care equally about something you wrote yesterday vs something you wrote 2 years ago?
Me neither. Recent memories are almost always more relevant. So why search all 5000 memories every single time?
I added a simple trick: I only search the most recent N months of memories by default. If that doesn't give good results, I can widen the search. But 90% of the time, recent is enough.
Here's how that looks in code:
// Add time-based partitioning — search recent first
func (s *Storage) SearchSimilarVectors(queryVector []float32, limit int, months int) ([]Memory, error) {
var memories []Memory
cutoff := time.Now().AddDate(0, -months, 0)
err := s.db.Select(&memories, `
SELECT id, content, created_at, embedding <-> $1 AS distance
FROM memories
WHERE created_at >= $3
ORDER BY distance
LIMIT $2
`, queryVector, limit, cutoff)
return memories, err
}
Default I search last 6 months. That gets me down from ~5000 vectors to ~1200 vectors for a typical search. If I really need something older, I can bump it to 24 months.
With index + time filtering: ~80ms per query
Another 2x improvement! And this makes sense semantically too — it's not just a performance trick, it actually improves results because newer memories are more relevant.
Trick 3: Store Only What You Need, Select Only What You Want
This one seems obvious, but I missed it at first.
Originally, I was selecting the embedding column in every search. Why? Because I thought I needed it for something. But do I?
Wait — when you search for similar vectors, pgvector does the distance calculation using the index. You don't need to get the embedding back from the database for the search to work. You just need the content and id to display to the user.
Original query:
SELECT id, content, embedding, embedding <-> $1 AS distance
FROM memories
ORDER BY distance
LIMIT $2
See that? I'm pulling the entire embedding vector (512 floats per row) across the wire for every result. That's unnecessary data transfer. For 10 results, that's 10 * 512 * 4 bytes = ~20KB — not the end of the world, but why?
Fixed query:
SELECT id, content, created_at, embedding <-> $1 AS distance
FROM memories
ORDER BY distance
LIMIT $2
Gone is the embedding column from the SELECT. Small change, but it reduces the data PostgreSQL has to send back over the wire. For me, this shaved another ~5-10ms off the query time. Not huge, but every bit helps.
Another thing: I only store what I actually need. Do I need to keep the original embedding file on disk after inserting into PostgreSQL? No. Do I need to store extra metadata I'll never use in the search? No. Keep your rows lean, keep your database happy.
Trick 4: Vacuum Analyze After Adding a Lot of Data
Here's a mistake I made: I bulk inserted 5000 memories all at once after creating the index. And my queries were still slow. Why?
Because when you create an IVFFlat index before adding data, or add a ton of data after creating the index, PostgreSQL needs to update its statistics. The index planner needs up-to-date stats to choose the right plan.
After bulk loading, just run:
VACUUM ANALYZE memories;
That's it. It takes a few seconds, and suddenly the query planner knows how to use your index properly. I saw another ~10-15ms improvement after this. The index planner was choosing a bad plan before, and this fixed it.
Pro tip: If you're adding data incrementally, PostgreSQL will auto-vacuum eventually. But if you just bulk loaded thousands of rows, run it manually. It doesn't hurt, and it helps a lot.
Trick 5: Use Cosine Distance Instead of L2 Distance (If That Makes Sense)
Wait, does distance type affect performance? Actually, sometimes it does — but more importantly, it affects accuracy.
I'm using 512-dimensional embeddings from text-embedding-3-small. OpenAI trains these embeddings so that cosine similarity correlates with semantic similarity. So using cosine distance (<-> operator in pgvector) is what you want.
But here's the thing: L2 distance works fine too, and sometimes it's slightly faster. But for semantic similarity, cosine distance gives better results. And the performance difference is negligible if you have an index.
Actually, the bigger performance thing is dimensions. I already wrote about this in the last article — using 512 dimensions instead of 1536 gives you almost the same quality with 1/3 the storage and 3x faster searches. That's a bigger win than distance choice.
But to recap: smaller dimensions = faster everything. If you can go from 1536 to 512, do it. Your future self will thank you.
What About HNSW vs IVFFlat? I Tried Both.
I was curious — would HNSW be faster? HNSW is supposed to have faster queries than IVFFlat, right?
So I tried it:
DROP INDEX idx_memories_embedding;
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
Build time on my cheap VPS: ~45 seconds vs ~5 seconds for IVFFlat. That's not terrible, but definitely slower.
Query performance: ~60ms vs ~80ms for IVFFlat with time filtering. So about 25% faster. That's nice! But the tradeoff is higher memory usage and slower builds.
For my dataset (5000 vectors × 512 dimensions), memory usage went from ~20MB to ~35MB. That's still nothing on a modern VPS. So why didn't I stick with it?
Well, honestly, 60ms vs 80ms — both are fast enough for my use case. I can't tell the difference when using the app interactively. IVFFlat was already "fast enough," and building is quicker when I need to rebuild the index.
But if you have more data (100k+ vectors) or you care about every millisecond, HNSW is definitely worth trying. The performance difference is real.
My takeaway:
- < 10k vectors: IVFFlat is fine, easier to build, good enough
- 10k-100k vectors: HNSW is probably worth it
- >100k vectors: You probably already knew that, go for HNSW
The Final Results: From 1200ms to 100ms (and Sometimes Lower)
Let's recap what happened:
| Stage | Query Time | Improvement |
|---|---|---|
| No index, full search | ~1200ms | baseline |
| + IVFFlat index | ~150ms | 8x faster |
| + Time-based filtering (last 6 months) | ~80ms | 1.9x faster |
| + Remove unused columns from SELECT | ~70ms | 1.1x faster |
| + VACUUM ANALYZE | ~65ms | 1.1x faster |
| + HNSW index (optional) | ~50ms | 1.3x faster |
Starting from 1200ms, ending at ~65ms. That's ~18x faster with just a few simple changes. Are you kidding me? That's insane. I didn't expect that much improvement.
And with time filtering on 6 months, I'm often searching only a few hundred vectors, so queries can be as fast as 20-30ms. That's instant for all practical purposes.
The best part? I didn't have to add any new infrastructure. I'm still using plain PostgreSQL with pgvector, same as before. No new services to maintain, no extra costs, just better performance.
Pros and Cons: Is This Approach Right For You?
Let me be honest — this works great for my use case. But it's not for everyone. Here's the straight talk:
Pros:
- ✅ Keeps everything in PostgreSQL — no extra services to run
- ✅ Huge performance improvements with minimal changes
- ✅ Works great for personal projects and small-to-medium datasets (up to 100k vectors easily)
- ✅ Time-based filtering improves both speed and result relevance
- ✅ All these tricks are standard PostgreSQL/pgvector — no weird hacks
- ✅ You can incrementally adopt — start simple, optimize as you grow
Cons:
- ❌ Approximate indexes mean you give up some recall (fine for personal AI memory, maybe not for production search)
- ❌ IVFFlat needs to be rebuilt if your data distribution changes a lot (easy to do, but manual step)
- ❌ HNSW builds slower and uses more memory
- ❌ This isn't going to beat a dedicated vector database at 1M+ vectors — but if you have 1M+ vectors, you probably already knew that
Who this is for:
- People building personal AI tools
- Side projects where you already use PostgreSQL
- Small to medium datasets (< 100k vectors)
- Teams that want to avoid adding more infrastructure
- Anyone who thinks "if it ain't broke, don't add more moving parts"
Who this isn't for:
- Production search at scale with millions of vectors
- Applications that require 100% recall
- Teams that already have a dedicated vector database like Pinecone or Weaviate
What I Learned The Hard Way
Honestly, I went into this thinking I'd need to move to a dedicated vector database to get good performance. And that's what a lot of articles online will tell you — "PostgreSQL isn't for vectors, use a dedicated vector DB."
But you know what? For 99% of side projects and small-to-medium applications, PostgreSQL + pgvector with a few optimizations is more than good enough. I'm getting sub-100ms queries on a cheap $5/month VPS with 5000 vectors. That's way better than I expected.
The biggest lesson: optimize what you have before you add more infrastructure. I added an index, filtered by time, cleaned up my query, and got the performance I needed without adding anything new. That's a win in my book.
I still think dedicated vector databases are great for large-scale production. But don't feel like you need one just because you're working with vectors. Start simple, optimize as you grow.
Your Turn
I'm still learning this stuff as I go. I've been using Vibe-Memory daily for a couple months now, and it's genuinely solved the AI amnesia problem for me. It's nice having AI that actually remembers who I am, what I've done, and what I care about — without sending all my private memories to some third-party service.
What about you — have you been playing with pgvector for AI memory? What optimizations have you found that work? Did you hit the same performance wall I did when you passed a few thousand vectors? Drop a comment below and share your tricks — I'd love to learn what works for you.
Vibe-Memory is open source on GitHub — check it out here if you want to see the full code including all these optimizations. Star it if you find it useful!