Vibe-Memory Part 2: Comparing Embedding Models — I Tested 7 Models So You Don't Have To

Honestly, after building Vibe-Memory and getting the basic version working, I thought I was done. "Cool, embeddings are embeddings, right? Just pick any popular one and call it a day."

Three weeks later, I'd burned through $12 in API credits, re-embedded my 10,000 memories six times, and learned the hard way that not all embeddings are created equal. Especially when you're building a personal semantic memory that needs to be cheap, fast, and actually understand the vibe of what you're writing.

If you don't know what Vibe-Memory is, it's the open source project I built to fix ChatGPT's amnesia. It gives your AI conversations a semantic memory layer that lives on your own infrastructure. You can check out the first article here and the code here.

Today I want to share what I learned testing different embedding models for personal semantic memory. I tested seven popular options, broke a few things, and found some surprising results that you can directly reuse in your own project.

Let's Talk About Requirements First

Before we jump into comparisons, let's get clear on what we actually need for a personal semantic memory:

  1. Cost matters: This is a personal project, not a startup. $0.0001 per embedding is fine. $0.001 per embedding gets expensive fast when you have 10k+ memories.
  2. Quality needs to be "good enough": You don't need state-of-the-art for personal use. You just need similar vibes to show up together.
  3. Speed: It shouldn't take 5 seconds to search your memory.
  4. Privacy options: Some people want to run everything locally, some are fine with APIs. Both should work.
  5. Small dimension preference: Lower dimensions = faster search = smaller database. Doesn't hurt if quality holds.
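To put point 5 in rough numbers, here is a minimal sketch. It assumes vectors are stored as raw float32 values (4 bytes per dimension) and ignores any index overhead or compression your database adds. The function name `vectorStorageBytes` is made up for illustration.

```go
package main

import "fmt"

// vectorStorageBytes estimates raw storage for n vectors of the given
// dimension, assuming float32 values (4 bytes each) and no index overhead.
func vectorStorageBytes(n, dims int) int64 {
	return int64(n) * int64(dims) * 4
}

func main() {
	const memories = 10_000
	for _, dims := range []int{384, 768, 1024, 1536, 3072} {
		mib := float64(vectorStorageBytes(memories, dims)) / (1024 * 1024)
		fmt.Printf("%4d dims: %.1f MiB for %d memories\n", dims, mib, memories)
	}
}
```

Under those assumptions, 10k memories take about 14.6 MiB at 384 dimensions and about 58.6 MiB at 1536. Neither is scary for a personal project, but a brute-force search touches every one of those bytes per query.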

I tested these seven candidates:

| Model | Provider | Dimensions | Cost per 1M tokens |
| --- | --- | --- | --- |
| text-embedding-3-small | OpenAI | 1536 | $0.02 |
| text-embedding-3-large | OpenAI | 3072 | $0.13 |
| text-embedding-ada-002 | OpenAI | 1536 | $0.10 |
| Cohere Embed v3 | Cohere | 1024 | $0.10 |
| BGE-small-v1.5 | Local (HuggingFace) | 384 | Free |
| BGE-base-v1.5 | Local (HuggingFace) | 768 | Free |
| all-MiniLM-L6-v2 | Local (HuggingFace) | 384 | Free |

I used my own conversation history as test data — 2,342 memory chunks from actual chats with AI about coding, project ideas, recipes, and random life thoughts. Then I did a bunch of real-world searches and measured how relevant the results felt.

The Surprising Results

Let me cut to the chase — here's what I found:

1. OpenAI text-embedding-3-small: My Current Pick

Cost: $0.02 per 1M tokens → At roughly 100 tokens per memory, that's about 0.0002 cents per memory. For 10k memories, that's about $0.02 total. Are you kidding me? That's nothing.

Quality: Honestly? It's better than I expected. I was worried that having half the dimensions of 3-large would mean worse quality, but for personal use it's absolutely solid. It gets the vibe right more often than not. Related memories cluster together nicely, even when they don't share exact keywords.

Speed: API calls are fast — usually 200-500ms per batch. Not instant, but totally acceptable for personal use.

Pros:

  • Insanely cheap
  • Good enough quality for personal use
  • No local GPU required
  • OpenAI handles all the infrastructure
  • Same 1536 dimensions as ada-002, with better quality at a fifth of the price

Cons:

  • Still depends on OpenAI API
  • Your data goes to OpenAI (though you're just sending chunks of your memories, not full prompts usually)
  • Rate limits if you're embedding a lot at once

My take: If you're okay with using an API and don't mind sending your data to OpenAI, this is the sweet spot for personal projects. I'm currently running with this in production and it's working great.

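// Example: Using OpenAI text-embedding-3-small in Vibe-Memory.
// Assumes the github.com/sashabaranov/go-openai client; the package name is illustrative.
package embed

import (
    "context"
    "errors"

    openai "github.com/sashabaranov/go-openai"
)

// OpenAIEmbedder wraps an OpenAI client for embedding memory chunks.
type OpenAIEmbedder struct {
    client *openai.Client
    model  string
}

// Embed returns the embedding vector for a single piece of text.
func (o *OpenAIEmbedder) Embed(text string) ([]float32, error) {
    resp, err := o.client.CreateEmbeddings(context.Background(), openai.EmbeddingRequest{
        Input: []string{text},
        Model: openai.SmallEmbedding3, // "text-embedding-3-small"
    })
    if err != nil {
        return nil, err
    }
    // Guard against an empty response before indexing into it.
    if len(resp.Data) == 0 {
        return nil, errors.New("openai: empty embedding response")
    }
    return resp.Data[0].Embedding, nil
}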

2. OpenAI text-embedding-3-large: Overkill for Personal Use

Cost: $0.13 per 1M tokens → 10k memories would be around $0.13. Still not breaking the bank, but 6.5x more expensive than small.

Quality: Yes, it's better. But is it noticeably better for personal use? In my testing, not really. Maybe 5-10% more accurate on tricky semantic connections. But for my personal memory, I couldn't justify the extra cost when small already works fine.

Takeaway: If you're building something for production with thousands of users, sure go large. For your own personal memory? Save the money. The gains aren't worth it for most people.

3. Cohere Embed v3: Solid, But More Expensive Than OpenAI Small

I really wanted to like Cohere. Their API is great, and v3 is supposed to be top-tier.

Quality: It's good — comparable to OpenAI 3-large in many cases. The multi-lingual support is better if you need that.

Cost: $0.10 per 1M tokens, which is still 5x more expensive than OpenAI 3-small. For personal use, that doesn't make sense when OpenAI's smaller model is already good enough.

If you need multi-lingual: Definitely give it a shot. Cohere really shines here. But for English-only personal use, I didn't see enough value for the extra cost.

4. Local Models: BGE vs MiniLM

Okay, let's talk about running everything locally. This is what I really wanted to work — no API calls, no privacy issues, just everything on my machine.

I tested three popular local options:

all-MiniLM-L6-v2

  • Size: ~90MB
  • Dimensions: 384
  • Speed: Blazing fast on my laptop — ~50ms per embedding
  • Quality: It's okay. Definitely worse than the good API models. It mixes up related topics more often than I'd like. Good enough for prototyping, but I wouldn't want to use it daily.

BGE-small-v1.5

  • Size: ~300MB
  • Dimensions: 384
  • Speed: Still pretty fast — ~100ms per embedding on CPU
  • Quality: Wow, this is actually really good. BGE models punch way above their weight. For a local model, it's shockingly good. I'd put it at maybe 80-85% of the quality of OpenAI 3-small. That's impressive for a free local model.

BGE-base-v1.5

  • Size: ~400MB
  • Dimensions: 768
  • Speed: ~200ms per embedding on CPU
  • Quality: Even better than BGE-small. Closer to 90% of OpenAI 3-small quality. Still not quite there, but really close.

The Local vs API Verdict

If you need local for privacy reasons — go with BGE-base-v1.5. It's the best sweet spot. Quality is good enough for daily use, it's not that big, and it's fast enough on a modern laptop.

But if you can tolerate using an API — OpenAI text-embedding-3-small is still better quality for the price, and it's way easier to set up. No dealing with HuggingFace models, no GPU required on your end, just works.
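Since both routes are viable, it helps to hide the choice behind one interface so you can switch later without touching the rest of the code. Here is a minimal sketch of that idea. The `Embedder` interface mirrors the `Embed` method signature from the OpenAI example, but `newEmbedder` and `fakeEmbedder` are placeholders I made up so the sketch runs offline; they are not Vibe-Memory's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// Embedder is the one method both API and local backends need to provide.
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// fakeEmbedder stands in for a real backend so the sketch runs offline.
type fakeEmbedder struct{ dims int }

func (f fakeEmbedder) Embed(text string) ([]float32, error) {
	return make([]float32, f.dims), nil
}

// newEmbedder picks a backend by name. The names and dimensions mirror the
// comparison table; the backends themselves are placeholders.
func newEmbedder(name string) (Embedder, error) {
	switch name {
	case "text-embedding-3-small":
		return fakeEmbedder{dims: 1536}, nil
	case "bge-base-v1.5":
		return fakeEmbedder{dims: 768}, nil
	default:
		return nil, errors.New("unknown embedding backend: " + name)
	}
}

func main() {
	for _, name := range []string{"text-embedding-3-small", "bge-base-v1.5"} {
		e, err := newEmbedder(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		vec, _ := e.Embed("off-road riding tips")
		fmt.Printf("%s -> %d dims\n", name, len(vec))
	}
}
```

The catch is that switching backends changes the vector dimension, so stored vectors have to be re-embedded when you flip the switch.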

What About the Old Guard: text-embedding-ada-002

Ah, ada-002. The old standby. Should you still use it?

Short answer: No. Not unless you have legacy code that already uses it.

text-embedding-3-small is cheaper ($0.02 vs $0.10 per 1M tokens) and gives better quality at the same 1536 dimensions. There's just no reason to use ada-002 for new projects anymore. OpenAI won this round: they made it cheaper and better. Good on them.
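One thing to watch when migrating: vectors from different models live in different spaces, so you can't compare an ada-002 vector with a 3-small vector even though both have 1536 dimensions. A simple guard is to store the model name next to each vector. This is a sketch under that assumption; the `memory` struct is hypothetical and the real Vibe-Memory schema may differ.

```go
package main

import "fmt"

// memory is a hypothetical record; the real Vibe-Memory schema may differ.
type memory struct {
	ID     int
	Text   string
	Model  string // model that produced Vector
	Vector []float32
}

// needsReembedding returns the IDs of memories whose vectors came from a
// different model than target, or whose dimension does not match.
func needsReembedding(memories []memory, target string, dims int) []int {
	var stale []int
	for _, m := range memories {
		if m.Model != target || len(m.Vector) != dims {
			stale = append(stale, m.ID)
		}
	}
	return stale
}

func main() {
	memories := []memory{
		{ID: 1, Model: "text-embedding-ada-002", Vector: make([]float32, 1536)},
		{ID: 2, Model: "text-embedding-3-small", Vector: make([]float32, 1536)},
	}
	fmt.Println(needsReembedding(memories, "text-embedding-3-small", 1536)) // prints [1]
}
```

A dimension check alone would miss memory 1 here, which is why the model name is part of the comparison.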

The Real World Example: What Changed After Switching

Let me show you a concrete example of why embedding model choice matters.

I had a bunch of memories about "motorcycle route planning", "ADV riding off-road", "GPS tracks cleaning", "best mountain roads in Taiwan".

With all-MiniLM-L6-v2, when I searched for "off-road riding tips", it would give me:

  1. A recipe for oatmeal I saved (contained the word "road" in "oatmeal road" something something... yeah.)
  2. A discussion about tire pressure for my car
  3. Maybe the right memory at position 3 or 4

With BGE-small-v1.5, it got the right memory at position 1 most of the time, but sometimes still messed up semantic connections.

With OpenAI text-embedding-3-small, it gets the right memory at position 1 almost every time. Even when there are no shared keywords — just similar vibes and topics.

That's the difference. It's not just about numbers on a leaderboard — it's about whether your actual daily searches work how you expect them to.
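For context, "position 1" here means the top result of a nearest-neighbor search over the stored vectors. Here is a minimal brute-force sketch using cosine similarity. I'm assuming cosine as the metric, and the three-dimensional vectors are toy values I made up to stand in for real embeddings, not actual model output.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either has zero magnitude.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the indices of the k vectors most similar to query,
// best match first.
func topK(query []float32, vectors [][]float32, k int) []int {
	idx := make([]int, len(vectors))
	scores := make([]float64, len(vectors))
	for i, v := range vectors {
		idx[i] = i
		scores[i] = cosine(query, v)
	}
	sort.SliceStable(idx, func(x, y int) bool { return scores[idx[x]] > scores[idx[y]] })
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

func main() {
	// Toy 3-dimensional vectors standing in for real embeddings.
	memories := [][]float32{
		{0.9, 0.1, 0.0}, // "ADV riding off-road"
		{0.0, 0.2, 0.9}, // "oatmeal recipe"
		{0.7, 0.6, 0.1}, // "tire pressure for my car"
	}
	query := []float32{1.0, 0.2, 0.0} // "off-road riding tips"
	fmt.Println(topK(query, memories, 2)) // prints [0 2]
}
```

The search code is identical for every model. What changes between MiniLM, BGE, and OpenAI is how well the vectors place related memories near each other.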

Pros & Cons: What I Wish Someone Told Me

What Works Great

  • OpenAI 3-small is insane value for money: about $0.02 for 10k memories. That's nothing. You can't even buy a coffee for that.
  • Local BGE models are way better than they used to be: if you need local, you can absolutely make it work for personal use.
  • Smaller dimensions are underrated: 1536 is plenty. Lower dimensions mean faster search and a smaller database, so everything is snappier.
  • You don't need the best model: "good enough" is actually good enough when it's your personal memory.

What Still Sucks

  • Local models need disk space: BGE-base is 400MB. Not huge, but more than an API, which needs nothing on your disk.
  • CPU inference is okay but not instant: you'll wait a second or two for bulk embeddings. Fine for adding new memories, not the end of the world.
  • API models mean dependency: if OpenAI goes down, your memory search goes down. For a personal project, that's acceptable to me, but know the tradeoff.
  • No model is perfect: even the best model still occasionally pulls something irrelevant that makes you go "huh?" But that's okay, it's still better than no memory at all.

My Current Setup

Here's what I'm actually using day-to-day:

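// config.go
const (
    EmbeddingModel    = "openai-text-3-small" // backend label used by Vibe-Memory
    DefaultDimensions = 1536                  // vector size of text-embedding-3-small
    BatchSize         = 100                   // texts per embedding request
)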

Why? Because:

  • It's cheap enough ($0.02 / 1M tokens)
  • Quality is good enough for my needs
  • I don't mind sending my memory chunks to OpenAI (they're just my personal notes anyway)
  • Zero maintenance — OpenAI handles all the infrastructure
  • It just works. I've had zero issues so far.

But if I ever need to go fully local, I'd switch to BGE-base-v1.5 without much hesitation. The quality drop is noticeable but not deal-breaking for personal use.
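The `BatchSize` of 100 in the config means texts get embedded in groups rather than one request each. Here is a minimal sketch of the chunking step. The `batches` helper is a hypothetical name, and the actual API call is left out so the example runs on its own.

```go
package main

import "fmt"

// batches splits texts into consecutive slices of at most size items.
func batches(texts []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		out = append(out, texts[start:end])
	}
	return out
}

func main() {
	const batchSize = 100 // matches BatchSize in config.go
	texts := make([]string, 250)
	for i, b := range batches(texts, batchSize) {
		// In real code, each batch would go into one embeddings request.
		fmt.Printf("batch %d: %d texts\n", i, len(b))
	}
}
```

With 250 texts this produces three batches of 100, 100, and 50, so three requests instead of 250.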

Cost Breakdown: How Much Does This Actually Cost?

Let's do the math for real numbers:

  • 1,000 memories (average 100 tokens each): 100k tokens → $0.002 with OpenAI 3-small. That's two tenths of a cent.
  • 10,000 memories: 1M tokens → $0.02 → two cents.
  • 100,000 memories: 10M tokens → $0.20 → twenty cents.

Are you kidding me? That's nothing. Even if you have a ton of memories, this is basically free. I spent more on coffee today than I'll spend on embeddings this year.

Compare that to local: it's free, but you need to have the disk space and do the setup. For most people, paying two cents for zero hassle is a no-brainer.
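If you want to plug in your own numbers, the arithmetic above fits in one function. This sketch assumes the price is quoted per one million tokens, which is what the figures above imply ($0.02 for 1M tokens), and that every memory has the same average token count. `embeddingCostUSD` is a name made up for illustration.

```go
package main

import "fmt"

// embeddingCostUSD estimates the one-time cost of embedding n memories,
// given an average token count and a price per one million tokens.
func embeddingCostUSD(n, avgTokens int, pricePerMillion float64) float64 {
	totalTokens := float64(n) * float64(avgTokens)
	return totalTokens / 1_000_000 * pricePerMillion
}

func main() {
	const avgTokens = 100 // the average memory size assumed above
	for _, n := range []int{1_000, 10_000, 100_000} {
		fmt.Printf("%7d memories: $%.3f\n", n, embeddingCostUSD(n, avgTokens, 0.02))
	}
}
```

Swap in 0.13 for the last argument to see what 3-large would cost for the same memories. Keep in mind this counts one embedding pass only; every re-embed costs the same again.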

Lessons I Learned the Hard Way

  1. Start with the cheapest good enough option — don't overspend on state-of-the-art for a personal project. You probably don't need it.
  2. Test with your actual data — leaderboard numbers don't tell the whole story. Your use case is unique.
  3. Local is totally viable now — BGE changed the game. If privacy is a big concern, you absolutely can run everything yourself now.
  4. Dimensions aren't everything — 384 can work great if the model is good. Don't assume bigger is always better.
  5. Re-embedding is expensive (in time) — pick something reasonably good early, you don't want to re-embed 10k memories six times like I did. Trust me on this one.
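For lesson 2, you can turn "how relevant the results felt" into a number with a handful of labeled queries. This sketch computes hit@1 (how often the right memory lands at position 1) and mean reciprocal rank. It is not how I scored the models in this article, the rankings in `main` are made up for illustration, and `evalCase` and `score` are hypothetical names.

```go
package main

import "fmt"

// evalCase pairs a query with the ID of the memory it should retrieve
// and the ranked result IDs a model actually returned.
type evalCase struct {
	Query    string
	Expected int
	Ranked   []int
}

// score returns hit@1 (fraction of queries whose expected memory is ranked
// first) and mean reciprocal rank over all cases.
func score(cases []evalCase) (hitAt1, mrr float64) {
	if len(cases) == 0 {
		return 0, 0
	}
	for _, c := range cases {
		for pos, id := range c.Ranked {
			if id == c.Expected {
				if pos == 0 {
					hitAt1++
				}
				mrr += 1 / float64(pos+1)
				break
			}
		}
	}
	n := float64(len(cases))
	return hitAt1 / n, mrr / n
}

func main() {
	// Made-up rankings for illustration only, not measured results.
	cases := []evalCase{
		{Query: "off-road riding tips", Expected: 7, Ranked: []int{7, 3, 9}},
		{Query: "mountain roads", Expected: 4, Ranked: []int{2, 4, 8}},
	}
	h, m := score(cases)
	fmt.Printf("hit@1=%.2f MRR=%.2f\n", h, m) // prints hit@1=0.50 MRR=0.75
}
```

Twenty or thirty queries from your own history are enough to compare two models before you commit to embedding everything.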

Wrapping Up

If you're building your own personal semantic memory, here's my recommendation based on three weeks of testing:

  • Most people: Use OpenAI text-embedding-3-small. It's cheap, it's good enough, it just works.
  • Privacy-focused / need local: Use BGE-base-v1.5. It's free, quality is surprisingly good, runs on your laptop.
  • Multi-lingual: Use Cohere Embed v3. They're really good at this.
  • Legacy projects: Still on ada-002? Migrate to 3-small. It's cheaper and better.

The big takeaway? Building personal semantic memory is actually affordable now. Even with the best API model, it's basically free for personal use. There's no reason not to build it anymore.

I've already been using Vibe-Memory daily for a couple months, and it's genuinely changed how I interact with AI. No more "you don't remember what we talked about yesterday?" It just works.


Your Turn

Have you built anything with semantic memory? What embedding model are you using? Did you have different experiences than me? I'm always curious to hear what works for other people — drop a comment below and let me know!


Vibe-Memory is open source and available on GitHub if you want to try it yourself. Contributions welcome!

Source: dev.to
