I built an AI-personalized news bot that runs on a single SQLite file — architecture and everything that broke

TL;DR — @futur_e_news_bot. A bilingual
(EN/RU) Telegram news feed. AI removes duplicates, clusters one event into one card,
translates, and tunes the feed to your reactions. Stack: aiogram, local ONNX
embeddings, sqlite-vec instead of pgvector, a chain of free LLMs via OpenRouter,
one worker on Fly.io. It started at ~$5/month — and broke in three interesting ways
I'll walk through.

I wanted a news feed without two things: duplicates (the same event from five
outlets with five headlines) and noise (90% of the agenda I don't care about). So
one weekend turned into a Telegram bot that de-dupes, learns what I like from 🔥/❤️/😢
reactions, and shows everything in two languages.

The interesting part isn't the idea — it's that the whole thing runs as one worker
process against a single SQLite file, no database server, no Redis, no queue. Here's
how, and what it cost me (in dollars and in debugging).

Architecture: one worker, one SQLite, no DB server

┌──────────────────────────┐                      ┌────────────────────┐
│ Fly machine #1           │                      │ Fly machine #2     │
│  ┌─────────────────────┐ │                      │ (private-only)     │
│  │ aiogram long-polling│ │                      │   RSSHub           │
│  ├─────────────────────┤ │                      │  (Telegram → RSS)  │
│  │ APScheduler jobs    │◄┼──── Fly 6PN ─────────┤                    │
│  │  pipeline / deliver │ │  *.internal:1200     │                    │
│  │  digests / watchdog │ │                      │                    │
│  └─────────────────────┘ │                      └────────────────────┘
│  ┌─────────────────────┐ │
│  │ SQLite + sqlite-vec │ │  ← all state, on a Fly volume
│  ├─────────────────────┤ │
│  │ fastembed (ONNX)    │ │  ← embeddings computed locally, no API
│  └─────────────────────┘ │
└──────────┬───────────────┘
           │ HTTPS
           ▼
    OpenRouter (free LLM chain → cheap paid fallback)

No public HTTP, no load balancer, no separate DB machine. The bot polls Telegram,
APScheduler runs the jobs, and all state lives in one SQLite file on a volume.
RSSHub (which bridges public Telegram channels into RSS) is a second app reachable
only over Fly's private network.

A hobby instance fits in 512 MB at roughly $5/month (I'm now on 2 GB ≈ $15/mo after
a traffic spike — more on why below). LLM cost runs $0–1/month because most calls
hit free models.

sqlite-vec instead of pgvector

It started as Postgres + pgvector. That works great, but it needs a separate DB machine
(another ~$5/mo on Fly), its own secrets, backups, migrations — a lot of infra just to
store embeddings.

sqlite-vec is a native SQLite extension that
adds vec0 virtual tables with cosine/L2/Hamming KNN right in SQL. It's basically
pgvector, embeddable and serverless.

# create the vector table
await conn.exec_driver_sql(
    "CREATE VIRTUAL TABLE IF NOT EXISTS story_vec "
    "USING vec0(embedding float[384] distance_metric=cosine)"
)

# KNN is just a SELECT
rows = await session.execute(text(
    "SELECT rowid, distance FROM story_vec "
    "WHERE embedding MATCH :v AND k = :k ORDER BY distance"
), {"v": sqlite_vec.serialize_float32(vec), "k": k})

That powers three things: dedup at ingest (nearest neighbor under a distance
threshold → it's a duplicate), "more like this", and semantic inline search.
Result: one machine instead of two, zero cost to store vectors, near-zero infra. SQLite
is single-writer, but for one worker that's a non-issue — WAL + busy_timeout cover it.

A chain of free LLMs

Each story gets one LLM pass: classify (category, tags, importance, "breaking"),
summarize, translate EN↔RU, and score tone 0–3 (for a "good news only" toggle).
OpenRouter lets you pass an array of models and falls back automatically:

"models": [
    "qwen/qwen3-next-80b-a3b-instruct:free",
    "meta-llama/llama-3.3-70b-instruct:free",
    "mistralai/mistral-nemo",   # cheap paid fallback when the free ones 429
]

A global rate limiter + exponential backoff on 429/5xx sit on top. Free models carry
90%+ of traffic, so the monthly bill is basically rounding error.

Local embeddings (fastembed / ONNX)

Embeddings for every story and every search query. Going to an API for that is money,
latency, and a privacy leak. fastembed runs
sentence-transformers on ONNX without PyTorch — paraphrase-multilingual-MiniLM-L12-v2,
384 dims, ~10–30 ms on CPU. Free and private.

The recommender: a taste vector + EWMA + anti-bubble

Each user has a taste_vec (same 384-dim space as stories). It starts null; the first
🔥/❤️ copies the story's embedding, and every reaction after that nudges it with an
exponential moving average. A link click is a stronger signal than a like, so it moves
the vector more. 😢 pushes it away.

Ranking = cosine(taste, story) + category affinity + tag affinity + recency, plus a
small "discovery" injection so the feed doesn't collapse into an echo chamber. The whole
thing is brute-force numpy over the last few hundred candidates — milliseconds at this
scale, and far simpler than wiring up a vector index for ranking (sqlite-vec handles the
KNN-shaped problems; ranking wants a weighted score, not nearest-neighbor).

And then it broke — three times

The honest part. None of these were the idea; all were operational.

1. sqlite-vec 0.1.6 shipped a 32-bit binary for arm64. First deploy:
OSError: wrong ELF class: ELFCLASS32. Looks like your code, is actually upstream. Pin
sqlite-vec==0.1.9.

2. fastembed changed its default pooling (CLS → mean) between minor versions. New
embeddings became silently incompatible with the stored ones; KNN started returning
garbage. The only fix is to re-embed the whole table and recompute every user's taste
vector. Pin your embedding lib and read the changelog.

3. OOM crash-loop that hid for days. Loading the ONNX model pushes RSS to ~400 MB.
On a 512 MB machine the kernel OOM-killed the process during model load, Fly
restarted it, it OOM'd again — a loop. The bot kept polling (so it looked alive) but
never created a story. Meanwhile RSS items piled up and every digest went out empty.
Bumping to 1 GB (then 2 GB under load) fixed it.

4. A feed with no timeout froze the whole pipeline for ~4 days. feedparser.parse(url)
does a blocking fetch with no timeout. One dead/slow feed hung the run forever, and
with max_instances=1 every subsequent run was skipped — a silent multi-day outage
while the bot happily answered messages. Fixes that should've been there from day one:
fetch with httpx and a hard timeout, and wrap each pipeline run in asyncio.wait_for.

The lesson from #3 and #4 is the same: a worker that polls can look healthy while
being functionally dead. I added a watchdog that DMs me if no story has been created
in N hours. It would have caught all of it in minutes instead of days.

Try it

It's live: @futur_e_news_bot — /start,
pick a language, react to a few stories, and the feed starts tuning itself. Good/neutral
news by default; you can dial in negative news in four levels.

I'd love feedback on the dedup/clustering quality and on the recommender — those are the
parts still furthest from "great." Happy to go deeper on sqlite-vec, the free-LLM chain,
or the recommender in the comments.