I built an AI-personalized news bot that runs on a single SQLite file — architecture and everything that broke
TL;DR — @futur_e_news_bot. A bilingual
(EN/RU) Telegram news feed. AI removes duplicates, clusters one event into one card,
translates, and tunes the feed to your reactions. Stack: aiogram, local ONNX
embeddings, sqlite-vec instead of pgvector, a chain of free LLMs via OpenRouter,
one worker on Fly.io. It started at ~$5/month — and broke in three interesting ways
I'll walk through.
I wanted a news feed without two things: duplicates (the same event from five
outlets with five headlines) and noise (90% of the agenda I don't care about). So
one weekend turned into a Telegram bot that de-dupes, learns what I like from 🔥/❤️/😢
reactions, and shows everything in two languages.
The interesting part isn't the idea — it's that the whole thing runs as one worker
process against a single SQLite file, no database server, no Redis, no queue. Here's
how, and what it cost me (in dollars and in debugging).
Architecture: one worker, one SQLite, no DB server
┌──────────────────────────┐ ┌────────────────────┐
│ Fly machine #1 │ │ Fly machine #2 │
│ ┌─────────────────────┐ │ │ (private-only) │
│ │ aiogram long-polling│ │ │ RSSHub │
│ ├─────────────────────┤ │ │ (Telegram → RSS) │
│ │ APScheduler jobs │◄┼──── Fly 6PN ─────────┤ │
│ │ pipeline / deliver │ │ *.internal:1200 │ │
│ │ digests / watchdog │ │ │ │
│ └─────────────────────┘ │ └────────────────────┘
│ ┌─────────────────────┐ │
│ │ SQLite + sqlite-vec │ │ ← all state, on a Fly volume
│ ├─────────────────────┤ │
│ │ fastembed (ONNX) │ │ ← embeddings computed locally, no API
│ └─────────────────────┘ │
└──────────┬───────────────┘
│ HTTPS
▼
OpenRouter (free LLM chain → cheap paid fallback)
No public HTTP, no load balancer, no separate DB machine. The bot polls Telegram,
APScheduler runs the jobs, and all state lives in one SQLite file on a volume.
RSSHub (which bridges public Telegram channels into RSS) is a second app reachable
only over Fly's private network.
A hobby instance fits in 512 MB at roughly $5/month (I'm now on 2 GB ≈ $15/mo after
a traffic spike — more on why below). LLM cost runs $0–1/month because most calls
hit free models.
sqlite-vec instead of pgvector
It started as Postgres + pgvector. That works great, but it needs a separate DB machine
(another ~$5/mo on Fly), its own secrets, backups, migrations — a lot of infra just to
store embeddings.
sqlite-vec is a native SQLite extension that
adds vec0 virtual tables with cosine/L2/Hamming KNN right in SQL. It's basically
pgvector, embeddable and serverless.
# create the vector table
await conn.exec_driver_sql(
"CREATE VIRTUAL TABLE IF NOT EXISTS story_vec "
"USING vec0(embedding float[384] distance_metric=cosine)"
)
# KNN is just a SELECT
rows = await session.execute(text(
"SELECT rowid, distance FROM story_vec "
"WHERE embedding MATCH :v AND k = :k ORDER BY distance"
), {"v": sqlite_vec.serialize_float32(vec), "k": k})
That powers three things: dedup at ingest (nearest neighbor under a distance
threshold → it's a duplicate), "more like this", and semantic inline search.
Result: one machine instead of two, zero cost to store vectors, near-zero infra. SQLite
is single-writer, but for one worker that's a non-issue — WAL + busy_timeout cover it.
A chain of free LLMs
Each story gets one LLM pass: classify (category, tags, importance, "breaking"),
summarize, translate EN↔RU, and score tone 0–3 (for a "good news only" toggle).
OpenRouter lets you pass an array of models and falls back automatically:
"models": [
"qwen/qwen3-next-80b-a3b-instruct:free",
"meta-llama/llama-3.3-70b-instruct:free",
"mistralai/mistral-nemo", # cheap paid fallback when the free ones 429
]
A global rate limiter + exponential backoff on 429/5xx sit on top. Free models carry
90%+ of traffic, so the monthly bill is basically rounding error.
Local embeddings (fastembed / ONNX)
Embeddings for every story and every search query. Going to an API for that is money,
latency, and a privacy leak. fastembed runs
sentence-transformers on ONNX without PyTorch — paraphrase-multilingual-MiniLM-L12-v2,
384 dims, ~10–30 ms on CPU. Free and private.
The recommender: a taste vector + EWMA + anti-bubble
Each user has a taste_vec (same 384-dim space as stories). It starts null; the first
🔥/❤️ copies the story's embedding, and every reaction after that nudges it with an
exponential moving average. A link click is a stronger signal than a like, so it moves
the vector more. 😢 pushes it away.
Ranking = cosine(taste, story) + category affinity + tag affinity + recency, plus a
small "discovery" injection so the feed doesn't collapse into an echo chamber. The whole
thing is brute-force numpy over the last few hundred candidates — milliseconds at this
scale, and far simpler than wiring up a vector index for ranking (sqlite-vec handles the
KNN-shaped problems; ranking wants a weighted score, not nearest-neighbor).
And then it broke — three times
The honest part. None of these were the idea; all were operational.
1. sqlite-vec 0.1.6 shipped a 32-bit binary for arm64. First deploy:
OSError: wrong ELF class: ELFCLASS32. Looks like your code, is actually upstream. Pin
sqlite-vec==0.1.9.
2. fastembed changed its default pooling (CLS → mean) between minor versions. New
embeddings became silently incompatible with the stored ones; KNN started returning
garbage. The only fix is to re-embed the whole table and recompute every user's taste
vector. Pin your embedding lib and read the changelog.
3. OOM crash-loop that hid for days. Loading the ONNX model pushes RSS to ~400 MB.
On a 512 MB machine the kernel OOM-killed the process during model load, Fly
restarted it, it OOM'd again — a loop. The bot kept polling (so it looked alive) but
never created a story. Meanwhile RSS items piled up and every digest went out empty.
Bumping to 1 GB (then 2 GB under load) fixed it.
4. A feed with no timeout froze the whole pipeline for ~4 days. feedparser.parse(url)
does a blocking fetch with no timeout. One dead/slow feed hung the run forever, and
with max_instances=1 every subsequent run was skipped — a silent multi-day outage
while the bot happily answered messages. Fixes that should've been there from day one:
fetch with httpx and a hard timeout, and wrap each pipeline run in asyncio.wait_for.
The lesson from #3 and #4 is the same: a worker that polls can look healthy while
being functionally dead. I added a watchdog that DMs me if no story has been created
in N hours. It would have caught all of it in minutes instead of days.
Try it
It's live: @futur_e_news_bot — /start,
pick a language, react to a few stories, and the feed starts tuning itself. Good/neutral
news by default; you can dial in negative news in four levels.
I'd love feedback on the dedup/clustering quality and on the recommender — those are the
parts still furthest from "great." Happy to go deeper on sqlite-vec, the free-LLM chain,
or the recommender in the comments.