I built an AI-personalized news bot that runs on a single SQLite file — architecture and everything that broke

python dev.to

I built an AI-personalized news bot that runs on a single SQLite file — architecture and everything that broke

TL;DR — @futur_e_news_bot. A bilingual
(EN/RU) Telegram news feed. AI removes duplicates, clusters one event into one card,
translates, and tunes the feed to your reactions. Stack: aiogram, local ONNX
embeddings, sqlite-vec instead of pgvector, a chain of free LLMs via OpenRouter,
one worker on Fly.io. It started at ~$5/month — and broke in three interesting ways
I'll walk through.

I wanted a news feed without two things: duplicates (the same event from five
outlets with five headlines) and noise (90% of the agenda I don't care about). So
one weekend turned into a Telegram bot that de-dupes, learns what I like from 🔥/❤️/😢
reactions, and shows everything in two languages.

The interesting part isn't the idea — it's that the whole thing runs as one worker
process against a single SQLite file
, no database server, no Redis, no queue. Here's
how, and what it cost me (in dollars and in debugging).

Architecture: one worker, one SQLite, no DB server

┌──────────────────────────┐                      ┌────────────────────┐
│ Fly machine #1           │                      │ Fly machine #2     │
│  ┌─────────────────────┐ │                      │ (private-only)     │
│  │ aiogram long-polling│ │                      │   RSSHub           │
│  ├─────────────────────┤ │                      │  (Telegram → RSS)  │
│  │ APScheduler jobs    │◄┼──── Fly 6PN ─────────┤                    │
│  │  pipeline / deliver │ │  *.internal:1200     │                    │
│  │  digests / watchdog │ │                      │                    │
│  └─────────────────────┘ │                      └────────────────────┘
│  ┌─────────────────────┐ │
│  │ SQLite + sqlite-vec │ │  ← all state, on a Fly volume
│  ├─────────────────────┤ │
│  │ fastembed (ONNX)    │ │  ← embeddings computed locally, no API
│  └─────────────────────┘ │
└──────────┬───────────────┘
           │ HTTPS
           ▼
    OpenRouter (free LLM chain → cheap paid fallback)
Enter fullscreen mode Exit fullscreen mode

No public HTTP, no load balancer, no separate DB machine. The bot polls Telegram,
APScheduler runs the jobs, and all state lives in one SQLite file on a volume.
RSSHub (which bridges public Telegram channels into RSS) is a second app reachable
only over Fly's private network.

A hobby instance fits in 512 MB at roughly $5/month (I'm now on 2 GB ≈ $15/mo after
a traffic spike — more on why below). LLM cost runs $0–1/month because most calls
hit free models.

sqlite-vec instead of pgvector

It started as Postgres + pgvector. That works great, but it needs a separate DB machine
(another ~$5/mo on Fly), its own secrets, backups, migrations — a lot of infra just to
store embeddings.

sqlite-vec is a native SQLite extension that
adds vec0 virtual tables with cosine/L2/Hamming KNN right in SQL. It's basically
pgvector, embeddable and serverless.

# create the vector table
await conn.exec_driver_sql(
    "CREATE VIRTUAL TABLE IF NOT EXISTS story_vec "
    "USING vec0(embedding float[384] distance_metric=cosine)"
)

# KNN is just a SELECT
rows = await session.execute(text(
    "SELECT rowid, distance FROM story_vec "
    "WHERE embedding MATCH :v AND k = :k ORDER BY distance"
), {"v": sqlite_vec.serialize_float32(vec), "k": k})
Enter fullscreen mode Exit fullscreen mode

That powers three things: dedup at ingest (nearest neighbor under a distance
threshold → it's a duplicate), "more like this", and semantic inline search.
Result: one machine instead of two, zero cost to store vectors, near-zero infra. SQLite
is single-writer, but for one worker that's a non-issue — WAL + busy_timeout cover it.

A chain of free LLMs

Each story gets one LLM pass: classify (category, tags, importance, "breaking"),
summarize, translate EN↔RU, and score tone 0–3 (for a "good news only" toggle).
OpenRouter lets you pass an array of models and falls back automatically:

"models": [
    "qwen/qwen3-next-80b-a3b-instruct:free",
    "meta-llama/llama-3.3-70b-instruct:free",
    "mistralai/mistral-nemo",   # cheap paid fallback when the free ones 429
]
Enter fullscreen mode Exit fullscreen mode

A global rate limiter + exponential backoff on 429/5xx sit on top. Free models carry
90%+ of traffic, so the monthly bill is basically rounding error.

Local embeddings (fastembed / ONNX)

Embeddings for every story and every search query. Going to an API for that is money,
latency, and a privacy leak. fastembed runs
sentence-transformers on ONNX without PyTorch — paraphrase-multilingual-MiniLM-L12-v2,
384 dims, ~10–30 ms on CPU. Free and private.

The recommender: a taste vector + EWMA + anti-bubble

Each user has a taste_vec (same 384-dim space as stories). It starts null; the first
🔥/❤️ copies the story's embedding, and every reaction after that nudges it with an
exponential moving average. A link click is a stronger signal than a like, so it moves
the vector more. 😢 pushes it away.

Ranking = cosine(taste, story) + category affinity + tag affinity + recency, plus a
small "discovery" injection so the feed doesn't collapse into an echo chamber. The whole
thing is brute-force numpy over the last few hundred candidates — milliseconds at this
scale, and far simpler than wiring up a vector index for ranking (sqlite-vec handles the
KNN-shaped problems; ranking wants a weighted score, not nearest-neighbor).

And then it broke — three times

The honest part. None of these were the idea; all were operational.

1. sqlite-vec 0.1.6 shipped a 32-bit binary for arm64. First deploy:
OSError: wrong ELF class: ELFCLASS32. Looks like your code, is actually upstream. Pin
sqlite-vec==0.1.9.

2. fastembed changed its default pooling (CLS → mean) between minor versions. New
embeddings became silently incompatible with the stored ones; KNN started returning
garbage. The only fix is to re-embed the whole table and recompute every user's taste
vector. Pin your embedding lib and read the changelog.

3. OOM crash-loop that hid for days. Loading the ONNX model pushes RSS to ~400 MB.
On a 512 MB machine the kernel OOM-killed the process during model load, Fly
restarted it, it OOM'd again — a loop. The bot kept polling (so it looked alive) but
never created a story. Meanwhile RSS items piled up and every digest went out empty.
Bumping to 1 GB (then 2 GB under load) fixed it.

4. A feed with no timeout froze the whole pipeline for ~4 days. feedparser.parse(url)
does a blocking fetch with no timeout. One dead/slow feed hung the run forever, and
with max_instances=1 every subsequent run was skipped — a silent multi-day outage
while the bot happily answered messages. Fixes that should've been there from day one:
fetch with httpx and a hard timeout, and wrap each pipeline run in asyncio.wait_for.

The lesson from #3 and #4 is the same: a worker that polls can look healthy while
being functionally dead.
I added a watchdog that DMs me if no story has been created
in N hours. It would have caught all of it in minutes instead of days.

Try it

It's live: @futur_e_news_bot/start,
pick a language, react to a few stories, and the feed starts tuning itself. Good/neutral
news by default; you can dial in negative news in four levels.

I'd love feedback on the dedup/clustering quality and on the recommender — those are the
parts still furthest from "great." Happy to go deeper on sqlite-vec, the free-LLM chain,
or the recommender in the comments.

Source: dev.to

arrow_back Back to Tutorials