Creating a simple local RAG system

python dev.to March 29, 2026

We'll build a simple RAG system using only local models. We will not use LangChain: it pulls in many bloated dependencies, is much slower than calling Transformers directly, is not error-free, and its documentation is mostly misleading. Instead, we'll use only bare Transformers functions. As the vector database for storing our document embeddings, we'll use Faiss, which is very efficient at similarity search. Note that it sits in RAM, not on disk, and is very fast.

What is a RAG?
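
In short, RAG (retrieval-augmented generation) means embedding your documents, finding the ones most similar to a question, and putting them into the prompt given to the language model. The sketch below illustrates only that retrieval step, using the standard library so it runs anywhere. Everything in it is an assumption made for illustration: `embed` is a toy word-count stand-in for a real Transformers embedding model, and `InMemoryIndex` is a hypothetical brute-force index playing the role that Faiss plays in the tutorial (exact inner-product search held in RAM). It is not the tutorial's actual code.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


class InMemoryIndex:
    """Hypothetical brute-force index kept in RAM (stand-in for Faiss)."""

    def __init__(self):
        self.vectors = []

    def add(self, vec):
        self.vectors.append(vec)

    def search(self, query, k):
        # Inner product of normalized vectors equals cosine similarity.
        scores = [
            (sum(a * b for a, b in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        # Highest score first; ties broken by insertion order.
        scores.sort(key=lambda pair: (-pair[0], pair[1]))
        return scores[:k]


# Made-up example documents.
documents = [
    "faiss performs fast similarity search over vectors",
    "transformers provides pretrained language models",
    "bread is baked from flour water and yeast",
]
vocab = sorted({word for doc in documents for word in doc.lower().split()})

index = InMemoryIndex()
for doc in documents:
    index.add(embed(doc, vocab))


def retrieve(question, k=1):
    """Return the k documents most similar to the question."""
    hits = index.search(embed(question, vocab), k)
    return [documents[i] for _, i in hits]


def build_prompt(question, k=1):
    """Put the retrieved context in front of the question for the generator."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


print(build_prompt("how does similarity search work"))
```

In a real system the prompt returned by `build_prompt` would be passed to a locally running language model, and the word-count vectors would be replaced by dense embeddings from a Transformers model.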
