LoRA: I Trained <1% of a 1.5B Model and Matched a Full Fine-Tune
python
dev.to
In Part 1 I fully fine-tuned a 270M model — updating every weight. That's fine for a tiny model. It gets painful as models grow, because full fine-tuning needs gradients and optimizer state for every parameter (~4× the model size in memory). So: what do you do when the model is too big to comfortably fine-tune all of? The idea behind LoRA LoRA (Low-Rank Adaptation) rests on one observation: the change fine-tuning makes to a weight matrix is "low rank" — it lives in a small subspace.