Why E8 lattice quantization beats scalar quantization for KV caches
dev.to
Most KV cache quantization methods treat each number independently: round each float to the nearest 2-bit or 4-bit value. This works, but it wastes bits. The E8 lattice quantizes 8 numbers at once, exploiting correlations between dimensions. The result: 3x better compression under entropy coding compared to scalar quantization at the same distortion.

The problem with scalar quantization

Given a 128-dimensional KV vector, scalar INT2 quantization rounds each of the 128 values indepen