MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected
dev.to
Start with the benchmarks

In a previous article, I compared three Qwen3.5 models on the same hardware. Here are the MoE-relevant numbers.

Test environment: RTX 4060 8GB / Ryzen 7 / 32GB DDR5 / llama.cpp / Q4_K_M

| Model | Speed (t/s) | VRAM | GPU% | CPU% | RAM | ngl |
|---|---|---|---|---|---|---|
| Qwen3.5-9B | 33.0 | 7.1GB | 91% | 32% | 22.6GB | 99 (all layers on GPU) |
| Qwen3.5-27B | 3.57 | 7.7GB | 60% | 74% | 28.3GB | 24 (24/58 layers on GPU) |
| Qwen3.5-35B-A3B | 8.61 | 7.6GB