RBF Attention Reveals Dot‑Product's Hidden Norm Bias

dev.to

Swapping dot‑product attention for RBF attention sounds like an architectural revolution. In Raphael Pisoni’s experiment, it turned out to be something stranger: a one‑line algebraic tweak that silently reproduces half the “mysterious” behaviors of modern Transformers — and breaks the hardware stack in the process.

TL;DR: RBF attention is just dot‑product attention plus an explicit squared‑L2 penalty on keys; the “new” geometry is already latent in SDPA. Changing the metric forces you to confr
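The claimed identity is easy to verify numerically: expanding -||q - k||²/2 gives q·k - ||q||²/2 - ||k||²/2, and the query-norm term is constant across keys, so the softmax cancels it. A minimal NumPy sketch (shapes and the 1/2 scale are illustrative assumptions, not the article's exact setup):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # queries: 4 tokens, dim 8
k = rng.normal(size=(6, 8))   # keys:    6 tokens, dim 8

# RBF attention logits: -||q - k||^2 / 2 for every query/key pair.
rbf_logits = -0.5 * ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)

# Dot-product logits with an explicit squared-L2 penalty on keys.
# The missing -||q||^2/2 term is constant along each query row,
# so it drops out of the softmax over keys.
sdpa_logits = q @ k.T - 0.5 * (k ** 2).sum(-1)[None, :]

# The two attention matrices match to machine precision.
assert np.allclose(softmax(rbf_logits), softmax(sdpa_logits))
```

Nothing here requires a new kernel in principle; the RBF metric is just dot-product logits with a per-key bias term, which is exactly why the article calls the geometry "already latent" in SDPA.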
