You can speed up neural network training by 1-3% by reusing computation from low-precision matrix operations for normalization, with no accuracy loss.
This paper proposes MXNorm, a faster alternative to RMSNorm (a standard layer normalization technique) that reuses the scale information already computed during low-precision matrix multiplication. By avoiding this redundant calculation, MXNorm achieves a 2.4x speedup in the normalization step while maintaining training accuracy on Llama models.
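To make the reuse concrete, here is a minimal NumPy sketch of the idea: the per-block scales (and quantized values) produced when casting activations to an MX-style low-precision format already encode the magnitude information RMSNorm needs, so the normalization statistic can be assembled from them instead of running a second reduction over the full-precision activations. The block size, power-of-two scale rule, mantissa rounding, and the `mx_quantize` / `mxnorm_sketch` helpers below are illustrative assumptions, not the paper's actual kernels.

```python
import numpy as np

def mx_quantize(x, block_size=32):
    """Cast a 1D vector to an MX-style format: each contiguous block of
    `block_size` elements shares one power-of-two scale (illustrative)."""
    blocks = x.reshape(-1, block_size)
    # Per-block scale: smallest power of two covering the block's max magnitude.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scales = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-12)))
    # Low-precision mantissas (coarse rounding stands in for a narrow format).
    q = np.round(blocks / scales * 8) / 8
    return q, scales

def rmsnorm_reference(x, eps=1e-6):
    """Standard RMSNorm over a single vector: divide by the root mean square."""
    rms = np.sqrt(np.mean(x * x) + eps)
    return x / rms

def mxnorm_sketch(x, block_size=32, eps=1e-6):
    """Normalization that reuses the MX block scales and quantized values
    (already produced for the low-precision matmul) to approximate the RMS,
    instead of re-reducing the full-precision activations (hypothetical)."""
    q, scales = mx_quantize(x, block_size)
    mean_sq = np.mean((q * scales) ** 2)   # mean square from quantized blocks
    rms = np.sqrt(mean_sq + eps)
    return x / rms

x = np.random.randn(4096).astype(np.float32)
y_ref = rmsnorm_reference(x)
y_mx = mxnorm_sketch(x)
print("max relative difference:", np.max(np.abs(y_mx - y_ref) / np.abs(y_ref)))
```

In this sketch the savings come from skipping a separate pass over the high-precision activations: the quantized blocks and their scales are byproducts of preparing the matmul inputs, so the normalization statistic is nearly free, at the cost of computing it from slightly quantized values.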