Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training — ThinkLLM