go-mHC enables efficient learned mixing of residual streams in transformers, governed by a single tunable hyperparameter that trades speed against expressivity, potentially opening a new axis for scaling model capacity.
This paper addresses a core design problem in neural networks: how to mix information efficiently across a transformer's parallel processing paths (residual streams).
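To make the idea of learned residual-stream mixing concrete, here is a minimal sketch. It assumes the common formulation in which a layer keeps `n` parallel residual streams and recombines them with a learned `n x n` mixing matrix; the number of streams `n` is assumed to be the speed/expressivity hyperparameter described above (larger `n` means more capacity but more compute). The names `mix_streams` and `W` are illustrative, not go-mHC's actual API, and the paper's specific mechanism may differ.

```python
import numpy as np

def mix_streams(streams: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Recombine residual streams with a learned mixing matrix.

    streams: (n, d) array, one row per residual stream.
    W:       (n, n) learned matrix; row i holds the weights used to
             form new stream i as a linear combination of all streams.
    """
    return W @ streams

rng = np.random.default_rng(0)
n, d = 4, 8                      # n streams of width d (n is the tunable knob)
streams = rng.standard_normal((n, d))
# Near-identity initialization: each stream starts as (mostly) itself,
# so training begins close to ordinary independent residual streams.
W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
mixed = mix_streams(streams, W)
assert mixed.shape == (n, d)
```

With `W` fixed to the identity, this reduces to plain independent residual streams; the learned off-diagonal entries are what let information flow between paths.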