A hybrid architecture that combines Mamba (a selective state-space model) blocks with Transformer attention blocks, processing long sequences more efficiently than a pure Transformer while maintaining strong modeling performance.
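The core idea can be illustrated with a minimal sketch. This is not the actual Mamba architecture (real Mamba uses input-dependent, "selective" state-space parameters, hardware-aware scans, and learned projections); here a toy linear recurrence stands in for the SSM, and a single-head self-attention stands in for the Transformer block. The point is the structural pattern: SSM layers run in linear time over the sequence, attention layers in quadratic time, and a hybrid stack interleaves the two. All function names and the layer pattern below are illustrative assumptions.

```python
import numpy as np

def ssm_block(x, decay=0.9):
    # Toy stand-in for a Mamba/SSM layer: a fixed linear recurrence
    # h_t = decay * h_{t-1} + x_t, computed in O(seq_len) time.
    # (Real Mamba makes these dynamics input-dependent.)
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = decay * h + x[t]
        out[t] = h
    return out

def attention_block(x):
    # Single-head self-attention: O(seq_len^2) score matrix.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def hybrid_stack(x, layers=("ssm", "ssm", "attn", "ssm")):
    # Interleave cheap SSM layers with occasional attention layers,
    # with residual connections; the layer ratio is a design choice.
    for kind in layers:
        block = ssm_block if kind == "ssm" else attention_block
        x = x + block(x)
    return x

seq_len, d_model = 16, 8
x = np.random.default_rng(0).normal(size=(seq_len, d_model))
y = hybrid_stack(x)
print(y.shape)  # (16, 8)
```

Because most layers are linear-time SSM blocks, the stack's cost grows far more slowly with sequence length than a pure-attention stack, while the sprinkled attention layers retain the ability to mix information between arbitrary positions.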