A hybrid neural network design that combines recurrent processing (which maintains memory across sequences) with attention mechanisms, enabling better memory efficiency than standard transformers.
Performance retention over long documents and conversations