Memory Caching lets RNNs scale their memory capacity with sequence length while staying faster than Transformers.
This paper addresses a major weakness of fast RNN models: their fixed-size memory forces them to forget information as sequences grow. The authors introduce Memory Caching, which lets an RNN save snapshots of its memory state while processing longer sequences. These snapshots let the model retain more information without incurring the full cost of Transformer attention, creating a sweet spot between speed and accuracy.
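The idea can be illustrated with a toy sketch (not the paper's actual architecture): a fixed-size recurrent state is updated per token, snapshots of that state are cached at intervals so total memory grows with sequence length, and a simple attention read over the cache recovers older information. All functions and parameters here (`step`, `run`, `read_cache`, `cache_every`) are illustrative assumptions.

```python
import math

def step(state, x):
    # Decay-and-mix recurrence: a fixed-size state gradually forgets old inputs.
    return [0.9 * s + 0.1 * math.tanh(x + i) for i, s in enumerate(state)]

def run(tokens, d=4, cache_every=3):
    # Process the sequence, periodically snapshotting the memory state.
    state, cache = [0.0] * d, []
    for t, x in enumerate(tokens, start=1):
        state = step(state, x)
        if t % cache_every == 0:
            cache.append(list(state))  # cache grows with sequence length
    return state, cache

def read_cache(query, cache):
    # Softmax dot-product attention over cached snapshots: unlike the
    # fixed-size state alone, this can surface information from far back.
    scores = [sum(q * c for q, c in zip(query, snap)) for snap in cache]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * snap[i] for w, snap in zip(weights, cache)) / z
            for i in range(len(query))]

state, cache = run(list(range(12)))   # 12 tokens -> 4 snapshots
readout = read_cache(state, cache)
```

Attending over a handful of snapshots is much cheaper than full token-level attention, which is the intuition behind the speed/accuracy middle ground described above.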