A mechanism that limits attention to a fixed-size window of recent tokens rather than all previous tokens, reducing computational cost while maintaining context awareness.