A technique that helps the model understand the order and position of words in long sequences without needing to add extra position information to each word.
Performance retention over long documents and conversations