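A vision transformer architecture that computes self-attention within non-overlapping local windows and shifts the window partition between successive layers, so that information passes across window boundaries and both local and global context in images are captured efficiently.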
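
To make the shifted-window idea concrete, here is a minimal sketch in plain Python. It only shows how a grid of positions could be partitioned into windows, and how shifting the grid before partitioning makes each new window straddle several of the old ones. The function names (`window_partition`, `cyclic_shift`, `source_windows`), the 8x8 grid, the window size of 4, and the shift of half a window are all illustrative assumptions. The attention computation and any masking of wrapped-around positions are omitted.

```python
def window_partition(grid, window_size):
    """Split a 2-D grid (list of rows) into non-overlapping square windows.

    Windows are returned in row-major order, each as a list of rows.
    """
    height, width = len(grid), len(grid[0])
    if height % window_size or width % window_size:
        raise ValueError("grid size must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in grid[top:top + window_size]])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    rolled_rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


def source_windows(window, grid_width, window_size):
    """Return the set of regular windows that a window's positions came from."""
    return {((value // grid_width) // window_size,
             (value % grid_width) // window_size)
            for row in window for value in row}


# Hypothetical 8x8 grid; each position is labelled with its flat index.
GRID_SIZE, WINDOW_SIZE = 8, 4
grid = [[r * GRID_SIZE + c for c in range(GRID_SIZE)] for r in range(GRID_SIZE)]

# Regular partition: four 4x4 windows, each covering one quadrant.
regular = window_partition(grid, WINDOW_SIZE)

# Shifted partition: roll by half a window (an assumed choice), then partition.
shifted = window_partition(cyclic_shift(grid, WINDOW_SIZE // 2), WINDOW_SIZE)

# The first shifted window mixes positions from all four regular windows.
mixed = source_windows(shifted[0], GRID_SIZE, WINDOW_SIZE)
```

In this sketch, attention restricted to `regular[0]` would only see one quadrant, while attention restricted to `shifted[0]` would see positions drawn from all four quadrants. Alternating the two partitions across layers is how context can spread beyond a single window without attending over the whole image at once.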