Encoder-only models can be extended to handle long documents through positional embedding adaptation and continued pre-training, offering a parameter-efficient alternative to decoder-only LLMs for document understanding tasks.
This paper introduces Polish language models based on an encoder-only architecture that can process documents up to 8192 tokens, far beyond the 512-token limit of traditional BERT models. The researchers used a two-stage training approach combining positional embedding adaptation with continued pre-training, and also created smaller distilled versions of the models.
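As a rough illustration of the positional-embedding adaptation step, the sketch below enlarges the learned position-embedding table of a Hugging Face BERT-style checkpoint from 512 to 8192 slots by tiling the original embeddings, then updates the model's config and buffers so longer inputs are accepted. The checkpoint name (`allegro/herbert-base-cased`) and the copy-by-tiling strategy are illustrative assumptions, not the paper's exact procedure; continued pre-training on long documents would still be needed to adapt the weights.

```python
import torch
from transformers import AutoModelForMaskedLM

# Load a BERT-style Polish checkpoint (model name is an assumption for illustration).
model = AutoModelForMaskedLM.from_pretrained("allegro/herbert-base-cased")

old_len = model.config.max_position_embeddings  # typically 512
new_len = 8192

# Learned position-embedding matrix of shape (old_len, hidden_size).
old_emb = model.bert.embeddings.position_embeddings.weight.data

# One simple adaptation baseline: tile the original 512 embeddings so every
# new position reuses a pattern the model has already seen during pre-training.
# (The paper's exact scheme may differ; interpolation is another option.)
new_emb = old_emb.new_empty((new_len, old_emb.size(1)))
for start in range(0, new_len, old_len):
    end = min(start + old_len, new_len)
    new_emb[start:end] = old_emb[: end - start]

# Swap in the enlarged embedding table.
model.bert.embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(
    new_emb, freeze=False
)

# Resize the cached index buffers so forward passes accept 8192-token inputs.
model.bert.embeddings.register_buffer(
    "position_ids", torch.arange(new_len).unsqueeze(0), persistent=False
)
model.bert.embeddings.register_buffer(
    "token_type_ids", torch.zeros((1, new_len), dtype=torch.long), persistent=False
)
model.config.max_position_embeddings = new_len

# The model now ingests 8192-token sequences; continued pre-training
# (e.g., masked language modeling on long documents) adapts the weights.
```

Note that full self-attention still scales quadratically with sequence length, so extended-context training like this is often paired with an efficient attention variant; the snippet only covers the embedding-table adaptation.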