You can now use smaller, faster embedding models for multilingual search and retrieval without sacrificing quality: F2LLM-v2 offers efficient options for resource-constrained deployments, while its largest variant ranks first on major benchmarks.
F2LLM-v2 is a family of multilingual embedding models (80M to 14B parameters), trained on 60 million high-quality samples, that supports 200+ languages, including underserved low-resource ones. Using Matryoshka representation learning and knowledge distillation, these models achieve top benchmark performance while being more efficient than previous LLM-based embeddings.
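A practical payoff of Matryoshka representation learning is that an embedding can be truncated to a short prefix and re-normalized, trading a little quality for much cheaper storage and search. A minimal sketch of that truncation step, using random NumPy vectors as stand-ins for real model outputs (illustrative only, not the F2LLM-v2 API):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of a Matryoshka-trained
    embedding and re-normalize, so dot products remain valid
    cosine similarities in the smaller space."""
    prefix = vec[:dim]
    return prefix / np.linalg.norm(prefix)

# Toy full-size "embeddings" standing in for model output.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)

# Truncate from 1024 to 64 dimensions before indexing/search.
a64 = truncate_embedding(a, 64)
b64 = truncate_embedding(b, 64)
similarity = float(a64 @ b64)  # cosine similarity in the 64-dim prefix
```

In a real deployment, the truncation width is chosen once per index (e.g. 64 or 256 dimensions) to balance retrieval quality against memory and latency.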