Feature-matching fine-tuning offers a middle ground between simple token prediction and complex reinforcement learning: it provides dense semantic feedback without requiring task-specific reward models, making it practical for improving model behavior on real tasks.
This paper proposes fine-tuning language models by matching learned feature representations rather than predicting individual tokens. Instead of reinforcement learning with reward models, the method generates multiple model outputs in parallel and uses their semantic features to guide training, outperforming standard fine-tuning on coding and translation tasks.
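The idea can be sketched in a minimal toy form. The sketch below is illustrative, not the paper's implementation: it assumes the "semantic features" are mean-pooled hidden states, that several sampled outputs are scored together, and that training minimizes the mean-squared distance between the averaged sample features and a fixed reference feature. All names (`TinyLM`, `feature_matching_loss`) are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyLM(nn.Module):
    """A toy stand-in for a language model; its hidden states act as features."""
    def __init__(self, vocab=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def features(self, tokens):
        # Mean-pooled hidden states as "semantic features" (an assumption,
        # not necessarily the paper's feature definition).
        h, _ = self.rnn(self.embed(tokens))
        return h.mean(dim=1)

def feature_matching_loss(model, samples, target):
    # Match the average feature of K sampled outputs to a reference feature,
    # instead of computing a per-token cross-entropy.
    sample_feats = torch.stack([model.features(s) for s in samples]).mean(0)
    return nn.functional.mse_loss(sample_feats, target)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

reference = torch.randint(0, 32, (4, 10))                    # reference outputs
samples = [torch.randint(0, 32, (4, 10)) for _ in range(3)]  # K=3 parallel samples

# Fixed feature target computed once from the reference outputs.
with torch.no_grad():
    target = model.features(reference)

losses = []
for _ in range(30):
    loss = feature_matching_loss(model, samples, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

After a few dozen steps the feature-matching loss should drop, showing that the model is being pulled toward the reference in feature space rather than token space; a real system would, of course, sample outputs from the model itself and use features from a trained network.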