Online Experiential Learning for Language Models

Tianzhu Ye, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang et al. | March 17, 2026 | arXiv

Key Takeaway

Language models can improve themselves in production by learning from actual user interactions: knowledge extracted from deployment experience is fed back into training, with no access to the original environment required.

Summary

This paper introduces Online Experiential Learning (OEL), a system that lets language models continuously improve by learning from real interactions during deployment. Instead of relying only on offline training data, OEL extracts useful knowledge from user interactions, then updates the model with this knowledge without needing access to the original environment.
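The two stages described above (extract knowledge from logged interactions, then update the model with it) can be illustrated with a toy data-flow sketch. This is not the paper's implementation. It assumes the update step is framed as context distillation, where the model without extra context is trained to match what it produces when the extracted knowledge is placed in its context. The `Trajectory` fields, `extract_knowledge`, `build_distillation_examples`, and `toy_teacher` are all hypothetical names introduced for illustration, and the string-based "teacher" stands in for a real model call.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One logged deployment interaction (hypothetical schema)."""
    prompt: str
    response: str
    feedback: str  # free-text signal observed during deployment; may be empty


def extract_knowledge(trajectories):
    """Toy extractor: keep each non-empty feedback string once, in first-seen order."""
    notes = []
    for t in trajectories:
        note = t.feedback.strip()
        if note and note not in notes:
            notes.append(note)
    return notes


def build_distillation_examples(trajectories, notes, teacher):
    """Pair each bare prompt with the teacher's output when the notes are in context.

    Only the logged prompts are used, so no call to the original environment is made.
    """
    context = "\n".join(notes)
    examples = []
    for t in trajectories:
        target = teacher(context, t.prompt)  # knowledge-conditioned output
        examples.append({"input": t.prompt, "target": target})
    return examples


def toy_teacher(context, prompt):
    """Stand-in for a model call that sees the extracted notes (assumption)."""
    return f"[uses {len(context.splitlines())} notes] answer to: {prompt}"
```

The resulting `input`/`target` pairs would then be used to fine-tune the model on bare prompts, so the knowledge no longer needs to be supplied in context. A real system would replace `toy_teacher` with the deployed model and would need a far more careful extractor than deduplicating feedback strings.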

Topics: training, reasoning, efficiency

Key Terms

on-policy RL, context distillation, experiential knowledge, trajectory