On-Policy RL
techniques
Reinforcement learning in which the model learns from data generated by its own current policy, rather than from data collected by an earlier or different policy.
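As a rough illustration of the definition, the sketch below runs a REINFORCE-style update on a toy two-armed bandit. The bandit, its reward values, and the names `softmax` and `train_on_policy` are assumptions made up for this example, not part of the glossary entry. What makes it on-policy is that every action is sampled from the policy as it stands at that step, and the resulting sample is used once for the update and then discarded.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_on_policy(steps=200, lr=0.5, seed=0):
    """REINFORCE on a hypothetical two-armed bandit; returns final action probabilities."""
    rng = random.Random(seed)
    rewards = [0.0, 1.0]  # assumed toy environment: arm 0 pays 0, arm 1 pays 1
    logits = [0.0, 0.0]   # policy parameters, starting from a uniform policy

    for _ in range(steps):
        probs = softmax(logits)
        # On-policy step: the action is sampled from the *current* policy,
        # not replayed from data gathered by an older or different policy.
        action = rng.choices([0, 1], weights=probs)[0]
        reward = rewards[action]
        # Gradient of log pi(action) w.r.t. logit j is 1[j == action] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * reward * grad

    return softmax(logits)


if __name__ == "__main__":
    print(train_on_policy())
```

An off-policy variant would instead store past `(action, reward)` pairs and reuse them after the policy has changed, which generally requires a correction such as importance weighting.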