Policy Gradient
techniques
Reinforcement learning optimization method that updates a model's (policy's) parameters directly by gradient ascent on the expected reward.
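The definition can be illustrated with a minimal sketch of REINFORCE, the simplest policy gradient estimator, on a two-armed bandit. The setup is an assumption made for illustration: the function name `train_bandit`, the reward means, the learning rate, and the running-mean baseline are all hypothetical and do not come from this glossary. Each step samples an action from a softmax policy, observes a reward, and moves the parameters along `(reward - baseline) * grad log pi(action)`, which is an unbiased estimate of the gradient of expected reward.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(mean_rewards=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """REINFORCE on a toy bandit; all defaults are illustrative assumptions."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)  # policy parameters (one logit per arm)
    baseline = 0.0                     # running mean reward, reduces variance

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Noisy reward from the chosen arm.
        reward = rng.gauss(mean_rewards[action], 0.1)
        advantage = reward - baseline

        # For a softmax policy, d/d theta_k log pi(action) = 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi  # gradient ascent step

        # Update the running mean of observed rewards.
        baseline += (reward - baseline) / t

    return softmax(theta)


if __name__ == "__main__":
    print(train_bandit())
```

Under these assumptions the policy should shift most of its probability toward the arm with the higher mean reward. The same update rule carries over to language models, where the "action" is a sampled token sequence and the reward comes from a scorer or reward model, though practical methods add further machinery such as clipping or KL penalties.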