Group Relative Policy Optimization — Glossary — ThinkLLM