Reward Modeling — Glossary — ThinkLLM