Confidence-Driven Reinforcement Learning — Glossary — ThinkLLM