Self-Distillation Policy Optimization (SDPO) — Glossary — ThinkLLM