Direct Preference Optimization — Glossary — ThinkLLM