A training method where a model learns by receiving rewards or penalties for its outputs, encouraging it to improve its behavior over time.