Learning to predict probability distributions over outputs rather than single deterministic predictions.