Evaluating model outputs by placing them on an ordered scale of quality levels rather than judging each one as simply correct or incorrect.
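As a rough illustration of the difference, the sketch below scores the same set of graded outputs two ways: once on an ordered scale and once collapsed to correct/incorrect. The scale labels, the model names, and the grades are all made up for this example, and the helper functions are hypothetical rather than taken from any particular evaluation library.

```python
from statistics import mean

# Hypothetical ordered scale, worst to best (labels are illustrative only).
SCALE = ["wrong", "partially correct", "mostly correct", "fully correct"]


def ordinal_score(label: str) -> int:
    """Position of a label on the ordered scale (0 = worst)."""
    return SCALE.index(label)


def binary_score(label: str) -> int:
    """Collapse the scale to correct/incorrect: only the top label counts."""
    return 1 if label == SCALE[-1] else 0


# Made-up grades for two hypothetical models on the same four prompts.
grades = {
    "model_a": ["mostly correct", "mostly correct", "fully correct", "partially correct"],
    "model_b": ["wrong", "wrong", "fully correct", "wrong"],
}


def summarize(labels):
    """Report both views of the same grades."""
    return {
        "binary_accuracy": mean(binary_score(l) for l in labels),
        "mean_ordinal": mean(ordinal_score(l) for l in labels),
    }


summary = {name: summarize(labels) for name, labels in grades.items()}

for name, stats in summary.items():
    print(name, stats)
```

In this toy data both models have the same binary accuracy, yet the ordered scale separates them, because one model's misses are near-correct and the other's are not. Averaging scale positions assumes the levels are evenly spaced, which is itself a modeling choice; a median or a per-level count avoids that assumption.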