Using a language model, in place of human reviewers, to automatically evaluate the quality of outputs produced by other AI systems.
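A minimal sketch of the idea follows. It assumes the judge model is reachable through some prompt-in, text-out function. The rubric wording, the 1 to 10 scale, the `Score: N` reply format, and every name below (`build_judge_prompt`, `parse_score`, `judge`, `fake_judge_model`) are hypothetical illustrations, not any particular library's API. The fake model stands in for a real LLM call so the example runs offline.

```python
import re
from typing import Callable, Optional

# Hypothetical grading rubric; a real setup would tailor this to the task.
JUDGE_TEMPLATE = (
    "You are grading the response of an AI assistant.\n"
    "Question: {question}\n"
    "Response: {response}\n"
    "Rate the response's quality from 1 to 10 and reply in the form 'Score: N'."
)


def build_judge_prompt(question: str, response: str) -> str:
    """Fill the rubric template with the item under evaluation."""
    return JUDGE_TEMPLATE.format(question=question, response=response)


def parse_score(judge_output: str) -> Optional[int]:
    """Extract the integer score from the judge's reply; None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", judge_output)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None


def judge(question: str, response: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model (any prompt -> text function) to score one response."""
    return parse_score(call_model(build_judge_prompt(question, response)))


def fake_judge_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call: always returns a fixed verdict."""
    return "The answer is correct but terse. Score: 7"


if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4", fake_judge_model))  # prints 7
```

Parsing the verdict defensively (returning `None` rather than guessing) matters in practice, because a judge model will not always follow the requested output format.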