Automatically creating evaluation criteria and scoring guidelines that judges use to assess output quality.
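
As a rough illustration of what such a generated artifact might look like, the sketch below builds a rubric as a list of criteria, each carrying per-score guidelines, and renders it as text a judge could read. Everything here is an assumption made for illustration: the names `Criterion`, `generate_rubric`, and `render_for_judge` are hypothetical, and the template-based generation is only a stand-in for whatever automatic method is actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One evaluation criterion plus its scoring guidelines (hypothetical structure)."""
    name: str
    description: str
    # Maps each score to the guideline a judge applies for that score.
    guidelines: dict[int, str] = field(default_factory=dict)


def generate_rubric(task: str, aspects: list[str], max_score: int = 3) -> list[Criterion]:
    """Build a rubric from a task description and a list of aspects.

    Template-based placeholder for the generation step; a real system
    could derive the criteria and guidelines some other way.
    """
    rubric = []
    for aspect in aspects:
        # One placeholder guideline per score from 1 to max_score.
        guidelines = {
            score: f"{aspect}: level {score} of {max_score}"
            for score in range(1, max_score + 1)
        }
        rubric.append(Criterion(
            name=aspect,
            description=f"How well the output for '{task}' satisfies {aspect}.",
            guidelines=guidelines,
        ))
    return rubric


def render_for_judge(rubric: list[Criterion]) -> str:
    """Format the rubric as plain text a judge could read."""
    lines = []
    for criterion in rubric:
        lines.append(f"{criterion.name}: {criterion.description}")
        for score in sorted(criterion.guidelines):
            lines.append(f"  {score} = {criterion.guidelines[score]}")
    return "\n".join(lines)


rubric = generate_rubric("summarize an article", ["accuracy", "concision"])
print(render_for_judge(rubric))
```

Keeping the criteria and the per-score guidelines in one structure means the same rubric can be shown to a judge and reused when the scores are aggregated later.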