Grouping tokens with similar meanings to assess whether a model's prediction is semantically coherent.
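
One way to make this concrete is to sum a model's predicted probabilities over each group of similar tokens and look at how much mass the largest group holds. The sketch below is a minimal illustration, not a reference implementation: the function names `group_probabilities` and `top_group_mass`, the hand-written `token_to_group` mapping, and the example probabilities are all hypothetical. In practice the groups might come from a thesaurus or from clustering embeddings, which is not shown here.

```python
from collections import defaultdict


def group_probabilities(token_probs, token_to_group):
    """Sum predicted probability mass over each semantic group.

    token_probs: dict mapping token -> predicted probability.
    token_to_group: dict mapping token -> group label (hypothetical, hand-made).
    """
    grouped = defaultdict(float)
    for token, prob in token_probs.items():
        # A token with no known group is treated as its own singleton group.
        grouped[token_to_group.get(token, token)] += prob
    return dict(grouped)


def top_group_mass(token_probs, token_to_group):
    """Return the probability mass held by the most likely semantic group."""
    grouped = group_probabilities(token_probs, token_to_group)
    return max(grouped.values(), default=0.0)


# Illustrative data only: an assumed next-token distribution and grouping.
token_probs = {"big": 0.30, "large": 0.25, "huge": 0.20, "cat": 0.15, "the": 0.10}
token_to_group = {"big": "size", "large": "size", "huge": "size"}

print(group_probabilities(token_probs, token_to_group))
print(top_group_mass(token_probs, token_to_group))
```

In this made-up example the single most likely token holds only 0.30 of the probability mass, while the "size" group as a whole holds about 0.75, so the prediction looks far more coherent at the level of meaning than at the level of individual tokens.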