You can measure LLM uncertainty efficiently in a single forward pass by clustering semantically similar tokens, avoiding the computational cost of sampling-based or auxiliary-model approaches.
This paper proposes Semantic Token Clustering (STC), a fast method for estimating how confident an LLM should be in its answers. Instead of sampling the model multiple times or relying on auxiliary models, STC groups semantically similar tokens and checks whether the model's top prediction belongs to a coherent semantic cluster. It runs in a single forward pass and flags cases where the model is overconfident.
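The summary above does not spell out the mechanics, but the core idea can be sketched as follows: take the next-token probability distribution, cluster tokens by embedding similarity, and score confidence as the probability mass of the cluster containing the top-ranked token. The function name `cluster_confidence`, the cosine-similarity threshold, and the thresholded clustering rule are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def cluster_confidence(probs, embeddings, sim_threshold=0.8):
    """Illustrative sketch (not the paper's exact method): score
    confidence as the probability mass of the semantic cluster
    around the top-ranked token.

    probs:      (V,) next-token probability distribution
    embeddings: (V, d) token embedding matrix
    """
    # Normalize embeddings so dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    # Tokens whose cosine similarity to the top token exceeds the
    # threshold form its semantic cluster (assumed clustering rule).
    top = int(np.argmax(probs))
    sims = unit @ unit[top]
    cluster = sims >= sim_threshold

    # Confidence = total probability mass inside that cluster.
    # High mass concentrated in one coherent cluster suggests the
    # model's top choice is semantically well-supported; low mass
    # suggests overconfidence in an isolated token.
    return float(probs[cluster].sum())
```

For example, if two near-synonymous tokens split the probability mass, their combined cluster mass yields a higher confidence score than either token's raw probability alone, which is the kind of case single-token confidence would misjudge.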