You can measure LLM uncertainty efficiently in a single forward pass by clustering semantically similar tokens, avoiding the computational cost of sampling-based or auxiliary-model approaches.
This paper proposes Semantic Token Clustering (STC), a fast method for estimating how confident an LLM should be in its answers. Instead of sampling the model multiple times or relying on auxiliary models, STC groups semantically similar tokens and checks whether the model's top prediction belongs to a coherent semantic cluster. It runs in a single forward pass and flags cases where the model is overconfident.
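The summary above does not spell out the mechanics, but the core idea can be sketched as follows: take the next-token probability distribution, cluster tokens by embedding similarity, and score confidence as the probability mass of the cluster containing the top-ranked token. The function name `cluster_confidence`, the cosine-similarity threshold, and the thresholded clustering rule are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def cluster_confidence(probs, embeddings, sim_threshold=0.8):
    """Illustrative sketch (not the paper's exact method): score
    confidence as the probability mass of the semantic cluster
    around the top-ranked token.

    probs:      (V,) next-token probability distribution
    embeddings: (V, d) token embedding matrix
    """
    # Normalize embeddings so dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    # Tokens whose cosine similarity to the top token exceeds the
    # threshold form its semantic cluster (assumed clustering rule).
    top = int(np.argmax(probs))
    sims = unit @ unit[top]
    cluster = sims >= sim_threshold

    # Confidence = total probability mass inside that cluster.
    # High mass concentrated in one coherent cluster suggests the
    # model's top choice is semantically well-supported; low mass
    # suggests overconfidence in an isolated token.
    return float(probs[cluster].sum())
```

For example, if two near-synonymous tokens split the probability mass, their combined cluster mass yields a higher confidence score than either token's raw probability alone, which is the kind of case single-token confidence would misjudge.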