You can make LLMs generate text faster by predicting multiple tokens simultaneously with a training-free probing technique: no model modifications or auxiliary draft models are needed.
This paper shows that LLMs can predict multiple future tokens at once without retraining by using special "mask tokens" to probe the model's internal representations. The approach drafts candidate tokens in parallel and then verifies them together, speeding up text generation by 15-19% while maintaining output quality.
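The draft-then-verify loop described above can be sketched in miniature. This is a hypothetical toy, not the paper's implementation: a simple deterministic rule stands in for the LLM, `probe_draft` stands in for probing mask-token positions (its guesses intentionally drift after a couple of steps), and `verify` accepts the longest prefix of the draft that matches what the model would have produced autoregressively, which is how quality is preserved.

```python
def base_model(prefix):
    """Stand-in for the LLM's next-token prediction: a toy deterministic rule
    (next token = previous token + 1, mod 10)."""
    return (prefix[-1] + 1) % 10 if prefix else 0

def probe_draft(prefix, k):
    """Simulate probing k mask-token positions in one pass to guess k future
    tokens. The guess is imperfect on purpose: it drifts after 2 tokens,
    mimicking a probe that degrades at longer horizons."""
    draft, ctx = [], list(prefix)
    for i in range(k):
        nxt = base_model(ctx) if i < 2 else 0  # wrong beyond 2 steps
        draft.append(nxt)
        ctx.append(nxt)
    return draft

def verify(prefix, draft):
    """Verification step: accept the longest prefix of the draft that agrees
    with the base model's own autoregressive predictions, so output is
    identical to plain decoding."""
    ctx, accepted = list(prefix), []
    for tok in draft:
        if base_model(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    # Always emit at least one model-verified token so decoding progresses.
    if len(accepted) < len(draft):
        accepted.append(base_model(ctx))
    return accepted

prefix = [3]
print(verify(prefix, probe_draft(prefix, 4)))  # → [4, 5, 6]
```

Here one verification pass yields three tokens (two accepted draft tokens plus one corrected token) instead of one, which is the source of the speedup; the accept-or-fall-back check is why the output matches ordinary one-token-at-a-time decoding exactly.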