You can make LLMs generate text faster by predicting multiple tokens simultaneously with a training-free probing technique: no model modifications or auxiliary draft models are needed.
This paper shows that LLMs can predict multiple future tokens at once without retraining by using special "mask tokens" to probe the model's internal representations. The approach drafts candidate tokens in parallel and then verifies them together, speeding up text generation by 15-19% while maintaining output quality.
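The draft-then-verify loop described above can be sketched in miniature. This is a hypothetical toy, not the paper's implementation: a simple deterministic rule stands in for the LLM, `probe_draft` stands in for probing mask-token positions (its guesses intentionally drift after a couple of steps), and `verify` accepts the longest prefix of the draft that matches what the model would have produced autoregressively, which is how quality is preserved.

```python
def base_model(prefix):
    """Stand-in for the LLM's next-token prediction: a toy deterministic rule
    (next token = previous token + 1, mod 10)."""
    return (prefix[-1] + 1) % 10 if prefix else 0

def probe_draft(prefix, k):
    """Simulate probing k mask-token positions in one pass to guess k future
    tokens. The guess is imperfect on purpose: it drifts after 2 tokens,
    mimicking a probe that degrades at longer horizons."""
    draft, ctx = [], list(prefix)
    for i in range(k):
        nxt = base_model(ctx) if i < 2 else 0  # wrong beyond 2 steps
        draft.append(nxt)
        ctx.append(nxt)
    return draft

def verify(prefix, draft):
    """Verification step: accept the longest prefix of the draft that agrees
    with the base model's own autoregressive predictions, so output is
    identical to plain decoding."""
    ctx, accepted = list(prefix), []
    for tok in draft:
        if base_model(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    # Always emit at least one model-verified token so decoding progresses.
    if len(accepted) < len(draft):
        accepted.append(base_model(ctx))
    return accepted

prefix = [3]
print(verify(prefix, probe_draft(prefix, 4)))  # → [4, 5, 6]
```

Here one verification pass yields three tokens (two accepted draft tokens plus one corrected token) instead of one, which is the source of the speedup; the accept-or-fall-back check is why the output matches ordinary one-token-at-a-time decoding exactly.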