You can make diffusion-based language models much faster by intelligently deciding when to verify generated tokens, using the same model in two different modes without retraining.
S2D2 speeds up block-diffusion language models by combining parallel token generation with selective verification steps. The method reuses the same pretrained model in two modes—as a fast parallel generator and as a careful single-token verifier—without requiring additional training, achieving up to 4.7× speedup over standard autoregressive decoding.
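The draft-then-verify loop described above can be sketched in miniature. This is a toy illustration, not S2D2's actual implementation: `draft_block`, `verify_token`, and the integer "token" rule are all hypothetical stand-ins for the two modes of the real model (the injected draft errors simulate an imperfect parallel generator).

```python
from typing import List

def draft_block(prefix: List[int], block_size: int) -> List[int]:
    """Hypothetical fast parallel generator: proposes block_size tokens at once.
    Toy rule: next token = (last + 1) % 10, with a deliberate wrong guess at
    every 4th absolute position to simulate imperfect parallel drafts."""
    out, last = [], prefix[-1] if prefix else 0
    for i in range(block_size):
        tok = (last + 1) % 10
        if (len(prefix) + i) % 4 == 3:
            tok = (tok + 5) % 10  # injected draft error
        out.append(tok)
        last = tok
    return out

def verify_token(prefix: List[int]) -> int:
    """Hypothetical careful single-token verifier (ground truth in this toy)."""
    last = prefix[-1] if prefix else 0
    return (last + 1) % 10

def decode(n_tokens: int, block_size: int = 4) -> List[int]:
    """Generate n_tokens: draft a block in parallel, then verify left to right,
    accepting the matching prefix and falling back to the verifier on mismatch."""
    seq: List[int] = [0]  # start-of-sequence token
    while len(seq) - 1 < n_tokens:
        for tok in draft_block(seq, block_size):
            expected = verify_token(seq)
            if tok == expected:
                seq.append(tok)            # drafted token accepted for free
            else:
                seq.append(expected)       # reject: use the verifier's token
                break                      # re-draft from the corrected prefix
            if len(seq) - 1 >= n_tokens:
                break
    return seq[1:n_tokens + 1]
```

Because every rejected draft is replaced by the verifier's own prediction, the output always matches what the careful mode alone would produce; the speedup comes from the accepted prefixes, which cost only one verification pass each instead of one full step per token.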