You can make diffusion-based language models much faster by intelligently deciding when to verify generated tokens, using the same model in two different modes without retraining.
S2D2 speeds up block-diffusion language models by combining parallel token generation with selective verification steps. The method reuses the same pretrained model in two modes—as a fast parallel generator and as a careful single-token verifier—without requiring additional training, achieving up to 4.7× speedup over standard autoregressive decoding.
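The draft-then-verify loop described above can be sketched in miniature. This is a toy illustration, not S2D2's actual implementation: `draft_block`, `verify_token`, and the integer "token" rule are all hypothetical stand-ins for the two modes of the real model (the injected draft errors simulate an imperfect parallel generator).

```python
from typing import List

def draft_block(prefix: List[int], block_size: int) -> List[int]:
    """Hypothetical fast parallel generator: proposes block_size tokens at once.
    Toy rule: next token = (last + 1) % 10, with a deliberate wrong guess at
    every 4th absolute position to simulate imperfect parallel drafts."""
    out, last = [], prefix[-1] if prefix else 0
    for i in range(block_size):
        tok = (last + 1) % 10
        if (len(prefix) + i) % 4 == 3:
            tok = (tok + 5) % 10  # injected draft error
        out.append(tok)
        last = tok
    return out

def verify_token(prefix: List[int]) -> int:
    """Hypothetical careful single-token verifier (ground truth in this toy)."""
    last = prefix[-1] if prefix else 0
    return (last + 1) % 10

def decode(n_tokens: int, block_size: int = 4) -> List[int]:
    """Generate n_tokens: draft a block in parallel, then verify left to right,
    accepting the matching prefix and falling back to the verifier on mismatch."""
    seq: List[int] = [0]  # start-of-sequence token
    while len(seq) - 1 < n_tokens:
        for tok in draft_block(seq, block_size):
            expected = verify_token(seq)
            if tok == expected:
                seq.append(tok)            # drafted token accepted for free
            else:
                seq.append(expected)       # reject: use the verifier's token
                break                      # re-draft from the corrected prefix
            if len(seq) - 1 >= n_tokens:
                break
    return seq[1:n_tokens + 1]
```

Because every rejected draft is replaced by the verifier's own prediction, the output always matches what the careful mode alone would produce; the speedup comes from the accepted prefixes, which cost only one verification pass each instead of one full step per token.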