A technique in which a smaller, faster model drafts several future tokens ahead of time, and a larger model then verifies them in parallel, accepting the longest correct prefix and correcting the first mismatch. The output matches what the larger model would have generated on its own, so generation speeds up without changing quality.
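A minimal sketch of the draft-then-verify loop, using toy deterministic "models" over integer tokens (both functions and the mismatch rule are invented for illustration). In a real system both models are neural LMs and the verification step is a single batched forward pass of the large model over all drafted positions; here greedy acceptance stands in for the full rejection-sampling scheme.

```python
def draft_model(context):
    # Small, fast stand-in model: always guesses (last token + 1) mod 10.
    return (context[-1] + 1) % 10

def target_model(context):
    # Larger, more accurate stand-in model: same rule, except that
    # after a 4 it emits 9 — the one place the draft model is wrong.
    return 9 if context[-1] == 4 else (context[-1] + 1) % 10

def greedy_decode(prompt, n):
    # Baseline: pure autoregressive decoding with the target model.
    out = list(prompt)
    for _ in range(n):
        out.append(target_model(out))
    return out[len(prompt):]

def speculative_decode(prompt, n, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n:
        # 1) Draft: the small model proposes k tokens autoregressively.
        ctx = list(out)
        drafted = []
        for _ in range(k):
            t = draft_model(ctx)
            drafted.append(t)
            ctx.append(t)
        # 2) Verify: check each drafted position against the target model.
        #    (In practice this is one parallel forward pass, not a loop.)
        ctx = list(out)
        for t in drafted:
            expected = target_model(ctx)
            if expected == t:
                ctx.append(t)          # accept the drafted token
            else:
                ctx.append(expected)   # reject; substitute the target's token
                break                  # everything after the mismatch is discarded
        out = ctx
    return out[len(prompt):][:n]
```

When the draft model agrees often, each round accepts several tokens for roughly the cost of one target-model step; the key invariant is that `speculative_decode` returns exactly what `greedy_decode` would.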