You can sample from diffusion models much faster by combining predictions from small and large networks: the method is reported to match the accuracy of many large-network evaluations while running the largest network as little as once.
This paper speeds up diffusion model sampling by combining neural networks of different sizes. Instead of running one large network many times, the method runs a small, fast network many times and a large, accurate network only a few times, which reduces total computation while maintaining quality. Tests show up to a 4x speedup on image generation.
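The summary does not say how the small and large predictions are combined, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a randomized correction in the spirit of multilevel estimators: every step uses the small network, and with probability `p_large` the large network is also evaluated and a reweighted difference is added, so the combined prediction equals the large network's prediction on average. The functions `small_net`, `large_net`, `combined_prediction`, and `sample`, the toy linear "networks", and the Euler update are all hypothetical stand-ins.

```python
import random


def small_net(x, t):
    # Hypothetical cheap network: a rough approximation of the update direction.
    return -0.9 * x


def large_net(x, t):
    # Hypothetical expensive network: the accurate update direction.
    return -1.0 * x


def combined_prediction(x, t, p_large, rng, counts):
    """Small-net prediction plus a randomly applied, reweighted correction.

    With probability p_large the large net is also evaluated and
    (large - small) / p_large is added, so the expected value equals
    large_net(x, t) while the large net runs on only a fraction of steps.
    """
    counts["small"] += 1
    pred = small_net(x, t)
    if rng.random() < p_large:
        counts["large"] += 1
        pred += (large_net(x, t) - pred) / p_large
    return pred


def sample(x0, n_steps=100, p_large=0.1, seed=0):
    """Toy Euler sampler (an assumption, not the paper's solver)."""
    rng = random.Random(seed)
    counts = {"small": 0, "large": 0}
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        x = x + dt * combined_prediction(x, t, p_large, rng, counts)
    return x, counts


if __name__ == "__main__":
    x_final, counts = sample(1.0)
    print(x_final, counts)
```

Setting `p_large=1.0` recovers a sampler that calls the large network at every step, which gives a baseline for comparing network call counts. Lower values trade fewer large-network calls for more variance in each step.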