You can create smaller datasets that preserve the knowledge of a large dataset using pre-trained diffusion models with geometric guidance, no retraining ne...
This paper introduces ManifoldGD, a training-free method that distills a large dataset into a smaller, representative one using pre-trained diffusion models. Rather than relying on simple guidance, it uses geometric manifold structure to steer generation so that the synthetic data captures both broad concepts and fine details, yielding higher-quality distilled datasets with fewer images.