You can increase diversity in generated images by applying repulsion forces in the transformer's attention channels during generation, without expensive optimization and without introducing visual artifacts.
This paper tackles the problem of text-to-image diffusion models producing visually similar outputs for the same prompt. The authors propose applying 'repulsion' in the attention mechanism during image generation, pushing each sample's intermediate features away from the others in the batch so outputs diversify while quality and semantic accuracy are preserved.
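To make the idea concrete, here is a minimal, hypothetical sketch of what a "repulsion" step on attention features could look like. It is not the paper's implementation: the function name, the choice of an RBF kernel, and the `strength` and `bandwidth` parameters are all assumptions for illustration. Each sample's flattened attention features are nudged along the direction that decreases its kernel similarity to the other samples in the batch, so near-duplicate generations are pushed apart.

```python
import numpy as np

def repel_attention_features(feats, strength=0.1, bandwidth=1.0):
    """Hypothetical repulsion sketch (not the paper's exact method).

    feats: (batch, dim) array of flattened attention features, one row
    per image being generated. Each row is nudged along the negative
    gradient of its summed RBF-kernel similarity to the other rows,
    which increases pairwise distances between similar samples.
    """
    diffs = feats[:, None, :] - feats[None, :, :]   # (B, B, D) pairwise x_i - x_j
    sq = (diffs ** 2).sum(-1)                       # (B, B) squared distances
    k = np.exp(-sq / (2 * bandwidth ** 2))          # RBF similarity in [0, 1]
    np.fill_diagonal(k, 0.0)                        # a sample does not repel itself
    # d k_ij / d x_i = -k_ij * (x_i - x_j) / h^2, so stepping along
    # +k_ij * (x_i - x_j) / h^2 reduces similarity (repulsion).
    push = (k[..., None] * diffs).sum(axis=1) / bandwidth ** 2
    return feats + strength * push

# Two nearby samples are pushed further apart; distant ones barely move.
feats = np.array([[0.0, 0.0], [1.0, 0.0]])
out = repel_attention_features(feats, strength=0.5)
```

In a diffusion pipeline this kind of step would run inside selected attention layers at each denoising step, which is why it avoids a separate, expensive optimization loop over the final images.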