Conditioning on audio examples instead of text prompts gives you finer control over sound synthesis and avoids the ambiguity of describing acoustic details in words.
AC-Foley generates realistic sound effects for videos by using reference audio as a guide instead of text descriptions. Text is often too vague to capture subtle acoustic details, so conditioning on reference audio gives precise control over timbre and sound quality, and it supports zero-shot generation of new sounds.
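
To make the idea of audio conditioning concrete, here is a toy sketch of how a reference clip could be turned into a conditioning signal and paired with per-frame video features. It is not AC-Foley's architecture, which this summary does not describe. The function names `reference_audio_embedding` and `build_conditioning`, the band-energy descriptor, and all shapes are assumptions chosen only for illustration; a real system would most likely use a learned audio encoder.

```python
import numpy as np

def reference_audio_embedding(waveform, frame_len=1024, hop=512, n_bands=16):
    """Toy timbre descriptor (an assumption, not the paper's encoder):
    mean log-energy in n_bands linearly spaced frequency bands."""
    # Slice the waveform into overlapping frames.
    n_frames = 1 + (len(waveform) - frame_len) // hop
    frames = np.stack(
        [waveform[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # Windowed power spectrum per frame: shape (n_frames, frame_len // 2 + 1).
    window = np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Sum the power inside each band, then average log-energy over time.
    bands = np.array_split(power, n_bands, axis=1)
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(band_energy + 1e-8).mean(axis=0)  # shape (n_bands,)

def build_conditioning(video_features, audio_embedding):
    """Repeat the audio embedding for every video frame and concatenate,
    so each frame's conditioning vector carries the reference timbre."""
    tiled = np.tile(audio_embedding, (video_features.shape[0], 1))
    return np.concatenate([video_features, tiled], axis=1)

# Hypothetical inputs: a 1-second 440 Hz tone as the reference clip
# and random stand-ins for 8 frames of 32-dimensional video features.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
reference = np.sin(2 * np.pi * 440.0 * t)
video_features = np.random.default_rng(0).normal(size=(8, 32))

embedding = reference_audio_embedding(reference)
conditioning = build_conditioning(video_features, embedding)
print(conditioning.shape)
```

The point of the sketch is the contrast with text prompts: two reference clips with different timbres produce different embeddings directly from the signal, with no need to put the difference into words.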