Using an audio sample, rather than text or another type of input, to guide or control what a generative model produces.