An adversarial attack that simultaneously perturbs multiple input modalities (e.g., text and audio) to fool a model.