Attention mechanism in a transformer that attends over multiple types of input (such as text and image features) at the same time, so that tokens from one input type can draw on information from the others.
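
As a rough illustration, one common way to realize this is to concatenate the token sequences from each input type and run ordinary scaled dot-product self-attention over the combined sequence. The sketch below assumes that variant; the definition above does not say whether the inputs are concatenated or combined through cross-attention. The function name `joint_attention`, the single-head setup, the random projection matrices, and the toy shapes are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(text_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head self-attention over the concatenation of two input types.

    Assumption for this sketch: both inputs were already projected to the
    same model dimension, so they can share one set of Q/K/V weights.
    """
    # Stack the two sequences into one: shape (n_text + n_image, d_model).
    tokens = np.concatenate([text_tokens, image_tokens], axis=0)

    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # Every token scores every other token, regardless of input type.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs: 4 text tokens and 6 image-patch features, model width 8.
rng = np.random.default_rng(0)
d_model = 8
text_tokens = rng.normal(size=(4, d_model))
image_tokens = rng.normal(size=(6, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = joint_attention(text_tokens, image_tokens, w_q, w_k, w_v)
print(output.shape)   # one output vector per input token
print(weights.shape)  # attention weights across the combined sequence
```

In this sketch, the block `weights[:4, 4:]` holds the attention that text tokens pay to image features, and `weights[4:, :4]` holds the reverse direction. A cross-attention variant would instead take queries from one input type and keys and values from the other.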