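One of several attention mechanisms that run in parallel within a transformer layer, each with its own learned query, key, and value projections, so that each can capture a different kind of relationship between input positions.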
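The definition can be made concrete with a minimal sketch in plain Python. It assumes the standard scaled dot-product formulation, uses random matrices in place of learned weights, and omits masking and the final output projection that usually follows concatenation. The names `attention_head`, `multi_head_attention`, and `random_matrix` are hypothetical helpers chosen for illustration, not any library's API.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, w_q, w_k, w_v):
    """One mechanism: scaled dot-product attention with its own projections."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # scores[i][j] measures how strongly position i attends to position j.
    scores = [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, heads):
    """Run every mechanism on the same input and concatenate outputs per position."""
    outputs = [attention_head(x, *h)[0] for h in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (assumption: real weights come from training)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads  # each mechanism works in a smaller subspace

x = random_matrix(seq_len, d_model, rng)
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))  # (W_q, W_k, W_v)
    for _ in range(n_heads)
]
out = multi_head_attention(x, heads)
```

Because each entry in `heads` has its own projection matrices, the attention weights it produces differ from those of its siblings even though all of them see the same input `x`. That is the sense in which the parallel mechanisms attend to different aspects of the input.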