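Connecting representations learned from different types of data (such as speech and text) by mapping them into a shared space, so that inputs with the same meaning get similar representations and can be compared or combined.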
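
As a rough illustration, the sketch below projects toy "speech" vectors into the same space as toy "text" vectors and matches them by cosine similarity. Everything in it is an assumption made for the example: the embeddings, the projection matrix `W_SPEECH`, and the helper names are hypothetical, and in practice the projection would be learned from paired data rather than written by hand.

```python
import math

def project(vec, matrix):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical toy embeddings: speech vectors are 3-d, text vectors are 2-d.
SPEECH_EMB = {"hello": [0.9, 0.1, 1.1], "goodbye": [0.1, 0.8, -0.1]}
TEXT_EMB = {"hello": [1.0, 0.0], "goodbye": [0.0, 1.0]}

# Hypothetical hand-written projection from speech space (3-d) to text space (2-d).
# A real system would learn this mapping, for example from paired speech/text data.
W_SPEECH = [
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

def best_text_match(speech_vec):
    """Return the text label whose embedding is closest to the projected speech vector."""
    projected = project(speech_vec, W_SPEECH)
    return max(TEXT_EMB, key=lambda label: cosine(projected, TEXT_EMB[label]))

if __name__ == "__main__":
    for label, vec in SPEECH_EMB.items():
        print(label, "->", best_text_match(vec))
```

Once both modalities live in the same space, a plain similarity measure is enough to relate a spoken input to its written counterpart, which is the sense in which the representations "work together".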