Using specialized experts for different modalities (speech vs.
This paper presents MiSTER-E, a system that recognizes emotions in conversations by combining speech and text information. It uses separate expert models for speech, text, and cross-modal analysis, then combines their predictions into a single output. The system works on real conversations without requiring speaker identity, and it achieves strong results on standard emotion recognition benchmarks.
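
To make the idea of combining expert predictions concrete, the following is a minimal sketch, not the paper's implementation. It assumes each expert outputs class logits and that a gate supplies one score per expert, normalized with a softmax; the summary does not state the actual combination rule. The function names, emotion labels, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_logits, gate_scores):
    """Return a gate-weighted average of per-expert class probabilities.

    expert_logits: dict mapping expert name -> list of class logits
    gate_scores:   dict mapping expert name -> unnormalized gate score
    Assumption: softmax gating; the source does not specify the rule.
    """
    names = sorted(expert_logits)
    weights = softmax([gate_scores[n] for n in names])
    num_classes = len(expert_logits[names[0]])
    mixed = [0.0] * num_classes
    for name, weight in zip(names, weights):
        probs = softmax(expert_logits[name])
        for c in range(num_classes):
            mixed[c] += weight * probs[c]
    return mixed


# Illustrative label set and made-up logits for a single utterance.
EMOTIONS = ["angry", "happy", "neutral", "sad"]
logits = {
    "speech": [2.0, 0.1, 0.3, 0.2],
    "text": [0.2, 0.1, 1.5, 0.3],
    "cross_modal": [1.8, 0.2, 0.4, 0.1],
}
# Hypothetical gate scores: the text expert is trusted less here.
gates = {"speech": 1.0, "text": 0.0, "cross_modal": 1.0}

mixed = combine_experts(logits, gates)
prediction = EMOTIONS[max(range(len(mixed)), key=mixed.__getitem__)]
print(prediction, [round(p, 3) for p in mixed])
```

Because the gate weights sum to one and each expert contributes a probability distribution, the mixed output is itself a valid distribution. In a trained system the gate scores would typically be produced by a learned network per utterance rather than fixed by hand as they are here.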