When building federated systems over multi-modal data, you can map each modality into a shared compressed representation space via learnable, modality-specific projections. This reduces both communication overhead and the requirement that every device use an identical model architecture.
This paper presents CoMFed, a federated learning system that lets multiple devices train together on different data modalities (e.g., video and audio) without sharing raw data. It uses compressed representations and alignment techniques to handle heterogeneity in both data types and model structures across devices, while keeping communication costs low.
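The core idea of aligning modalities in a shared compressed space can be illustrated with a minimal sketch. Note that everything below is an assumption for illustration: the dimensions, the random initialization, and the cosine-similarity alignment signal are not taken from the paper, which may use a different projection architecture and training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions: one client holds video features,
# another holds audio features; both project into a shared 64-dim space.
D_VIDEO, D_AUDIO, D_SHARED = 512, 128, 64

# Learnable projection matrices (randomly initialized here; in training
# these would be optimized, e.g. with a contrastive alignment loss).
W_video = rng.normal(0, 1.0 / np.sqrt(D_VIDEO), (D_VIDEO, D_SHARED))
W_audio = rng.normal(0, 1.0 / np.sqrt(D_AUDIO), (D_AUDIO, D_SHARED))

def project(x, W):
    """Project modality-specific features into the shared space and
    L2-normalize so alignment can be measured by cosine similarity."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# A batch of per-device features (stand-ins for real encoder outputs).
video_feats = rng.normal(size=(4, D_VIDEO))
audio_feats = rng.normal(size=(4, D_AUDIO))

z_v = project(video_feats, W_video)   # shape (4, 64)
z_a = project(audio_feats, W_audio)   # shape (4, 64)

# Only the compact 64-dim shared embeddings (and projection updates)
# would cross the network, never the raw 512-/128-dim inputs.
alignment = np.sum(z_v * z_a, axis=-1)  # per-pair cosine similarity
```

The communication saving comes from the bottleneck: devices exchange `D_SHARED`-dimensional vectors rather than raw inputs, and each device keeps its own encoder, so architectures only need to agree on the shared dimension.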