Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Connor Mclaughlin, Nigel Lee, Lili Su | March 24, 2026 | arXiv

Key Takeaway

When models must learn new tasks from scarce data, routing samples by task similarity helps prevent negative interference between dissimilar tasks while maximizing knowledge reuse across overlapping ones.

Summary

This paper tackles continual learning when tasks have limited data and may overlap unpredictably. The authors propose an adaptive mixture-of-experts system that learns which tasks are similar and routes data accordingly. It relies on two key techniques: gradually introducing task-specific prompts over time, and identifying which samples fit existing patterns and which need new ones.
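To make the routing idea concrete, here is a minimal sketch, not the authors' method. It assumes each expert is summarized by a single key vector, that "fits an existing pattern" means cosine similarity above a fixed threshold, and that task-specific prompts are phased in with a linear ramp. The names `route`, `prompt_weight`, `threshold`, and `warmup_steps` are hypothetical and do not come from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(sample, expert_keys, threshold=0.8):
    """Assumed rule: reuse the most similar expert if it clears the threshold,
    otherwise register the sample as the key of a new expert.
    Returns (expert_index, is_new)."""
    if expert_keys:
        sims = [cosine(sample, key) for key in expert_keys]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            return best, False
    expert_keys.append(list(sample))
    return len(expert_keys) - 1, True

def prompt_weight(step, warmup_steps):
    """Assumed schedule: weight on the task-specific prompt ramps linearly
    from 0 to 1 over warmup_steps, then stays at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))

keys = []
print(route([1.0, 0.0], keys))  # (0, True): no experts yet, so one is created
print(route([0.9, 0.1], keys))  # (0, False): close to expert 0, so it is reused
print(route([0.0, 1.0], keys))  # (1, True): orthogonal to expert 0, so a new one is created
print(prompt_weight(5, 10))     # 0.5
```

In this sketch the threshold controls the trade-off the summary describes: a lower value reuses experts more aggressively across overlapping tasks, while a higher value isolates tasks and reduces the risk of interference.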

efficiency architecture

Key Terms

mixture-of-experts, continual-learning, negative-knowledge-transfer, task-overlap, prompt-masking