Standard metrics for evaluating learned representations are often misspecified and can mislead you about whether your model actually learned interpretable features.
This paper shows that popular metrics for checking whether AI models learn meaningful, interpretable features are unreliable. Each metric is valid only under specific assumptions, and when those assumptions are violated it gives false results: it may report that a model learned good features when it did not, or that it failed when it actually succeeded. The authors provide tools for properly testing these metrics.
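To make the failure mode concrete, here is a minimal, hypothetical sketch (not the paper's actual tooling): a correlation-based matching score, similar in spirit to the mean correlation coefficient used in identifiability work, implicitly assumes each learned dimension is linearly related to one true factor. The function name and setup below are illustrative assumptions, not definitions from the paper.

```python
# Minimal sketch (not the paper's tooling): a correlation-based matching
# score that assumes each learned dimension is linearly related to one
# true factor. An invertible nonlinear transform preserves all information
# but still lowers the score -- a false negative caused by misspecification.
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_score(z_true, z_learned):
    """Mean |Pearson correlation| under the best one-to-one matching
    of learned dimensions to true factors (illustrative, assumed name)."""
    d = z_true.shape[1]
    # Cross-correlation block between true factors and learned dimensions.
    corr = np.abs(np.corrcoef(z_true.T, z_learned.T)[:d, d:])
    rows, cols = linear_sum_assignment(-corr)  # maximize total correlation
    return corr[rows, cols].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(10_000, 4))                  # ground-truth factors

linear_rep = z * np.array([2.0, -1.0, 3.0, 0.5])  # per-factor rescaling
nonlinear_rep = z ** 3                            # invertible: no information lost

print(matching_score(z, linear_rep))     # ~1.0: metric reports success
print(matching_score(z, nonlinear_rep))  # ~0.77: metric reports failure
```

Both representations are invertible functions of the true factors, so each preserves exactly the same information; the score gap is an artifact of the metric's linearity assumption. This is the kind of condition violation the paper argues such metrics should be tested against.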