Standard metrics for evaluating counterfactual explanations don't align with human judgment. To build truly trustworthy AI systems, developers need human-centered evaluation methods, not just algorithmic scores.
This study compares how AI systems measure counterfactual explanations (showing what would need to change for a different prediction) against how humans themselves judge them. The researchers found that standard algorithmic metrics poorly predict human satisfaction, suggesting current evaluation methods miss what users actually value in explanations.
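To make "standard algorithmic metrics" concrete, below is a minimal sketch of two scores commonly reported in the counterfactual-explanation literature: proximity (how far the counterfactual is from the original input) and sparsity (how many features had to change). The function name, feature names, and normalization scheme are illustrative assumptions, not the study's actual evaluation code.

```python
import numpy as np

def counterfactual_metrics(x, x_cf, feature_ranges=None):
    """Compute two widely used algorithmic metrics for a counterfactual
    x_cf relative to the original input x (both 1-D feature vectors)."""
    x = np.asarray(x, dtype=float)
    x_cf = np.asarray(x_cf, dtype=float)

    diff = np.abs(x_cf - x)
    # Optionally normalize each feature by its observed range so that
    # features on different scales contribute comparably.
    if feature_ranges is not None:
        diff = diff / np.asarray(feature_ranges, dtype=float)

    proximity = float(diff.sum())           # L1 distance: how far the input moved
    sparsity = int(np.count_nonzero(diff))  # number of features that changed
    return {"proximity": proximity, "sparsity": sparsity}

# Hypothetical example: a loan applicant whose income and debt ratio
# would need to change for the model to approve the application.
x_original = [45_000, 0.42, 3]        # income, debt ratio, delinquencies
x_counterfactual = [52_000, 0.35, 3]
print(counterfactual_metrics(x_original, x_counterfactual,
                             feature_ranges=[100_000, 1.0, 10]))
```

Lower proximity and sparsity are conventionally treated as "better" explanations; the study's finding is that scores like these correlate poorly with how satisfying humans actually find the explanation.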