Sparse autoencoders fail at compositional generalization because they learn poor concept dictionaries during training, not because of their amortized inference approach; fixing dictionary learning, rather than inference, is the key to interpretable AI.
This paper examines why sparse autoencoders (SAEs) and linear probes fail to capture compositional concepts in neural networks. The core issue is not the inference method; it is that SAEs learn dictionaries (concept representations) that point in the wrong directions.
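To make the distinction concrete, here is a minimal sketch (not the paper's code; all names, sizes, and hyperparameters are illustrative) separating an SAE's two ingredients: the dictionary, whose columns are the learned concept directions, and the inference procedure that maps an activation to a sparse code. It runs both a one-pass amortized encoder and iterative sparse coding (ISTA) over the same dictionary, which is the sense in which the dictionary, not the inference scheme, determines what the SAE can represent.

```python
# Minimal illustrative sketch: dictionary vs. inference in a sparse autoencoder.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_latents = 64, 256

# Dictionary: one unit-norm direction (column) per learned "concept".
D = rng.standard_normal((d_model, n_latents))
D /= np.linalg.norm(D, axis=0, keepdims=True)

# Amortized inference: a single learned encoder pass (what standard SAEs use).
W_enc = D.T.copy()            # tied-weights encoder, a common initialization
b_enc = np.zeros(n_latents)

def amortized_codes(x):
    # One forward pass; ReLU yields non-negative sparse codes.
    return np.maximum(W_enc @ x + b_enc, 0.0)

# Iterative inference: sparse coding (ISTA) over the *same* dictionary.
def ista_codes(x, lam=0.1, lr=0.1, steps=100):
    z = np.zeros(n_latents)
    for _ in range(steps):
        grad = D.T @ (D @ z - x)                                  # reconstruction gradient
        z = z - lr * grad
        z = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)    # soft threshold
    return z

# A synthetic activation built from a few dictionary directions.
mask = rng.random(n_latents) < 0.02
x = D @ (rng.random(n_latents) * mask)

for name, z in [("amortized", amortized_codes(x)), ("ISTA", ista_codes(x))]:
    err = np.linalg.norm(x - D @ z) / np.linalg.norm(x)
    print(f"{name:9s} rel. reconstruction error: {err:.3f}, nonzeros: {int((z > 1e-6).sum())}")
```

Either inference routine can only combine the directions the dictionary already contains, so if those directions do not align with the true compositional concepts, swapping amortized encoding for slower iterative inference cannot fix the interpretation.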