LLM weights can be compressed to just 16-64 unique values per matrix without retraining by preserving their relative rank order, enabling simple disk compression and revealing that rank structure, not magnitude, is what drives model behavior.
This paper shows that LLMs do not need exact weight values; only the relative ordering of weights matters. By clustering the weights of each matrix into 16-64 shared values, the authors compress models such as Llama 3.1-8B without any retraining. They demonstrate this by scrambling weight values while preserving rank order, finding that rank matters far more than precise magnitudes for model performance.
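The core idea, clustering a matrix's weights into a small set of shared values while preserving their rank order, can be sketched as follows. This is a minimal illustration using quantile binning (the paper's exact clustering method may differ); the function name `cluster_weights` and the use of bin means as shared values are assumptions for this sketch.

```python
import numpy as np

def cluster_weights(W, k=16):
    """Quantize a weight matrix to k shared values via quantile binning.
    Binning by quantiles is monotone, so the relative rank order of the
    original weights is preserved (up to ties within a bin)."""
    flat = W.ravel()
    # Bin edges at quantiles so each of the k clusters holds roughly
    # equal numbers of weights.
    edges = np.quantile(flat, np.linspace(0.0, 1.0, k + 1))
    idx = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, k - 1)
    # Represent each cluster by the mean of its members; means of
    # successive bins are increasing, keeping the mapping monotone.
    centroids = np.array([flat[idx == j].mean() for j in range(k)])
    return centroids[idx].reshape(W.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))          # stand-in for one weight matrix
Wq = cluster_weights(W, k=16)

# The compressed matrix uses at most 16 distinct values...
assert len(np.unique(Wq)) <= 16
# ...and a larger original weight never maps to a smaller shared value,
# i.e. rank order is preserved.
order = np.argsort(W.ravel())
assert np.all(np.diff(Wq.ravel()[order]) >= 0)
```

Because each entry is one of only 16-64 values, the matrix can be stored as small integer indices plus a tiny codebook, which is what makes the simple disk compression possible.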