LLM weights can be compressed to just 16-64 unique values per matrix without retraining by preserving their relative rank order, enabling simple disk compression and revealing that rank structure, not magnitude, is what drives model behavior.
This paper shows that LLMs do not need exact weight values; only the relative ordering of weights matters. By clustering the weights of each matrix into 16-64 shared values, the authors compress models such as Llama 3.1-8B without any retraining. They demonstrate this by scrambling weight values while preserving rank order, finding that rank matters far more than precise magnitudes for model performance.
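The core idea, clustering a matrix's weights into a small set of shared values while preserving their rank order, can be sketched as follows. This is a minimal illustration using quantile binning (the paper's exact clustering method may differ); the function name `cluster_weights` and the use of bin means as shared values are assumptions for this sketch.

```python
import numpy as np

def cluster_weights(W, k=16):
    """Quantize a weight matrix to k shared values via quantile binning.
    Binning by quantiles is monotone, so the relative rank order of the
    original weights is preserved (up to ties within a bin)."""
    flat = W.ravel()
    # Bin edges at quantiles so each of the k clusters holds roughly
    # equal numbers of weights.
    edges = np.quantile(flat, np.linspace(0.0, 1.0, k + 1))
    idx = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, k - 1)
    # Represent each cluster by the mean of its members; means of
    # successive bins are increasing, keeping the mapping monotone.
    centroids = np.array([flat[idx == j].mean() for j in range(k)])
    return centroids[idx].reshape(W.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))          # stand-in for one weight matrix
Wq = cluster_weights(W, k=16)

# The compressed matrix uses at most 16 distinct values...
assert len(np.unique(Wq)) <= 16
# ...and a larger original weight never maps to a smaller shared value,
# i.e. rank order is preserved.
order = np.argsort(W.ravel())
assert np.all(np.diff(Wq.ravel()[order]) >= 0)
```

Because each entry is one of only 16-64 values, the matrix can be stored as small integer indices plus a tiny codebook, which is what makes the simple disk compression possible.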