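A quantization technique that compresses model weights to a lower numeric precision (for example, from 16-bit floating point to 8-bit or 4-bit integers), reducing file size and memory requirements while keeping output quality close to that of the original model.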
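
The description does not name a specific scheme, so the following is only an illustrative sketch of one common approach: symmetric per-tensor quantization to signed 8-bit integers, written in plain Python. The function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, chosen for this example; real implementations typically work on tensors, often per channel or per block, and may use fewer bits.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] plus one shared scale.

    Assumption: symmetric, per-tensor scaling (one scale for all weights).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale maps the largest magnitude onto 127.
    # Fall back to 1.0 when every weight is zero to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, for illustration only.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per weight is bounded by about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as one small integer instead of a 16-bit or 32-bit float, which is where the size and memory savings come from; the shared `scale` is the only extra value kept. The price is the rounding error captured in `max_error`, which stays within roughly half of one quantization step.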