Adaptive block-scaled quantization can significantly reduce errors in 4-bit model compression by intelligently switching between data types per block, achieving better accuracy than fixed formats without extra storage cost.
This paper introduces adaptive quantization formats (IF4, IF3, IF6) that improve upon NVFP4 by dynamically choosing between floating-point and integer representations for each block of values. The approach repurposes an otherwise-unused bit in the NVFP4 encoding to signal the chosen format per block, reducing quantization error and improving language model accuracy with minimal hardware overhead.
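The per-block format selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact integer grid, scale encoding, and selection rule here are assumptions. It quantizes each block against both an FP4 (E2M1) grid and a symmetric INT4 grid, keeps whichever reconstruction has lower error, and notes that the winner would be signalled with a single per-block flag bit.

```python
import numpy as np

# E2M1 (FP4) representable magnitudes, as used in NVFP4.
_FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-_FP4_POS[:0:-1], _FP4_POS])

# Symmetric INT4 grid (a hypothetical stand-in for the paper's integer format).
INT4_GRID = np.arange(-7, 8, dtype=np.float64)

def quantize_to_grid(block, grid):
    """Scale the block onto the grid's range, snap each value to the
    nearest grid point, and return the dequantized reconstruction."""
    scale = np.max(np.abs(block)) / np.max(np.abs(grid))
    if scale == 0:
        return np.zeros_like(block)
    scaled = block / scale
    idx = np.argmin(np.abs(scaled[:, None] - grid[None, :]), axis=1)
    return grid[idx] * scale

def adaptive_quantize(block):
    """Quantize with both grids and keep whichever gives lower MSE.
    In hardware, the choice would be recorded in one flag bit per block."""
    fp = quantize_to_grid(block, FP4_GRID)
    it = quantize_to_grid(block, INT4_GRID)
    use_int = np.mean((block - it) ** 2) < np.mean((block - fp) ** 2)
    return (it, "int4") if use_int else (fp, "fp4")
```

Intuitively, near-uniform blocks favor the evenly spaced integer grid, while blocks with a few large outliers favor the floating-point grid, whose points cluster near zero; picking per block captures the better of the two at the cost of one signalling bit.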