A specific type of quantization that represents model weights using only 4 bits instead of the original 32 bits, cutting memory use roughly eightfold and enabling efficient inference of large models on consumer hardware.
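A minimal sketch of the idea, using a simple symmetric absmax scheme: each float weight is mapped to a signed 4-bit integer in the range -8..7 together with a shared scale factor, then approximately recovered at inference time. The function names and the single global scale are illustrative assumptions; production schemes (e.g. block-wise scaling or NF4 in libraries such as bitsandbytes) are more elaborate.

```python
import numpy as np

def quantize_4bit(weights):
    """Map floats to signed 4-bit codes (-8..7) with one absmax scale.
    Illustrative only: real 4-bit schemes use per-block scales."""
    scale = np.max(np.abs(weights)) / 7.0  # largest weight maps to +/-7
    codes = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize_4bit(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return codes.astype(np.float32) * scale

# Round-trip a few example weights.
w = np.array([0.12, -0.5, 0.33, 0.9, -0.07], dtype=np.float32)
codes, scale = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scale)
```

Each code fits in 4 bits, so two weights can be packed per byte; the price is a small rounding error, bounded here by half the scale step.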