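A quantization format in which model weights are stored in 4-bit precision while calculations are carried out in 16-bit precision. This cuts weight memory to roughly a quarter of 16-bit storage while limiting the accuracy loss that lower-precision arithmetic would add.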
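The split between 4-bit storage and 16-bit computation can be illustrated with a minimal NumPy sketch. The definition above does not specify how the 4-bit values are produced, so everything here is an assumption made for illustration: symmetric per-group scaling, a group size of 32, two weights packed per byte, and the helper names `quantize_4bit`, `pack_nibbles`, and `dequantize_fp16`. Real formats may use zero points, other group sizes, or a different packing layout.

```python
import numpy as np

GROUP_SIZE = 32  # assumed group size; real formats choose their own


def quantize_4bit(w, group_size=GROUP_SIZE):
    """Symmetric per-group quantization of float weights to integers in [-8, 7]."""
    groups = w.astype(np.float32).reshape(-1, group_size)
    # One 16-bit scale per group, chosen so the largest magnitude maps to 7.
    scales = (np.abs(groups).max(axis=1, keepdims=True) / 7.0).astype(np.float16)
    scales[scales == 0] = 1.0  # guard against division by zero for all-zero groups
    q = np.round(groups / scales.astype(np.float32))
    return np.clip(q, -8, 7).astype(np.int8), scales


def pack_nibbles(q):
    """Pack two 4-bit values into each byte (low nibble first)."""
    u = (q.reshape(-1).astype(np.int16) + 8).astype(np.uint8)  # shift to [0, 15]
    return u[0::2] | (u[1::2] << 4)


def dequantize_fp16(packed, scales, shape, group_size=GROUP_SIZE):
    """Unpack the 4-bit values and rescale them to float16 weights."""
    u = np.empty(packed.size * 2, dtype=np.uint8)
    u[0::2] = packed & 0x0F
    u[1::2] = packed >> 4
    q = u.astype(np.int8) - 8  # back to [-8, 7]
    return (q.reshape(-1, group_size).astype(np.float16) * scales).reshape(shape)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal((4, 64)).astype(np.float16)    # toy 16-bit activations

q, scales = quantize_4bit(w)
packed = pack_nibbles(q)                           # 4-bit storage: two weights per byte
w_fp16 = dequantize_fp16(packed, scales, w.shape)  # weights restored to float16
y = x @ w_fp16                                     # the matmul itself runs in float16

print("packed bytes:", packed.nbytes)
print("float16 bytes:", w.astype(np.float16).nbytes)
print("output dtype:", y.dtype)
```

In this sketch the per-group scales add 16 bits for every 32 weights, so the effective cost is 4.5 bits per weight, not exactly 4. Production kernels typically dequantize inside the matrix multiply instead of materializing a full 16-bit copy of the weights, but the arithmetic is the same idea.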