A technique that reduces a model's size and memory requirements by using lower-precision numbers, enabling it to run on resource-limited devices.