Reducing a model's numerical precision (e.g., from 16-bit to 4-bit) to shrink memory usage and speed up inference.
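To make the idea concrete, here is a minimal sketch of symmetric per-tensor quantization to a signed 4-bit range, using NumPy. This is an illustration of the general technique, not any particular library's implementation; the function names and the choice of the [-8, 7] integer range are assumptions for the example.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    scale = np.max(np.abs(weights)) / 7.0  # map the largest magnitude to 7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the quantized integers."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.9], dtype=np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
# rounding error per value is bounded by half the quantization step (scale / 2)
```

Each weight now needs only 4 bits plus a shared scale factor instead of 16 bits, at the cost of a small, bounded reconstruction error; real systems typically quantize per-channel or per-group rather than per-tensor to keep that error low.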