Tuning a model's design or training to run more efficiently on specific hardware (like NVIDIA GPUs), reducing memory usage and inference time.