A technique that compresses a large, complex model into a smaller one by training it to mimic the larger model's behavior, resulting in faster inference with minimal loss of quality.