Weight and Activation Quantization (W8A8) — Glossary — ThinkLLM

A quantization method that compresses both a model's stored weights and its intermediate activations to 8-bit precision, significantly reducing memory and compute requirements.
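A minimal sketch of the idea in NumPy, assuming symmetric per-tensor quantization (one scale per tensor, zero-point fixed at 0); the `quantize_sym` helper, shapes, and random data are illustrative, not from the glossary:

```python
import numpy as np

def quantize_sym(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of float32 values to int8."""
    scale = float(np.max(np.abs(x))) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy weights and activations standing in for one linear layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)   # weights
A = rng.normal(size=(8, 3)).astype(np.float32)   # activations

W_q, w_scale = quantize_sym(W)   # W8: 8-bit weights
A_q, a_scale = quantize_sym(A)   # A8: 8-bit activations

# The matmul runs on int8 inputs with int32 accumulation;
# a single float multiply rescales the result back to real units.
out_int32 = W_q.astype(np.int32) @ A_q.astype(np.int32)
out = out_int32.astype(np.float32) * (w_scale * a_scale)

ref = W @ A  # float32 reference
print(np.max(np.abs(out - ref)))  # small quantization error
```

Because both operands are 8-bit, the dominant matmul can use integer arithmetic, which is where the memory and compute savings come from; real deployments typically use per-channel weight scales and calibrated or dynamic activation scales rather than this per-tensor toy.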