


ThinkLLM



Grouped-Query Attention

architecture

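An attention variant that shrinks the key-value cache and speeds up inference by having each group of query heads share a single key head and value head, instead of giving every query head its own. It sits between standard multi-head attention (one key-value head per query head) and multi-query attention (one key-value head shared by all query heads).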
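The sharing described above can be illustrated with a minimal NumPy sketch. It is not taken from any particular model or library: the function name `grouped_query_attention`, the tensor layout, and the head counts (8 query heads, 2 key-value heads) are assumptions chosen for illustration, and causal masking and the input/output projections are omitted for brevity.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Illustrative grouped-query attention (no masking, no projections).

    q:    (n_q_heads, seq_len, head_dim)
    k, v: (n_kv_heads, seq_len, head_dim), where n_kv_heads divides n_q_heads
    """
    n_q_heads, seq_len, head_dim = q.shape
    n_kv_heads = k.shape[0]
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads

    # Repeat each key/value head so that `group_size` consecutive
    # query heads attend with the same keys and values.
    k_shared = np.repeat(k, group_size, axis=0)
    v_shared = np.repeat(v, group_size, axis=0)

    # Scaled dot-product attention, computed per query head.
    scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_shared

# Hypothetical sizes: 8 query heads share 2 key-value heads (groups of 4).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v)  # shape (8, 4, 16)
```

In this sketch only 2 key and value heads would need to be cached instead of 8, so the cache is 4 times smaller. Setting the number of key-value heads equal to the number of query heads recovers multi-head attention, and setting it to 1 recovers multi-query attention.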

