An optimization technique that reduces memory usage and speeds up inference by having multiple query heads share the same key and value heads instead of each having their own.