The balance between inference cost (compute, latency) and answer quality that systems must optimize for.