The number of tokens (the small units of text a model reads and writes) consumed during model inference; higher token usage means higher computational cost and longer response times.
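As a rough illustration, token usage can be approximated by counting whitespace-separated chunks; this is a naive sketch only, since real tokenizers (e.g. BPE-based ones) split text into subword units and produce different counts.

```python
def estimate_tokens(text: str) -> int:
    """Naive proxy for token count: whitespace-separated chunks.

    Real model tokenizers split on subword units, so actual usage
    is often higher than this estimate.
    """
    return len(text.split())

prompt = "Summarize the quarterly report in three bullet points."
print(estimate_tokens(prompt))  # 8 whitespace-separated chunks
```

For a real count, use the tokenizer that matches the target model, since both the prompt and the generated response contribute to total usage and therefore cost.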