When doing RL training on LLMs, increase parallel rollouts per problem as your compute budget grows, but expect diminishing returns; this single principle helps you allocate compute efficiently across sampling and training.
This paper studies how to optimally distribute compute when training language models with reinforcement learning. The researchers found that the number of parallel attempts per problem should increase with the total compute budget before leveling off, and that this pattern holds whether problems are easy or hard, though for different reasons.
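To see why returns diminish, consider a toy model (not from the paper): if each independent rollout solves a problem with probability p, the chance that at least one of n rollouts succeeds is 1 - (1 - p)^n, and the marginal benefit of each additional rollout shrinks as n grows. The success rate p and budget values below are illustrative assumptions.

```python
def pass_at_n(p: float, n: int) -> float:
    """Probability that at least one of n independent rollouts succeeds."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 0.2  # assumed per-rollout success probability (illustrative)
    prev = 0.0
    for n in (1, 2, 4, 8, 16, 32):
        rate = pass_at_n(p, n)
        # The gain from doubling n shrinks once pass@n nears 1,
        # so extra compute is eventually better spent elsewhere.
        print(f"n={n:3d}  pass@n={rate:.3f}  gain={rate - prev:.3f}")
        prev = rate
```

This is only a sketch of the saturation effect; the paper's actual allocation rule depends on its specific training setup and cost model.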