When doing RL training on LLMs, increase parallel rollouts per problem as your compute budget grows, but expect diminishing returns; this single principle helps you allocate compute efficiently across sampling and training.
This paper studies how to optimally distribute compute when training language models with reinforcement learning. The researchers found that the number of parallel attempts per problem should increase with the total compute budget before leveling off, and that this pattern holds whether problems are easy or hard, though for different reasons.
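To see why returns diminish, consider a toy model (not from the paper): if each independent rollout solves a problem with probability p, the chance that at least one of n rollouts succeeds is 1 - (1 - p)^n, and the marginal benefit of each additional rollout shrinks as n grows. The success rate p and budget values below are illustrative assumptions.

```python
def pass_at_n(p: float, n: int) -> float:
    """Probability that at least one of n independent rollouts succeeds."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 0.2  # assumed per-rollout success probability (illustrative)
    prev = 0.0
    for n in (1, 2, 4, 8, 16, 32):
        rate = pass_at_n(p, n)
        # The gain from doubling n shrinks once pass@n nears 1,
        # so extra compute is eventually better spent elsewhere.
        print(f"n={n:3d}  pass@n={rate:.3f}  gain={rate - prev:.3f}")
        prev = rate
```

This is only a sketch of the saturation effect; the paper's actual allocation rule depends on its specific training setup and cost model.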