Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning

Jello Zhou, Vudtiwat Ngampruetikorn, David J. Schwab | March 17, 2026 | arXiv

Key Takeaway

Stochastic resetting—randomly restarting an agent during training—accelerates learning convergence by truncating long, uninformative trajectories, offering a simple tuning mechanism for RL that works independently of reward structure.

Summary

This paper shows that randomly resetting an agent to a starting state during training speeds up reinforcement learning. Unlike traditional methods that slow down learning, resetting helps by cutting short unproductive exploration paths and improving how value estimates propagate through the network, especially when rewards are sparse.
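To make the mechanism concrete, here is a minimal sketch of tabular Q-learning on a small grid with a per-step reset probability. It is an illustration under stated assumptions, not the authors' setup: the grid size, the single sparse reward at the far corner, the hyperparameters, and the names `train`, `step`, and `reset_prob` are all hypothetical choices made for this example.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action, size):
    """Move on a size x size grid; walls clamp the agent in place."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    return (row, col)


def train(size=5, episodes=200, reset_prob=0.02, alpha=0.5, gamma=0.95,
          epsilon=0.1, max_steps=5000, seed=0):
    """Q-learning with stochastic resetting (all defaults are assumptions)."""
    rng = random.Random(seed)
    start, goal = (0, 0), (size - 1, size - 1)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(size) for c in range(size)}
    steps_per_episode, total_resets = [], 0

    for _ in range(episodes):
        state, steps = start, 0
        while state != goal and steps < max_steps:
            # Epsilon-greedy action choice with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice(
                    [i for i, v in enumerate(q[state]) if v == best])

            nxt = step(state, ACTIONS[a], size)
            reward = 1.0 if nxt == goal else 0.0  # sparse reward at the goal

            # Standard Q-learning update on the transition actually taken.
            bootstrap = 0.0 if nxt == goal else gamma * max(q[nxt])
            q[state][a] += alpha * (reward + bootstrap - q[state][a])

            steps += 1
            state = nxt

            # Stochastic reset: with probability reset_prob, send the agent
            # back to the start, truncating the current trajectory.
            if state != goal and rng.random() < reset_prob:
                state = start
                total_resets += 1

        steps_per_episode.append(steps)

    return q, steps_per_episode, total_resets
```

In this sketch the reset happens after the Q-update, so the update always uses a real environment transition and the reset only changes where the agent continues from. Running `train` with several values of `reset_prob` and several seeds, then comparing `steps_per_episode`, is one way to probe the effect; the sketch itself makes no claim about the outcome.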


Key Terms

policy-convergence, stochastic-resetting, value-propagation, sparse-rewards, first-passage-time