Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning — ThinkLLM