Separating exploration from policy optimization using uncertainty-guided tree search is dramatically more efficient than standard RL approaches on hard-exploration problems, and the discovered trajectories can afterward be distilled into deployable policies.
This paper proposes a new approach to exploration in reinforcement learning that separates the exploration phase from policy optimization. Rather than running RL with intrinsic-motivation reward bonuses, the method uses tree search guided by uncertainty estimates to efficiently discover new states, and then distills the discovered trajectories into policies.
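To make the two-phase idea concrete, here is a minimal toy sketch, not the paper's actual algorithm: a best-first tree search over a tiny deterministic chain environment, where a node's priority is an uncertainty proxy (here, inverse visit count standing in for an uncertainty estimate), followed by a tabular "distillation" of the best discovered trajectory into a state-to-action policy. All function and variable names (`step`, `tree_search_explore`, `distill`) are illustrative assumptions.

```python
import heapq

def step(state, action):
    # Toy deterministic environment: states 0..9 on a chain; action 1 moves
    # right, action 0 moves left (clipped at the ends). Reward only at state 9.
    n_states = 10
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

def tree_search_explore(start=0, budget=50):
    # Exploration phase: best-first search where priority is an uncertainty
    # proxy -- rarely visited states score high and get expanded first.
    visits = {}
    frontier = [(0.0, 0, start, [])]  # (neg_uncertainty, tiebreak, state, trajectory)
    best_traj, best_reward = [], -1.0
    tiebreak = 1
    for _ in range(budget):
        if not frontier:
            break
        _, _, state, traj = heapq.heappop(frontier)
        for action in (0, 1):
            nxt, reward = step(state, action)
            new_traj = traj + [(state, action)]
            if reward > best_reward:
                best_reward, best_traj = reward, new_traj
            visits[nxt] = visits.get(nxt, 0) + 1
            uncertainty = 1.0 / visits[nxt]  # novel state => uncertainty 1.0
            heapq.heappush(frontier, (-uncertainty, tiebreak, nxt, new_traj))
            tiebreak += 1
    return best_traj, best_reward

def distill(trajectory):
    # Distillation phase, reduced to its tabular form: clone the discovered
    # trajectory into a lookup-table policy mapping each state to its action.
    return {s: a for s, a in trajectory}

traj, reward = tree_search_explore()
policy = distill(traj)
print(reward, policy)  # reaches the rewarding end state; policy moves right
```

In a real instantiation the visit-count proxy would be replaced by a learned uncertainty estimate (e.g. ensemble disagreement) and the tabular lookup by supervised training of a policy network on the discovered trajectories; the point of the sketch is only the separation of the search phase from the distillation phase.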