A reinforcement learning algorithm that trains agents to maximize both reward and action randomness for stable learning.