The process of updating an agent's estimates of state values backward through a trajectory during learning.