The equation for the optimal policy is referred to as the Bellman optimality equation:
$$V^{*}(s) = \max_{a}\left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \right].$$
It expresses the value of a state under the optimal policy as the expected return from taking the action with the highest expected return, then acting optimally thereafter.
The Bellman equations are extremely important and useful recurrence relations: they can be used to compute the return of a given policy (policy evaluation) or to compute the optimal return via value iteration.
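Value iteration applies the Bellman optimality backup repeatedly until the values stop changing. Below is a minimal sketch on a toy two-state, two-action MDP; the transition matrix `P`, reward matrix `R`, and discount `gamma` are invented here purely for illustration.

```python
import numpy as np

# Toy MDP (assumed values for illustration only).
# P[a, s, s'] = probability of moving to s' from s under action a.
# R[s, a]     = immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.0, 1.0]],  # transitions under action 1
])
R = np.array([
    [1.0, 0.0],  # rewards in state 0 for actions 0, 1
    [0.0, 2.0],  # rewards in state 1 for actions 0, 1
])
gamma = 0.9

V = np.zeros(2)
for _ in range(10_000):
    # Bellman optimality backup:
    # Q[s, a] = R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')
    Q = R + gamma * (P @ V).T          # (P @ V) has shape (a, s); transpose to (s, a)
    V_new = Q.max(axis=1)              # V(s) = max_a Q(s, a)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)              # greedy policy w.r.t. the converged values
```

At convergence `V` is a fixed point of the backup, i.e. it satisfies the Bellman optimality equation, and the greedy policy with respect to `V` is optimal.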