This equation describes the expected reward for taking the action prescribed by some policy \pi.
The equation for the optimal policy is referred to as the Bellman optimality equation:
V^{*}(s)=\max_{a}\left\{R(s,a)+\gamma \sum_{s'}P(s'\mid s,a)V^{*}(s')\right\}.
It gives the value of each state when the agent takes the action with the highest expected return.
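One common way to make the optimality equation concrete is to treat its right-hand side as an update rule and apply it repeatedly until the values stop changing, a procedure usually called value iteration. The following is a minimal sketch under stated assumptions, not a reference implementation: the function names, the nested-list layout of `R` and `P`, and the two-state toy problem are all hypothetical and chosen only for illustration.

```python
def bellman_optimality_backup(V, R, P, gamma):
    """Apply the right-hand side of the optimality equation to every state.

    V: current value estimates, one float per state.
    R: R[s][a] is the immediate reward for action a in state s (assumed layout).
    P: P[s][a][s2] is the probability of reaching s2 from s under a (assumed layout).
    """
    n_states = len(V)
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(len(R[s]))
        )
        for s in range(n_states)
    ]


def value_iteration(R, P, gamma, tol=1e-10, max_iters=10_000):
    """Repeat the backup until the largest change falls below tol."""
    V = [0.0] * len(R)
    for _ in range(max_iters):
        V_new = bellman_optimality_backup(V, R, P, gamma)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical toy problem: two states, two deterministic actions each.
# State 0: action 0 stays in state 0 (reward 0); action 1 moves to state 1 (reward 1).
# State 1: action 0 stays in state 1 (reward 2); action 1 moves to state 0 (reward 0).
R = [[0.0, 1.0], [2.0, 0.0]]
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
gamma = 0.5

V_star = value_iteration(R, P, gamma)
```

For this toy problem the values can be checked by hand: staying in state 1 forever gives V^{*}(1) = 2 + 0.5 V^{*}(1), so V^{*}(1) = 4, and moving from state 0 to state 1 gives V^{*}(0) = 1 + 0.5 \cdot 4 = 3. The sketch converges to approximately these values.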