Because the agent does not directly observe the environment's state, the agent must make decisions under uncertainty of the true environment state.
However, by interacting with the environment and receiving observations, the agent may update its belief in the true state by updating the probability distribution of the current state.
A consequence of this property is that the optimal behavior may often include information gathering actions that are taken purely because they improve the agent's estimate of the current state, thereby allowing it to make better decisions in the future.
It is instructive to compare the above definition with the definition of a Markov decision process. An MDP does not include the observation set, because the agent always knows with certainty the environment's current state. Alternatively, an MDP can be reformulated as a POMDP by setting the observation set to be equal to the set of states and defining the observation conditional probabilities to deterministically select the observation that corresponds to the true state.
Because the agent does not directly observe the environment's state, the agent must make decisions under uncertainty of the true environment state. However, by interacting with the environment and receiving observations, the agent may update its belief in the true state by updating the probability distribution of the current state. A consequence of this property is that the optimal behavior may often include information gathering actions that are taken purely because they improve the agent's estimate of the current state, thereby allowing it to make better decisions in the future.It is instructive to compare the above definition with the definition of a Markov decision process. An MDP does not include the observation set, because the agent always knows with certainty the environment's current state. Alternatively, an MDP can be reformulated as a POMDP by setting the observation set to be equal to the set of states and defining the observation conditional probabilities to deterministically select the observation that corresponds to the true state.
正在翻譯中..
![](//zhcntimg.ilovetranslation.com/pic/loading_3.gif?v=b9814dd30c1d7c59_8619)