
Because the agent does not directly observe the environment's state, it must make decisions under uncertainty about the true state.
However, by interacting with the environment and receiving observations, the agent can update its belief about the true state, that is, its probability distribution over the current state.
As a consequence, optimal behavior often includes information-gathering actions, taken purely because they improve the agent's estimate of the current state and thereby allow it to make better decisions in the future.
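
The belief update described above can be sketched as a discrete Bayes filter. This is a minimal illustration, not a definitive implementation: the function name `belief_update`, the array layout `T[a][s][s2]` (transition probabilities) and `O[a][s2][o]` (observation probabilities), and the two-state example with a single "listen" action are all assumptions introduced here.

```python
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    Assumed (hypothetical) layout:
      T[a][s][s2] = P(next state s2 | state s, action a)
      O[a][s2][o] = P(observation o | next state s2, action a)
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical example: one action (index 0, "listen") that leaves the state
# unchanged and reports the true state with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

b0 = [0.5, 0.5]                       # start fully uncertain
b1 = belief_update(b0, 0, 0, T, O)    # listen once, observe 0
b2 = belief_update(b1, 0, 0, T, O)    # listen again, observe 0
print(b1, b2)
```

In this sketch the "listen" action never changes the state, so it is a pure information-gathering action: each consistent observation concentrates the belief further on one state.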

It is instructive to compare the above definition with the definition of a Markov decision process (MDP). An MDP does not include an observation set, because the agent always knows the environment's current state with certainty. Conversely, an MDP can be reformulated as a POMDP by setting the observation set equal to the set of states and defining the observation conditional probabilities so that they deterministically select the observation corresponding to the true state.
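
The reformulation of an MDP as a POMDP can be illustrated with a short sketch. The helper names `identity_observation_model` and `posterior`, the array layout `O[a][s2][o]`, and the two-state transition matrix are assumptions made for this example only. With a deterministic observation model, a single observation collapses any belief to a point mass on the true state.

```python
def identity_observation_model(n_states, n_actions):
    """Observation set equals the state set; O[a][s2][o] = 1 exactly when o == s2."""
    return [
        [[1.0 if o == s2 else 0.0 for o in range(n_states)]
         for s2 in range(n_states)]
        for _ in range(n_actions)
    ]


def posterior(belief, action, observation, T, O):
    """Discrete Bayes update of a belief (same assumed layout as O above)."""
    n = len(belief)
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state MDP with a single action (index 0).
T = [[[0.7, 0.3], [0.4, 0.6]]]
O = identity_observation_model(n_states=2, n_actions=1)

# Even from a uniform belief, observing "1" reveals that the state is 1.
b = posterior([0.5, 0.5], 0, 1, T, O)
print(b)  # [0.0, 1.0]
```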

