Your Name
Instructor Name
Course Number
Date
Summary
Introduction
Reinforcement learning is the study of how reward and penalty shape the actions of animals and artificial agents (Mnih et al.). To familiarize a dog with a desired technique, for example, the dog is rewarded whenever it attempts the technique; this influences the dog to perform with dedication.
Artificial systems are known to behave in a similar way when rewarded or punished (Mnih et al.). At every step of a task, a reinforcement learning algorithm selects, from the available actions, the one with the greatest potential to generate a reward.
The chosen action is then evaluated against the reward it produces, and an update is generated that incorporates this new information. This association between animal behaviour and neuroscience presents reinforcement learning as a phenomenal machine learning model (Mnih et al.). Developing such artificial systems requires efficient representations derived from high-dimensional inputs, incorporating evidence from previous experience.
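As a minimal illustration of this selection rule, the greedy choice can be sketched in Python (the dictionary-based value table and all names here are illustrative, not the paper's implementation):

```python
def greedy_action(q_values, state, actions):
    """Pick the action with the largest estimated value for this state.

    q_values is an illustrative dict mapping (state, action) -> estimate;
    unseen pairs default to 0.0.
    """
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))

# Example: with a higher estimate for "jump", the agent selects it.
q = {("screen_1", "jump"): 1.2, ("screen_1", "left"): 0.3}
best = greedy_action(q, "screen_1", ["left", "jump"])  # -> "jump"
```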
The paper supplies these high-dimensional inputs to a proposed network called the deep Q-network (DQN). The test domain was the Atari 2600, whose two inputs are the game score and the pixels presented on the screen. The study shows that DQN produces phenomenal results on this task.
Method
The network was trained on 49 Atari games, with a maximum of fifty million frames per game (which the authors estimate corresponds to around 38 days of gaming experience) (Mnih et al.). The network as a whole was configured as follows:
Input layer consisting of an 84×84×4 image.
Layer 1 consisting of 32 filters of 8×8 with stride 4, followed by a rectifier nonlinearity, defined as max(0, x).
Layer 2 consisting of 64 filters of 4×4 with stride 2, followed by the rectifier nonlinearity.
Layer 3 consisting of 64 filters of 3×3 with stride 1, followed by the rectifier nonlinearity.
Layer 4 consisting of a fully connected layer with 512 rectifier units.
Output layer consisting of a fully connected linear layer with one unit per action.
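Assuming "valid" convolutions with no padding (an assumption, since the configuration above does not spell this out), the spatial sizes implied by these filters and strides can be checked with a few lines of arithmetic:

```python
def conv_out(size, kernel, stride):
    # Output width/height of a valid (no-padding) convolution.
    return (size - kernel) // stride + 1

s1 = conv_out(84, 8, 4)   # layer 1: 84x84 -> 20x20
s2 = conv_out(s1, 4, 2)   # layer 2: 20x20 -> 9x9
s3 = conv_out(s2, 3, 1)   # layer 3: 9x9 -> 7x7
flat = s3 * s3 * 64       # 7*7*64 = 3136 inputs feed the 512-unit layer
```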
Algorithm
The study aims at establishing a single network able to master a number of difficult learning tasks. For this purpose the DQN network is used, which combines reinforcement learning with deep neural networks (Mnih et al.). Specifically, a deep convolutional network is used to approximate the optimal action-value function. The structure of the network is shown in figure 1 below.
Q^*(s, a) = \max_{\pi} \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots \mid s_t = s,\; a_t = a,\; \pi \right]
This is the maximum expected sum of rewards r_t, discounted by γ at each time step t, achievable by a behaviour policy π; the discount factor is denoted γ ∈ [0, 1]. Q-learning updates are applied to stored samples of experience. The following loss function gives the Q-learning update.
L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right)^2 \right]
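A tabular sketch of this update on stored samples follows. It is illustrative only: the paper minimizes the loss over a deep network's weights θ, while here a lookup table and a learning rate alpha stand in, and gamma and the replay contents are made-up values.

```python
import random

gamma, alpha = 0.99, 0.1
Q = {}          # online estimates, playing the role of Q(s, a; theta)
Q_target = {}   # periodically synced copy, playing the role of theta^-

def q(table, s, a):
    return table.get((s, a), 0.0)

def q_update(replay, actions, batch_size=4):
    # Sample (s, a, r, s') uniformly from the replay memory D ...
    batch = random.sample(replay, min(batch_size, len(replay)))
    for s, a, r, s_next in batch:
        # ... and move Q(s, a) toward the target r + gamma * max_a' Q_target(s', a').
        target = r + gamma * max(q(Q_target, s_next, a2) for a2 in actions)
        Q[(s, a)] = q(Q, s, a) + alpha * (target - q(Q, s, a))

replay = [("s0", "right", 1.0, "s1"), ("s1", "left", 0.0, "s0")]
q_update(replay, actions=["left", "right"])
```

Keeping the target table θ⁻ fixed between periodic syncs, as the paper does, stabilizes learning by preventing the target from chasing the estimate it is training.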
Fig. 1. Diagram representing the convolutional neural network
Results
Each game was played about thirty times by the trained network, with each episode lasting up to 5 minutes. The random agent chose an action every 6 frames, the same frequency at which the human played. The human opponent used the same emulator as the agents and played under controlled circumstances; human performance is the average reward over about 20 episodes of 5 minutes each, following 2 hours of practice (Mnih et al.). Human performance is normalized to 100%, and the performance of the random agent is set to 0%.
The method was evaluated against the best existing methodologies and a professional game evaluator. A total of forty-nine Atari games, spanning a wide range of actions, was evaluated. DQN achieved more than 75 percent of the human score on more than half of the games, and it surpassed the past methodologies by a significant margin on 43 games (Mnih et al.). DQN operates well in a variety of gameplay modes; however, games featuring prolonged strategy (e.g. Montezuma's Revenge) still present a major problem for DQN (Mnih et al.). These findings are shown in the following figure 2.
Fig. 2. Comparison of the DQN agent with the best prior reinforcement-learning approaches
The investigators also examined the representations learned by the DQN, using t-SNE, an established technique for visualizing high-dimensional data, on the states encountered by the best-performing agent in the game Space Invaders (Mnih et al.). As expected, the t-SNE algorithm projects DQN representations of perceptually similar states to nearby points.
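To illustrate this kind of analysis, the sketch below (a minimal example using synthetic data, not the authors' code) applies scikit-learn's TSNE to project stand-in high-dimensional "state representations" into two dimensions, where similar inputs land near one another:

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the hidden-layer activations that the paper
# visualized: two clusters mimic two groups of perceptually similar
# game states in a 64-dimensional representation space.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(30, 64))
cluster_b = rng.normal(loc=5.0, scale=0.5, size=(30, 64))
states = np.vstack([cluster_a, cluster_b])

# Project to 2-D; perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(states)

print(embedding.shape)  # (60, 2)
```

Each of the 60 input states receives a 2-D coordinate, so nearby points in the plot can be inspected for perceptual or reward similarity, as the authors did.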
It was also found that t-SNE generated similar embeddings for DQN representations of states that were close in expected reward but perceptually dissimilar (Mnih et al.). This is consistent with the idea that the network learns representations of high-dimensional sensory input that support adaptive behaviour.
Conclusion
The constructed implementation demonstrated the capacity to learn to play Atari games given minimal prior knowledge of the games and only fundamental inputs, namely the raw pixels and the game score (Mnih et al.).
Combining reinforcement learning with the deep Q-network algorithm was much more effective than numerous previous efforts because training used experience replay together with a separate, periodically updated target network. These two adjustments removed correlations among consecutive inputs, which improved the stability and performance of the network (Mnih et al.).
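To make these two ideas concrete, here is a minimal sketch (with hypothetical names, not the paper's implementation) of a replay buffer that samples past transitions uniformly at random, plus a target-network synchronization step that copies parameters only periodically:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and samples them uniformly at random,
    breaking the correlation between consecutive experiences."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(online_params, target_params):
    """Copy online-network parameters into the target network.
    Calling this only every N steps keeps the Q-learning targets
    fixed between updates. Plain dicts stand in for real weights."""
    target_params.update(online_params)

buffer = ReplayBuffer(capacity=100)
for t in range(50):
    buffer.push(state=t, action=t % 4, reward=float(t), next_state=t + 1, done=False)

batch = buffer.sample(8)
print(len(buffer), len(batch))  # 50 8
```

Because the sampled batch mixes transitions from many different points in time, consecutive gradient updates are far less correlated than they would be when learning from the game stream directly.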
Additional research should be conducted to develop the experience replay algorithm: sampling could be biased toward high-reward transitions rather than selecting uniformly at random from the replay memory. This may introduce an element of bias into training, but it deserves to be examined.
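One simple way to sketch the suggested bias toward high-reward events (an illustrative scheme, not a method from the paper) is to sample transition indices with probability proportional to each transition's absolute reward:

```python
import numpy as np

def prioritized_sample(rewards, batch_size, rng=None):
    """Sample transition indices with probability proportional to
    |reward|, so large-reward events are replayed more often than
    under uniform sampling. A small epsilon keeps zero-reward
    transitions from being excluded entirely."""
    rng = rng or np.random.default_rng(0)
    priorities = np.abs(np.asarray(rewards, dtype=float)) + 1e-3
    probs = priorities / priorities.sum()
    return rng.choice(len(rewards), size=batch_size, p=probs)

rewards = [0.0, 0.0, 0.0, 10.0, 0.0]
indices = prioritized_sample(rewards, batch_size=100)
print((indices == 3).mean())  # far above the uniform rate of 0.2
```

With one transition carrying nearly all the priority mass, index 3 dominates the sample, which demonstrates both the benefit (rare rewarding events are revisited) and the risk the text mentions: the training distribution is skewed away from the true experience distribution.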
Works Cited
Mnih, Volodymyr, et al. “Human-Level Control through Deep Reinforcement Learning.” Nature, vol. 518, no. 7540, Feb. 2015, pp. 529–33, doi:10.1038/nature14236.