Renewal Monte Carlo: Renewal theory-based reinforcement learning
J Subramanian, A Mahajan - IEEE Transactions on Automatic Control, 2019 - ieeexplore.ieee.org
An online reinforcement learning algorithm called renewal Monte Carlo (RMC) is presented. RMC works for infinite-horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the key advantages of Monte Carlo methods, viz., simplicity, ease of implementation, and low bias, while circumventing their main drawbacks, viz., high variance and delayed updates. Given a parameterized policy πθ, the algorithm consists of three parts: estimating the expected discounted reward Rθ and the expected discounted time Tθ over a regenerative cycle; estimating their derivatives ∇θRθ and ∇θTθ; and updating the policy parameters using stochastic approximation to find the roots of Rθ∇θTθ − Tθ∇θRθ. It is shown that, under mild technical conditions, RMC converges to a locally optimal policy. It is also shown that RMC works for post-decision state models as well. An approximate version of RMC is proposed in which a regenerative cycle is defined by successive visits to a prespecified "renewal set". It is shown that if the value function of the system is locally Lipschitz on the renewal set, then RMC converges to an approximate locally optimal policy. Three numerical experiments are presented to illustrate RMC and compare it with other state-of-the-art reinforcement learning algorithms.
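The three-part structure described in the abstract can be sketched in a few lines of Python. The toy two-state MDP, the sigmoid policy parameterization, the batch size, and the step size below are all illustrative assumptions, not the paper's experimental setup; the cycle-level likelihood-ratio gradient estimator is one simple choice, and the paper develops its own estimators and convergence analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.9
START = 0  # designated start (renewal) state

def policy_prob(theta):
    """Probability of taking action 1 under a sigmoid policy pi_theta."""
    return 1.0 / (1.0 + np.exp(-theta))

def step(state, action, rng):
    """Toy dynamics (an assumption for illustration): action 1 tends to
    keep the chain at the start state, where it also earns more reward."""
    if state == START:
        reward = 1.0 if action == 1 else 0.2
        stay = 0.7 if action == 1 else 0.3
        next_state = START if rng.random() < stay else 1
    else:
        reward = 0.0
        next_state = START if rng.random() < 0.5 else 1
    return next_state, reward

def run_cycle(theta, rng):
    """Simulate one regenerative cycle (start state back to start state).
    Returns the discounted reward R, the discounted time T, and the
    accumulated score d/dtheta log pi(a) over the cycle."""
    state, t, R, T, score = START, 0, 0.0, 0.0, 0.0
    while True:
        p = policy_prob(theta)
        action = 1 if rng.random() < p else 0
        score += (1.0 - p) if action == 1 else -p
        state, reward = step(state, action, rng)
        R += GAMMA**t * reward
        T += GAMMA**t
        t += 1
        if state == START:
            return R, T, score

theta, alpha = 0.0, 0.05
for _ in range(2000):
    # Part 1: Monte Carlo estimates of R_theta and T_theta over a
    # small batch of independent regenerative cycles.
    cycles = [run_cycle(theta, rng) for _ in range(8)]
    R = np.mean([c[0] for c in cycles])
    T = np.mean([c[1] for c in cycles])
    # Part 2: likelihood-ratio estimates of the gradients of R and T.
    gR = np.mean([c[0] * c[2] for c in cycles])
    gT = np.mean([c[1] * c[2] for c in cycles])
    # Part 3: stochastic-approximation step toward a root of
    # T*gR - R*gT (the same root set as R*gT - T*gR in the abstract),
    # i.e. a stationary point of R_theta / T_theta.
    theta += alpha * (T * gR - R * gT)
```

Because each update uses only completed cycles from the fixed start state, the estimates refresh every few transitions rather than waiting for a long episode to end, which is the sense in which RMC avoids the delayed updates of plain Monte Carlo.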