Renewal Monte Carlo: Renewal theory-based reinforcement learning
J Subramanian, A Mahajan - IEEE Transactions on Automatic Control, 2019 - ieeexplore.ieee.org
An online reinforcement learning algorithm called renewal Monte Carlo (RMC) is presented. RMC works for infinite-horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the key advantages of Monte Carlo methods, viz., simplicity, ease of implementation, and low bias, while circumventing their main drawbacks, viz., high variance and delayed updates. Given a parameterized policy πθ, the algorithm consists of three parts: estimating the expected discounted reward Rθ and the expected discounted time Tθ over a regenerative cycle; estimating their derivatives ∇θRθ and ∇θTθ; and updating the policy parameters using stochastic approximation to find the roots of Rθ∇θTθ − Tθ∇θRθ. It is shown that, under mild technical conditions, RMC converges to a locally optimal policy. It is also shown that RMC works for post-decision state models as well. An approximate version of RMC is proposed in which a regenerative cycle is defined by successive visits to a prespecified "renewal set". It is shown that if the value function of the system is locally Lipschitz on the renewal set, then RMC converges to an approximate locally optimal policy. Three numerical experiments are presented to illustrate RMC and compare it with other state-of-the-art reinforcement learning algorithms.
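The three-part structure described in the abstract can be sketched in a few lines of Python. The toy two-state MDP, the sigmoid policy parameterization, the batch size, and the step size below are all illustrative assumptions, not the paper's experimental setup; the cycle-level likelihood-ratio gradient estimator is one simple choice, and the paper develops its own estimators and convergence analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.9
START = 0  # designated start (renewal) state

def policy_prob(theta):
    """Probability of taking action 1 under a sigmoid policy pi_theta."""
    return 1.0 / (1.0 + np.exp(-theta))

def step(state, action, rng):
    """Toy dynamics (an assumption for illustration): action 1 tends to
    keep the chain at the start state, where it also earns more reward."""
    if state == START:
        reward = 1.0 if action == 1 else 0.2
        stay = 0.7 if action == 1 else 0.3
        next_state = START if rng.random() < stay else 1
    else:
        reward = 0.0
        next_state = START if rng.random() < 0.5 else 1
    return next_state, reward

def run_cycle(theta, rng):
    """Simulate one regenerative cycle (start state back to start state).
    Returns the discounted reward R, the discounted time T, and the
    accumulated score d/dtheta log pi(a) over the cycle."""
    state, t, R, T, score = START, 0, 0.0, 0.0, 0.0
    while True:
        p = policy_prob(theta)
        action = 1 if rng.random() < p else 0
        score += (1.0 - p) if action == 1 else -p
        state, reward = step(state, action, rng)
        R += GAMMA**t * reward
        T += GAMMA**t
        t += 1
        if state == START:
            return R, T, score

theta, alpha = 0.0, 0.05
for _ in range(2000):
    # Part 1: Monte Carlo estimates of R_theta and T_theta over a
    # small batch of independent regenerative cycles.
    cycles = [run_cycle(theta, rng) for _ in range(8)]
    R = np.mean([c[0] for c in cycles])
    T = np.mean([c[1] for c in cycles])
    # Part 2: likelihood-ratio estimates of the gradients of R and T.
    gR = np.mean([c[0] * c[2] for c in cycles])
    gT = np.mean([c[1] * c[2] for c in cycles])
    # Part 3: stochastic-approximation step toward a root of
    # T*gR - R*gT (the same root set as R*gT - T*gR in the abstract),
    # i.e. a stationary point of R_theta / T_theta.
    theta += alpha * (T * gR - R * gT)
```

Because each update uses only completed cycles from the fixed start state, the estimates refresh every few transitions rather than waiting for a long episode to end, which is the sense in which RMC avoids the delayed updates of plain Monte Carlo.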