A policy gradient method for SMDPs with application to call admission control
Classical methods for solving a semi-Markov decision process (SMDP), such as value iteration and
policy iteration, require precise knowledge of the underlying probabilistic model and are
known to suffer from the curse of dimensionality. To overcome both of these limitations, this
paper presents a reinforcement learning approach that directly optimizes the
performance criterion with respect to a family of parameterised policies. We propose an
online algorithm that simultaneously estimates the gradient of the performance criterion and …
A Doucet - researchgate.net