Google Scholar

A policy gradient method for SMDPs with application to call admission control

S Singh, V Tadic, A Doucet - 7th International Conference on …, 2002 - ieeexplore.ieee.org

Classical methods for solving a semi-Markov decision process, such as value iteration and
policy iteration, require precise knowledge of the underlying probabilistic model and are
known to suffer from the curse of dimensionality. To overcome both these limitations, this
paper presents a reinforcement learning approach in which one directly optimizes the
performance criterion with respect to a family of parameterised policies. We propose an
online algorithm that simultaneously estimates the gradient of the performance criterion and …

Cited by 5
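The abstract is truncated, so the paper's actual algorithm is not visible here. As a rough illustration of optimizing a performance criterion directly with respect to a parameterised policy, the following is a minimal Python sketch of a likelihood-ratio (score-function) gradient estimate for an SMDP's average reward per unit time. Everything in it is an assumption made for illustration and is not the authors' method: the stateless call-arrival model, the sigmoid admission policy, the reward and sojourn-time distributions, and the hypothetical helper names `sample_epoch`, `estimate_rate_and_gradient` and `ascend`.

```python
"""Toy likelihood-ratio policy gradient for an SMDP reward rate.

Hypothetical model (NOT from the paper): every decision epoch is a call
arrival, and the policy admits the call with probability sigmoid(theta).
The criterion is eta = E[reward] / E[sojourn time] (reward per unit time).
"""
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_epoch(theta: float, rng: random.Random) -> tuple:
    """Simulate one decision epoch; return (reward, sojourn time, score).

    Assumed dynamics: admitting earns reward 1 and the time until the next
    decision is exponential with mean 2; rejecting earns 0 and the time is
    exponential with mean 1. The score is d/dtheta log pi(a | theta).
    """
    p = sigmoid(theta)
    admit = rng.random() < p
    a = 1.0 if admit else 0.0
    reward = a
    tau = rng.expovariate(0.5 if admit else 1.0)  # argument is a rate: mean 1/rate
    score = a - p  # derivative of the log Bernoulli probability w.r.t. the logit
    return reward, tau, score


def estimate_rate_and_gradient(theta: float, n: int, rng: random.Random):
    """Estimate eta and d(eta)/d(theta) from n simulated epochs.

    Quotient rule: grad eta = (grad E[r] - eta * grad E[tau]) / E[tau],
    where grad E[x] is estimated by the sample mean of x * score.
    """
    sum_r = sum_t = sum_gr = sum_gt = 0.0
    for _ in range(n):
        r, tau, s = sample_epoch(theta, rng)
        sum_r += r
        sum_t += tau
        sum_gr += r * s
        sum_gt += tau * s
    mean_r, mean_t = sum_r / n, sum_t / n
    eta = mean_r / mean_t
    grad = (sum_gr / n - eta * sum_gt / n) / mean_t
    return eta, grad


def ascend(theta: float, steps: int, lr: float, n: int, rng: random.Random) -> float:
    """Plain stochastic gradient ascent on the estimated reward rate."""
    for _ in range(steps):
        _, g = estimate_rate_and_gradient(theta, n, rng)
        theta += lr * g
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    eta, grad = estimate_rate_and_gradient(0.0, 200_000, rng)
    p = sigmoid(0.0)
    print(f"eta  estimate {eta:.4f}  exact {p / (1 + p):.4f}")
    print(f"grad estimate {grad:.4f}  exact {p * (1 - p) / (1 + p) ** 2:.4f}")
    print("theta after ascent:", ascend(0.0, 20, 5.0, 5_000, rng))
```

In this toy model the exact values are eta = p/(1+p) and d(eta)/d(theta) = p(1-p)/(1+p)^2 with p = sigmoid(theta), which gives a check on the estimates. Because the toy epochs are independent, the per-epoch score yields unbiased estimates of the gradients of E[r] and E[tau]. A real call admission SMDP has state (for example, the number of calls in progress), so an online estimator would have to carry the score across epochs, for instance with an eligibility trace.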


[PDF] A Policy Gradient Method for SMDPs with Application to Call Admission Control

A Doucet - researchgate.net
