Summary of Chapter 8, "Planning and Learning with Tabular Methods", of "Reinforcement Learning: An Introduction"

mmc2015

Published 2017-08-03 11:03:27


License: CC 4.0 BY-SA


Article link: https://blog.csdn.net/mmc2015/article/details/76608559

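This post summarizes Chapter 8 of "Reinforcement Learning: An Introduction", covering models and planning, Dyna-Q, the effect of an incorrect model, and prioritized sweeping. It focuses on how to build a model from limited experience, how to improve a policy through simulation, and how to handle the exploration-exploitation trade-off when the model is inaccurate.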


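A new student has joined our group and I need to help him get started with RL, so we chose to begin with Silver's course.

For myself, I am adding one more requirement: to read "Reinforcement Learning: An Introduction" carefully.

I did not read it very attentively before, so this time I hope to be more careful and to write a short summary of the corresponding key points.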

8.1 Models and Planning

By a model of the environment we mean anything that an agent can use to predict how the environment will respond to its actions.
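As a minimal sketch of that definition, assuming a deterministic environment with discrete states and actions, a table that remembers the last observed outcome of each state-action pair already counts as a model. The class name `TabularModel`, its methods, and the state and action labels are hypothetical, chosen for illustration only.

```python
class TabularModel:
    """Hypothetical deterministic table model: (state, action) -> (reward, next_state)."""

    def __init__(self):
        self.table = {}

    def update(self, state, action, reward, next_state):
        # Remember the most recent outcome observed for this state-action pair.
        self.table[(state, action)] = (reward, next_state)

    def predict(self, state, action):
        # Return the predicted (reward, next_state), or None if the pair was never tried.
        return self.table.get((state, action))


# Illustrative experience (made-up states and rewards).
model = TabularModel()
model.update("s0", "right", 0.0, "s1")
model.update("s1", "right", 1.0, "goal")
```

Calling `model.predict("s0", "right")` then answers the question "what would the environment do?" without touching the real environment. A stochastic environment would need counts or samples per pair instead of a single stored outcome.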

The word planning is used in several different ways in different fields. We use the term to refer to any computational process that takes a model as input and produces or improves a policy for interacting with the modeled environment.
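One way to make "takes a model as input and produces or improves a policy" concrete is the sketch below. It assumes a tiny hand-written deterministic model and applies one-step Q-learning updates to transitions sampled from that model, with no real environment involved. The states, rewards, function names, and parameter values are all illustrative assumptions, not taken from the post.

```python
import random
from collections import defaultdict

# Hypothetical deterministic model: (state, action) -> (reward, next_state).
MODEL = {
    ("s0", "left"): (0.0, "s0"),
    ("s0", "right"): (0.0, "s1"),
    ("s1", "left"): (0.0, "s0"),
    ("s1", "right"): (1.0, "goal"),
}
ACTIONS = ("left", "right")


def q_planning(model, n_updates=2000, alpha=0.5, gamma=0.9, seed=0):
    """Improve action values using only simulated transitions drawn from the model."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen pairs, including those at the terminal state, are 0
    pairs = list(model)
    for _ in range(n_updates):
        state, action = rng.choice(pairs)            # pick a state-action pair at random
        reward, next_state = model[(state, action)]  # ask the model what would happen
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q


def greedy_policy(q, states):
    # The policy produced by planning: act greedily with respect to the planned values.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


q = q_planning(MODEL)
policy = greedy_policy(q, ["s0", "s1"])
```

The input is the model and the output is a policy, which is all the definition requires. The update rule itself is the ordinary Q-learning update; only the source of the transitions is different.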

The difference is that whereas planning uses simulated experience generated by a model, learning methods use real experience generated by the environment. Of course this difference lead