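A new colleague has joined our group and I need to get him started with RL, so we chose to begin with Silver's course.
For myself, I am adding one more requirement: a careful read of "Reinforcement Learning: An Introduction".
I did not read it very carefully before, so this time I hope to be more thorough and to write a short summary of the corresponding points as I go.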
7.1 n-step TD Prediction
The methods that use n-step backups are still TD methods because they still change an earlier estimate based on how it differs from a later estimate.
n-step return:
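As a rough sketch of this definition, assuming the book's usual form G_{t:t+n} = R_{t+1} + γ R_{t+2} + ... + γ^{n-1} R_{t+n} + γ^n V(S_{t+n}), the return for one recorded episode could be computed as below. The function name `n_step_return` and the list layout (`rewards[k]` holding R_{k+1}, `values[k]` holding the current estimate of V(S_k)) are my own assumptions for illustration, not something taken from the book.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float],
    values: Sequence[float],
    t: int,
    n: int,
    gamma: float,
) -> float:
    """Hypothetical helper: n-step return from time t for one finished episode.

    Assumed layout: rewards[k] is R_{k+1} (reward received after leaving S_k),
    values[k] is the current estimate V(S_k), and the episode ends at
    T = len(rewards).
    """
    T = len(rewards)
    end = min(t + n, T)  # do not read rewards past termination

    # Discounted sum of the (at most n) real rewards that follow time t.
    g = 0.0
    for k in range(t, end):
        g += gamma ** (k - t) * rewards[k]

    # Bootstrap from the value estimate only if S_{t+n} is non-terminal.
    # Otherwise the missing terms count as zero and g is the full return.
    if t + n < T:
        g += gamma ** n * values[t + n]
    return g
```

With n = 1 this reduces to the one-step TD target, and once n ≥ T - t it equals the ordinary Monte Carlo return.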
If t + n ≥ T (if the n-step return extends to or beyond termination), then a