Hierarchical Reinforcement Learning

Hierarchical
Reinforcement Learning

David Jardim & Luís Nunes
ISCTE-IUL 2009/2010

Outline 1/2
Planning Process
The Problem and Motivation
Markov Decision Process
Q-Learning
Hierarchical Reinforcement Learning
Why HRL?
Approaches
3

Outline 2/2
Semi-Markov Decision Process
Options
Until Now
Next Step - Simbad
Limitations of HRL
Future Work on HRL
Questions
References

4

The Problem and

LEGO_Mindstorms_NXT_mini.jpg
@ http:/
Motivation

/lambcutlet.org/images/
LEGO MindStorms Robot with sensors,
actuators and noise

Purpose of collecting “bricks” and assembly
them accordingly to a plan

Decompose the global problem in sub-
problems

Try to solve the problem by implementing
well-known RL and HRL techniques

6


Computational
approach to learning @ R. S. Sutton, Reinforcement Learning: An Introduction
(MIT Press, 1998).

An agent tries to maximize the reward he receives
when an action is taken

Interacts with a complex, uncertain environment

Learns how to map situations to actions

7

Markov Decision Process

A finite MDP is defined by

a finite set of states S

a finite set of actions A

@ http://en.wikipedia.org/wiki/Markov_decision_process

8

Q-Learning
[Watkins, C.J.C.H.’89]

Agent with a state set S and action set A.

Performs an action a in order to change its
state.

A reward is provided by the environment.

The goal of the agent is to maximize its
total reward.

@ http://en.wikipedia.org/wiki/Q-learning
9

Why HRL?
Improve the performance

Impossibility to apply RL to problems with
large state/action (curse of dimensionality)

Sub-goals and abstract actions can be used
on different tasks (state abstraction)

Multiple levels of temporal abstraction

Obtain state abstraction

10

Approaches
HAMs - Hierarchies of Abstract Machines (Parr
& Russell, 98)

Options - Between MDPs and Semi-MDPs:
Learning, Planning, and Representing Knowledge
at Multiple Temporal Scales (Sutton, Precup &
Singh, 99)

MAXQ Value Function Decomposition (Dietterich,
2000)

Discovering Hierarchy in RL with HEXQ (Hengst,
2002)
11

Semi-Markov Decision
Process
An SMDP consists of

A set of states S

A set of actions A

An expected cumulative discounted reward

A well-deﬁned joint distribution of the
next state and transit time

12

Options
[Sutton, Precup & Singh’99]

An Option is deﬁned by

A policy ∏: SxA ➞ [0,1]

A termination condition β: S^+ →[0,1]

And an initiation set I⊆S

Its hierarchical and used to reach sub-goals

13

Until Now

O1

O2

14

Until Now
St

Steps Steps

Episodes Episodes

@ Sutton, Precup & Singh’99 @ My Simulation

15

Next Step - Simbad
Java 3D Robot Simulator

3D visualization and sensing

Range Sensor: sonars and IR

Contact Sensor: bumpers @ http://simbad.sourceforge.net/

Will allow us to simulate and learn ﬁrst, and then
transfer the learning to our LEGO MindStorm

16

Limitations of HRL

Effectiveness of these ideas on large and
complex continuous control tasks

Sub-goals are assigned manually

Some of the existing algorithms only work
well for the problem which they were
designed to solve

17

Future Work on HRL

Automated discovery of state abstraction

Find the best automated way to discovery
sub-goals to associate with Options

Obtain a long lived learning agent that faces
a continued series of tasks and keep evolving

18

References

R. S. Sutton, Reinforcement Learning: An Introduction (MIT Press, 1998).

R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in
Neural Information Processing Systems: Proceedings of the 1997 Conference, Cambridge,
MA, 1998. MIT Press.

R. S. Sutton, D. Precup, and S. Singh. Between mdps and semi-mdps: A framework for
temporal abstraction in reinforcement learning. Artiﬁcial Intelligence, 112:181–211, 1999.

T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function
decomposition. Journal of Artiﬁcial Intelligence Research, 13:227–303, 2000.

B. Hengst. Discovering hierarchy in reinforcement learning with hexq. In Maching Learning:
Proceedings of the Nineteenth International Conference on Machine Learning, 2002.

20

Hierarchical Reinforcement Learning

More Related Content

What's hot (20)

Viewers also liked (20)

Similar to Hierarchical Reinforcement Learning (20)

Recently uploaded (20)

Hierarchical Reinforcement Learning

Editor's Notes