This document discusses reinforcement learning. It begins by defining reinforcement learning as goal-directed learning from interaction, driven by trial and error. It then contrasts reinforcement learning with unsupervised learning, noting that reinforcement learning aims to maximize a reward signal through closed-loop interaction rather than to uncover hidden structure in unlabeled data. The document discusses the exploration-exploitation dilemma and gives examples of reinforcement learning problems, such as controlling a mobile robot or optimizing the operation of a petroleum refinery. It outlines the key elements of a reinforcement learning system: a policy, a reward signal, a value function, and, optionally, a model of the environment. Finally, it frames these problems as Markov decision processes and surveys solution methods such as dynamic programming and Monte Carlo methods.
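To make the policy, reward, and value-function vocabulary and the exploration-exploitation dilemma concrete, the following is a minimal sketch, not taken from the document itself, of an epsilon-greedy agent on a k-armed bandit. The arm means, the value of epsilon, and the step count are illustrative assumptions.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Sketch of an epsilon-greedy agent on a k-armed bandit (assumed setup)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k        # value estimates for each arm
    n = [0] * k          # number of times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        # Policy: explore with probability epsilon, otherwise exploit
        # the current value estimates (greedy action selection).
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit
        # Reward: noisy sample centered on the chosen arm's true mean.
        r = rng.gauss(true_means[a], 1.0)
        total_reward += r
        # Value update: incremental sample average of observed rewards.
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
    return q, total_reward

if __name__ == "__main__":
    estimates, total = run_bandit([0.2, 0.8, 0.5])
    print("value estimates:", [round(v, 2) for v in estimates])
    print("total reward:", round(total, 1))
```

Smaller values of epsilon favor exploitation of the current estimates, while larger values favor exploration of under-sampled arms, which is the trade-off the document refers to as the exploration-exploitation dilemma.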