Tech Talk: Reinforcement Learning
Tamura Yasuto
Table of Contents
• Theme of This Tech Talk: Stop Saying “Trial and Error”
• Rough Definition of RL (*basic settings)
• Planning in Markov Decision Process (MDP)
• Interactive Optimization of Policies and Values
• Wrapping Up
Theme of This Tech Talk: Stop Saying “Trial and Error”
With charts like these, you miss the point from the very beginning
From “Trial and Error” to Interactive Value-Policy Updates
[Diagram labels: Agent, Environment, Action, Reward, Value, Policy]
This part (Value and Policy) should be emphasized more
Role of Reinforcement Learning (RL) in AI
[Diagram: AI contains machine learning. Machine learning is split by model (classical models, neural networks) and by how to train (supervised learning, unsupervised learning, reinforcement learning)]
Rough Definition of RL: Planning Problem
• Sequential decision making: optimizing a sequence of actions
• Optimizing a “policy”: a “policy” means how to move in a given “state”
• Assuming a Markov decision process: the next state depends only on the current state and action, so the next action can be chosen from the current state alone
[Diagram labels: Policy, Action, State]
Example of planning: navigating a robot
Markov Decision Process (MDP) in Some Expressions
• Typical RL diagram (Agent, Env, Action, Reward)
• State transition diagram
• Backup diagram (closed)
• Graphical model
MDP: with an Example of Balancing a Bike
States: State 0, State 1, State 2, State 3, State 4
Actions: Leaning left, No move, Leaning right
Planning in MDP: Some Expressions
• Learning how to move optimally in each state
No move
Lean left
Lean right
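The bike example can be written down as a tiny MDP. The sketch below is only one possible encoding, not taken from the slides: the action names, the deterministic one-state shift per action, the reward of -1 for reaching State 0 or State 4, and the function name `step` are all assumptions made for illustration.

```python
# Hypothetical encoding of the bike example (assumed, not from the slides).
# States 0..4 are lean positions: 2 is upright, 0 and 4 mean the bike fell.
STATES = [0, 1, 2, 3, 4]
ACTIONS = {"left": -1, "stay": 0, "right": +1}  # lean left, no move, lean right
TERMINAL = {0, 4}


def step(state, action):
    """Return (next_state, reward) for one transition.

    Markov property: the result depends only on the current state and action.
    """
    next_state = min(max(state + ACTIONS[action], 0), 4)
    reward = -1.0 if next_state in TERMINAL else 0.0  # assumed reward scheme
    return next_state, reward
```

Planning then means choosing, for each of States 1 to 3, which of the three actions to take.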
Values and Policies: with an Example of Balancing a Bike
• Value: how good it is to be in a state
• Policy: the probability of taking each action in a given state
State 0: negative reward
State 1: low value
State 2: high value
State 3: low value
State 4: negative reward
Action 0: low probability
Action 1
Action 2: high probability
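As a rough illustration, both a value and a policy fit in small lookup tables. All numbers below are made up for the example, and the action names and the helper `sample_action` are hypothetical.

```python
import random

# value[state]: how good it is to be in that state (illustrative numbers only).
value = {0: -1.0, 1: 0.2, 2: 1.0, 3: 0.2, 4: -1.0}

# policy[state][action]: probability of taking that action in that state.
policy = {
    1: {"left": 0.1, "stay": 0.2, "right": 0.7},
    2: {"left": 0.1, "stay": 0.8, "right": 0.1},
    3: {"left": 0.7, "stay": 0.2, "right": 0.1},
}


def sample_action(state, rng=random):
    """Draw one action according to the policy's probabilities for this state."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `sample_action(1)` would return "right" most of the time, since that action points toward the high-value State 2.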
Policy updates
• Put higher probability on actions that lead toward high-value states
State 0: negative reward
State 1: low value
State 2: high value
Action 0: leaning left
Action 1: leaning right (given higher probability)
Then how can a value be learned?
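One simple way to express "higher probability toward high values" is a softmax over the values of the states each action leads to. This is a sketch under assumptions, not the method from the slides: it assumes each action shifts the state by exactly one position, and the names `policy_from_values` and `temperature` are hypothetical.

```python
import math

ACTIONS = {"left": -1, "stay": 0, "right": +1}  # assumed one-state shifts


def policy_from_values(state, value, temperature=1.0):
    """Softmax over the values of the states that each action leads to."""
    prefs = {}
    for action, shift in ACTIONS.items():
        next_state = min(max(state + shift, 0), 4)
        prefs[action] = value[next_state] / temperature
    top = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - top) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}
```

With the illustrative values `{0: -1.0, 1: 0.2, 2: 1.0, 3: 0.2, 4: -1.0}`, the policy for State 1 ranks leaning right above no move, and no move above leaning left.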
Value update: Temporal Difference (TD) Learning
• TD learning: updating values by closing the gap between the expected value and the actual reward
If you lean left, the value is low. As expected! TD loss is low
Leaning right would not be good because the value is low. I was wrong. There is no bad reward. Let’s update the value. TD loss is high
Learning can happen without explicit rewards
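The standard one-step form of this update is TD(0): V(s) ← V(s) + α (r + γ V(s') − V(s)). The slides do not give a formula, so the step size `alpha`, the discount `gamma`, and the function name `td_update` below are assumptions for a minimal sketch.

```python
def td_update(value, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to value[state] and return the TD error.

    The TD error is the gap between the new estimate
    (reward + discounted value of the next state) and the old expectation.
    """
    td_error = reward + gamma * value[next_state] - value[state]
    value[state] += alpha * td_error
    return td_error
```

For example, with `value = {1: 0.0, 2: 1.0}`, calling `td_update(value, 1, 0.0, 2)` gives a TD error of 0.9 and raises `value[1]` to about 0.09 even though the reward was zero. That is the sense in which learning can happen without explicit rewards: the value of the next state carries the signal.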
Interactive Updates of Value and Policy
Value updates (TD learning)
Policy updates
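The two updates can be interleaved in a single loop, in the style of a tabular actor-critic. The sketch below is a simplified illustration, not the slides' algorithm: the bike dynamics, the reward of -1 for falling, the preference table `prefs`, the step sizes `alpha` and `beta`, and the discount `gamma` are all assumptions.

```python
import math
import random

ACTIONS = {"left": -1, "stay": 0, "right": +1}
TERMINAL = {0, 4}  # assumed: the bike has fallen


def step(state, action):
    """Assumed dynamics: shift one state, reward -1 on reaching a fallen state."""
    nxt = min(max(state + ACTIONS[action], 0), 4)
    return nxt, (-1.0 if nxt in TERMINAL else 0.0)


def softmax(prefs):
    """Turn action preferences into probabilities."""
    top = max(prefs.values())
    exps = {a: math.exp(p - top) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def train(episodes=2000, alpha=0.1, beta=0.1, gamma=0.9, max_steps=20, seed=0):
    rng = random.Random(seed)
    value = {s: 0.0 for s in range(5)}
    prefs = {s: {a: 0.0 for a in ACTIONS} for s in (1, 2, 3)}
    for _ in range(episodes):
        state = rng.choice([1, 2, 3])
        for _ in range(max_steps):
            probs = softmax(prefs[state])
            action = rng.choices(list(probs), weights=list(probs.values()))[0]
            nxt, reward = step(state, action)
            td_error = reward + gamma * value[nxt] - value[state]
            value[state] += alpha * td_error         # value update (TD learning)
            prefs[state][action] += beta * td_error  # policy update
            if nxt in TERMINAL:
                break
            state = nxt
    return value, {s: softmax(p) for s, p in prefs.items()}
```

In this toy setup the same TD error drives both tables: it corrects the value estimate, and it nudges the preference for the action just taken up or down, so actions that lead to a fall lose probability over time.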
Wrapping Up
• RL formulation: a planning problem solved by optimizing a policy
• Simple assumption of an MDP: an action depends only on the current state
• Importance of a value: updating a policy by evaluating how good it is to be in a state
• TD learning: updating values by closing the gap between value estimates and actual rewards
