Hierarchical POMDP Planning and Execution
Joelle Pineau
Machine Learning Lunch, November 20, 2000

Partially Observable MDP
POMDPs are characterized by:
- States: s ∈ S
- Actions: a ∈ A
- Observations: o ∈ O
- Transition probabilities: T(s,a,s') = Pr(s' | s, a)
- Observation probabilities: O(o,a,s') = Pr(o | s', a)
- Rewards: R(s,a)
- Beliefs: b(s_t) = Pr(s_t | o_t, a_t, ..., o_0, a_0)
[Figure: state diagram with states S1, S2, S3]

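The belief defined above is usually maintained recursively rather than recomputed from the whole history. The following is a minimal sketch of that update, assuming the common convention that the observation depends on the resulting state s'. The function name, the dictionary layout, and the two-state model are all hypothetical and are not taken from the talk.

```python
def belief_update(b, a, o, T, Z):
    """Return the posterior belief after taking action a and observing o.

    b: dict mapping state -> probability
    T: dict mapping (s, a, s_next) -> Pr(s_next | s, a)
    Z: dict mapping (o, a, s_next) -> Pr(o | s_next, a)  (assumed convention)
    Missing dictionary entries are treated as probability zero.
    """
    states = list(b)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next under action a.
        predicted = sum(T.get((s, a, s_next), 0.0) * b[s] for s in states)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s_next] = Z.get((o, a, s_next), 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state model, invented for illustration only.
T = {("s1", "stay", "s1"): 1.0, ("s2", "stay", "s2"): 1.0}
Z = {("o1", "stay", "s1"): 0.8, ("o2", "stay", "s1"): 0.2,
     ("o1", "stay", "s2"): 0.3, ("o2", "stay", "s2"): 0.7}
posterior = belief_update({"s1": 0.5, "s2": 0.5}, "stay", "o1", T, Z)
```

Because the update only needs the previous belief, the last action, and the last observation, a policy can be defined over beliefs instead of full histories.
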
The problem
- How can we find good policies for complex POMDPs?
- Is there a principled way to provide near-optimal policies?

Proposed Approach
- Exploit structure in the problem domain.
- What type of structure? Action set partitioning.
[Figure: action hierarchy with nodes Act, InvestigateHealth, Move, Navigate, CheckPulse, AskWhere, Left, Right, Up, Down, CheckMeds]

Hierarchical POMDP Planning
What do we start with?
- A full POMDP model: {S_o, A_o, O_o, M_o}.
- An action set partitioning graph.
Key idea:
- Break the problem into many "related" POMDPs.
- Each smaller POMDP has only a subset of A_o, imposing a policy constraint.
But why?
- POMDP: exponential run-time per value iteration, O(|A| |Γ_{n-1}|^{|O|}), where Γ_{n-1} is the set of value-function vectors from the previous iteration.

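To see why a smaller action set helps, the growth can be worked through numerically. This is a rough sketch, assuming the standard worst-case bound for exact value iteration, in which one backup can produce up to |A| |Γ_{n-1}|^{|O|} vectors before pruning. The function name and the counts (4 versus 3 actions, 4 observations, 3 backups) are illustrative choices, not results from the talk.

```python
def worst_case_vectors(num_actions, num_observations, iterations):
    """Iterate |Γ_n| = |A| * |Γ_{n-1}|^|O|, starting from |Γ_0| = 1."""
    size = 1
    for _ in range(iterations):
        size = num_actions * size ** num_observations
    return size


# Worst-case vector counts after three backups (no pruning assumed).
full = worst_case_vectors(num_actions=4, num_observations=4, iterations=3)
restricted = worst_case_vectors(num_actions=3, num_observations=4, iterations=3)
ratio = full / restricted
```

Under these assumptions, removing a single action shrinks the worst-case count by a factor of several hundred after only three backups, because the saving is compounded by the exponent at every iteration.
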
Example
POMDP:
- S_o = {Meds (M), Kitchen (K), Bedroom (B)}
- A_o = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom}
- O_o = {Noise, Meds, Kitchen, Bedroom}
[Figure: transition diagram with nodes M, B, K, E and edge probabilities 0.8 and 0.1]
Value Function:
[Figure: value function over MedsState, KitchenState, BedroomState; labelled regions: GoToKitchen, ClarifyTask, GoToBedroom, CheckMeds]

Hierarchical POMDP
Action Partitioning:
[Figure: action hierarchy. Nodes as listed: Act, Move, CheckMeds, ClarifyTask, ClarifyTask, GoToKitchen, GoToBedroom]

Local Value Function and Policy - Move Controller
[Figure: value function over MedsState, KitchenState, BedroomState; labelled regions: ClarifyTask, GoToKitchen, GoToBedroom]

Modeling Abstract Actions
- Problem: Need parameters for abstract action Move.
- Solution: Use the local policy of the corresponding low-level controller.
- General form: Pr(s_j | s_i, a_k^abstract) = Pr(s_j | s_i, Policy(a_k^abstract, s_i))
- Example: Pr(s_j | MedsState, Move) = Pr(s_j | MedsState, ClarifyTask)
[Figure: Policy(Move, s_i) over MedsState, KitchenState, BedroomState; labelled regions: ClarifyTask, GoToKitchen, GoToBedroom]

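The general form above can be sketched in a few lines: the transition row of the abstract action in state s_i is copied from the primitive action that the local policy selects in s_i. The function name, the dictionary layout, and all transition probabilities below are hypothetical. Only the mapping MedsState -> ClarifyTask comes from the slide's example; the policy entries for the other two states are assumptions made so the sketch runs.

```python
def abstract_transitions(T, policy, abstract_action, states):
    """Build Pr(s_j | s_i, abstract) = Pr(s_j | s_i, policy[s_i]).

    T: dict mapping (s_i, primitive_action, s_j) -> probability
    policy: dict mapping state -> primitive action chosen by the
            low-level controller in that state
    """
    return {
        (s_i, abstract_action, s_j): T.get((s_i, policy[s_i], s_j), 0.0)
        for s_i in states
        for s_j in states
    }


states = ["MedsState", "KitchenState", "BedroomState"]

# Hypothetical primitive-action transitions (numbers invented).
T = {
    ("MedsState", "ClarifyTask", "MedsState"): 1.0,
    ("KitchenState", "GoToBedroom", "BedroomState"): 0.8,
    ("KitchenState", "GoToBedroom", "KitchenState"): 0.2,
    ("BedroomState", "GoToKitchen", "KitchenState"): 0.8,
    ("BedroomState", "GoToKitchen", "BedroomState"): 0.2,
}

# Local policy of the Move controller; only the first entry is from the slide.
move_policy = {
    "MedsState": "ClarifyTask",
    "KitchenState": "GoToBedroom",   # assumed
    "BedroomState": "GoToKitchen",   # assumed
}

move_T = abstract_transitions(T, move_policy, "Move", states)
```

The same substitution could be applied to the observation and reward parameters, so the higher-level controller can treat Move as if it were a single primitive action.
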
Local Value Function and Policy - Act Controller
[Figure: value function over MedsState, KitchenState, BedroomState; labelled regions: Move, CheckMeds]

Comparing Policies
[Figure: two panels, Hierarchical Policy and Optimal Policy; legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom]

Bounding the value of the approximation
- Value function of top-level controller is an upper bound on the value of the approximation.
  Why? We were optimistic when modeling the abstract action.
- Similarly, we can find a lower bound.
  How? We can assume a "worst-case" view when modeling the abstract action.
- If we partition the action set differently, we will get different bounds.

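The slide does not say how the optimistic and worst-case models are constructed. As one simple reading, offered purely as an assumption, the reward of an abstract action in each state could be set to the best or the worst reward among its child actions. The function name and all reward values below are hypothetical.

```python
def abstract_reward_bounds(R, child_actions, states):
    """Return (optimistic, pessimistic) per-state rewards for an abstract action.

    R: dict mapping (state, primitive_action) -> reward
    optimistic[s]  = max over child actions of R[(s, a)]
    pessimistic[s] = min over child actions of R[(s, a)]
    """
    optimistic = {s: max(R[(s, a)] for a in child_actions) for s in states}
    pessimistic = {s: min(R[(s, a)] for a in child_actions) for s in states}
    return optimistic, pessimistic


states = ["MedsState", "KitchenState", "BedroomState"]
move_children = ["ClarifyTask", "GoToKitchen", "GoToBedroom"]

# Hypothetical rewards (numbers invented for illustration).
R = {}
for s in states:
    R[(s, "ClarifyTask")] = -1.0
    R[(s, "GoToKitchen")] = 10.0 if s == "KitchenState" else -5.0
    R[(s, "GoToBedroom")] = 10.0 if s == "BedroomState" else -5.0

upper_model, lower_model = abstract_reward_bounds(R, move_children, states)
```

Solving the top-level controller once with each model would give two value functions that bracket the value obtained with any intermediate model of the abstract action, under the assumption stated above.
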
A real dialogue management example
Actions:
- AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow
- GreetGeneral, GreetMorning, GreetNight, RespondThanks
- AskWeatherTime, SayCurrent, SayToday, SayTomorrow
- StartMeds, NextMeds, ForceMeds, QuitMeds
- AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative
- AskHealth, OfferHelp, SayTime
[Figure: action hierarchy with internal nodes Act, CheckHealth, Phone, DoMeds, CheckWeather, Move, Greet]

Results:
Final words
We presented:
- a general framework to exploit structure in POMDPs.
Future work:
- automatic generation of good action partitioning;
- conditions for additional observation abstraction;
- bigger problems!


Editor's Notes

  • #2: Talk to you about my recent work on ...