Hierarchical POMDP Planning and Execution
Joelle Pineau
Machine Learning Lunch, November 20, 2000
Partially Observable MDP
POMDPs are characterized by:
States: s ∈ S
Actions: a ∈ A
Observations: o ∈ O
Transition probabilities: T(s,a,s') = Pr(s'|s,a)
Observation probabilities: O(o,a,s') = Pr(o|s',a)
Rewards: R(s,a)
Beliefs: b(s_t) = Pr(s_t | o_t, a_t, ..., o_0, a_0)
[Figure: graphical model over successive states S_1, S_2, S_3]
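The belief can be maintained recursively with the standard Bayes filter: the new belief is proportional to the observation likelihood times the transition-propagated prior. A minimal sketch, with illustrative array shapes and made-up numbers (not from the talk):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter belief update for a discrete POMDP.

    b: current belief, shape (|S|,)
    a: action index
    o: observation index
    T: transitions, T[a, s, s'] = Pr(s' | s, a)
    O: observations, O[a, s', o] = Pr(o | s', a)
    """
    # Predict: push the belief through the transition model.
    predicted = b @ T[a]                  # shape (|S|,)
    # Correct: weight by the likelihood of the observation received.
    unnormalized = O[a, :, o] * predicted
    return unnormalized / unnormalized.sum()

# Tiny 2-state, 1-action example (made-up numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.7, 0.3],
               [0.1, 0.9]]])             # obs 1 is likely in state 1
b = np.array([0.5, 0.5])
b_new = belief_update(b, a=0, o=1, T=T, O=O)
```

Seeing observation 1 shifts probability mass toward state 1, as expected.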
The problem How can we find good policies for complex POMDPs? Is there a principled way to provide near-optimal policies?
Proposed Approach
Exploit structure in the problem domain.
What type of structure? Action set partitioning:
[Figure: partition tree]
Act → {InvestigateHealth, Move}
InvestigateHealth → {CheckPulse, CheckMeds}
Move → {Navigate, AskWhere}
Navigate → {Left, Right, Up, Down}
Hierarchical POMDP Planning
What do we start with?
A full POMDP model: {S_o, A_o, O_o, M_o}.
An action set partitioning graph.
Key idea: Break the problem into many "related" POMDPs. Each smaller POMDP has only a subset of A_o, imposing a policy constraint.
But why? Exact POMDP value iteration has exponential run-time per backup: O(|A| |V_{n-1}|^{|O|}).
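The blow-up is in the number of α-vectors: one exact backup can turn the |V_{n-1}| vectors of the previous value function into |A|·|V_{n-1}|^{|O|}. A quick illustration of the count, with numbers chosen purely for illustration, showing why restricting each sub-POMDP to a subset of A_o helps:

```python
def alpha_vectors_after_backup(num_actions, num_obs, prev_count):
    """Worst-case alpha-vector count after one exact backup:
    |A| * |V_{n-1}|^|O|."""
    return num_actions * prev_count ** num_obs

# Full problem: 8 actions, 4 observations, 10 vectors from the last step.
full = alpha_vectors_after_backup(8, 4, 10)   # 8 * 10**4 = 80000
# A sub-POMDP restricted to 3 of those actions is cheaper per backup,
# and its smaller value functions compound the savings across iterations.
sub = alpha_vectors_after_backup(3, 4, 10)    # 3 * 10**4 = 30000
```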
Example
[Figure: transition diagram over states M(eds), K(itchen), B(edroom), with success probability 0.8 and error probabilities 0.1]
POMDP:
S_o = {Meds, Kitchen, Bedroom}
A_o = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom}
O_o = {Noise, Meds, Kitchen, Bedroom}
[Figure: value function over beliefs, with regions MedsState, KitchenState, BedroomState and actions GoToKitchen, ClarifyTask, GoToBedroom, CheckMeds]
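The example model can be written down directly. The figures below are my reading of the (partly garbled) diagram and should be treated as assumptions: a GoTo action reaches its target with probability 0.8, with the remaining mass spread over the other states.

```python
import numpy as np

S = ["Meds", "Kitchen", "Bedroom"]
A = ["ClarifyTask", "CheckMeds", "GoToKitchen", "GoToBedroom"]
Obs = ["Noise", "Meds", "Kitchen", "Bedroom"]

def goto_matrix(target):
    """Assumed transition matrix for a GoTo action: 0.8 to the target
    state, and the leftover 0.1 apiece to the other two states."""
    n = len(S)
    T = np.full((n, n), (1.0 - 0.8) / (n - 1))
    T[:, S.index(target)] = 0.8
    return T

T_go_kitchen = goto_matrix("Kitchen")
```

Each row is a proper distribution over next states, so the matrix can be dropped straight into the belief-update filter.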
Hierarchical POMDP
Action Partitioning:
Act → {Move, CheckMeds, ClarifyTask}
Move → {ClarifyTask, GoToKitchen, GoToBedroom}
Local Value Function and Policy - Move Controller
[Figure: local value function over MedsState, KitchenState, BedroomState, with actions ClarifyTask, GoToKitchen, GoToBedroom]
Modeling Abstract Actions
[Figure: Policy(Move, s_i), the local policy of the Move controller over MedsState, KitchenState, BedroomState, with actions ClarifyTask, GoToKitchen, GoToBedroom]
Problem: Need parameters for the abstract action Move.
Solution: Use the local policy of the corresponding low-level controller.
General form: Pr(s_j | s_i, a_k^abstract) = Pr(s_j | s_i, Policy(a_k^abstract, s_i))
Example: Pr(s_j | MedsState, Move) = Pr(s_j | MedsState, ClarifyTask)
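The general form says: the abstract action's model at state s_i is just the model of whichever primitive the subtask's local policy picks there. A sketch under the array conventions used earlier (made-up numbers):

```python
import numpy as np

def abstract_transition(T, policy):
    """Build Pr(s' | s, a_abstract) from a subtask's local policy.

    T: primitive transitions, T[a, s, s'] = Pr(s' | s, a)
    policy: policy[s] = index of the primitive action the local
            controller takes in state s.
    """
    n_states = T.shape[1]
    T_abs = np.empty((n_states, n_states))
    for s in range(n_states):
        # Row s of the abstract model is row s of the chosen primitive.
        T_abs[s] = T[policy[s], s]
    return T_abs

# Two primitives over two states; the local policy picks primitive 0
# in state 0 and primitive 1 in state 1.
T = np.array([[[0.9, 0.1], [0.9, 0.1]],
              [[0.3, 0.7], [0.2, 0.8]]])
T_move = abstract_transition(T, policy=[0, 1])
```

The same substitution gives the abstract action's observation and reward parameters.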
Local Value Function and Policy - Act Controller
[Figure: local value function over MedsState, KitchenState, BedroomState, with actions Move and CheckMeds]
Comparing Policies
[Figure: hierarchical policy vs. optimal policy over the belief simplex]
Legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom
Bounding the value of the approximation
The value function of the top-level controller is an upper bound on the value of the approximation.
Why? We were optimistic when modeling the abstract action.
Similarly, we can find a lower bound.
How? We can assume a "worst-case" view when modeling the abstract action.
If we partition the action set differently, we will get different bounds.
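One way to read the bound argument: the optimistic model lets the abstract action behave like the best primitive in its subtask at every state, while the pessimistic model assumes the worst, so the value of the constrained policy sits between the two. A hedged sketch of this idea for one-step rewards (my illustration of optimistic vs. pessimistic modeling, not the talk's exact construction):

```python
import numpy as np

def abstract_reward_bounds(R, subtask_actions):
    """Optimistic / pessimistic one-step rewards for an abstract action.

    R: primitive rewards, R[a, s]
    subtask_actions: indices of the primitives grouped under the
                     abstract action.
    """
    sub = R[subtask_actions]          # shape (|subtask|, |S|)
    upper = sub.max(axis=0)           # best primitive in each state
    lower = sub.min(axis=0)           # worst primitive in each state
    return upper, lower

# Three primitives over two states; the abstract action groups 0 and 2.
R = np.array([[1.0, 0.0],
              [0.2, 0.5],
              [0.0, 2.0]])
upper, lower = abstract_reward_bounds(R, subtask_actions=[0, 2])
```

Different partitionings group different primitives together, which is why they yield different (tighter or looser) bounds.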
A real dialogue management example
Subtask controllers under the top-level Act controller: Move, Greet, CheckWeather, DoMeds, Phone, CheckHealth.
Primitive actions: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow, GreetGeneral, GreetMorning, GreetNight, RespondThanks, AskWeatherTime, SayCurrent, SayToday, SayTomorrow, StartMeds, NextMeds, ForceMeds, QuitMeds, AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative, AskHealth, OfferHelp, SayTime
Results:
Final words
We presented a general framework to exploit structure in POMDPs.
Future work:
- automatic generation of good action partitionings;
- conditions for additional observation abstraction;
- bigger problems!
