Dr. Subrat Panda gave an overview of reinforcement learning. He defined it as the problem faced by agents that must sense and act upon their environment while receiving only delayed scalar feedback in the form of rewards. The goal is to learn an optimal policy, a mapping from states to actions, that maximizes the total discounted future reward. He introduced Q-learning as a way to estimate state-action values directly, without needing a model of the environment. Q-learning updates each estimate from a new observation combined with its own prior estimates, a process called bootstrapping. Exploration of untried actions must be balanced against exploitation of current knowledge. The talk also discussed real-world applications of deep reinforcement learning.
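
The Q-learning ideas summarized above (model-free value estimates, bootstrapped updates, and the exploration/exploitation trade-off) can be illustrated with a minimal tabular sketch. This is not code from the talk: the five-state corridor environment, the function names, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy environment (an assumption, not from the talk):
# a 5-state corridor where state 4 is the goal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # reward arrives only at the goal (delayed)
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit current estimates."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so an untrained agent does not always pick action 0.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table using only sampled transitions, never a model of step()."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: observed reward plus the discounted
            # prior estimate of the best value in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Map each non-terminal state to its highest-valued action."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

Calling `greedy_policy(train())` yields the learned state-to-action mapping for the non-terminal states. In this sketch the learned values for moving right shrink by roughly a factor of `gamma` per step of distance from the goal, which is how the delayed reward propagates backward through bootstrapping.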