This document summarizes a lecture on reinforcement learning and the Q-learning algorithm. Q-learning is a temporal difference learning method that lets an agent learn an optimal policy without a model of the environment's dynamics. The agent learns an action-value function (Q-function) that directly approximates the optimal Q-function through repeated Q-backups of the form Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], where α is the learning rate and γ the discount factor. Pseudocode is provided for the basic Q-learning algorithm, and examples show how Q-learning can learn an optimal policy for navigating a maze.