Batch normalization is a technique that normalizes the inputs to each layer of a neural network using statistics of the current mini-batch. By reducing internal covariate shift, it allows higher learning rates and less careful initialization, speeding up training. The method normalizes each feature independently by subtracting the mini-batch mean and dividing by the mini-batch standard deviation, then applies a learned scale and shift so the layer retains its representational power. In the original experiments it significantly accelerated training of state-of-the-art models on MNIST and ImageNet, reaching the same accuracy in a fraction of the training steps (14 times fewer on ImageNet classification) compared to the same models without batch normalization.
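To make the normalization step concrete, here is a minimal NumPy sketch of a batch-normalization forward pass in training mode. The function name batch_norm_forward, the epsilon value, and the choice of computing per-feature statistics over the batch axis are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch per feature, then scale and shift.

    x:     (batch_size, num_features) activations
    gamma: (num_features,) learned scale
    beta:  (num_features,) learned shift
    """
    # Per-feature statistics computed over the mini-batch dimension.
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)

    # Subtract the batch mean and divide by the batch standard deviation
    # (eps guards against division by zero).
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)

    # Learned scale and shift let the layer recover the original
    # representation if that is what training favors.
    return gamma * x_hat + beta

# Usage: a batch of 4 samples with 3 features each.
x = np.random.randn(4, 3) * 10 + 5              # unnormalized activations
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.std(axis=0))        # roughly 0 mean, unit std per feature
```

At inference time, implementations typically replace the batch statistics with running averages accumulated during training, so that a single example can be normalized without a mini-batch.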