Gradient clipping and noise injection
Gradient clipping and noise injection are techniques used to improve the training stability and generalization of LLMs.
Gradient clipping, while primarily employed for optimization stability (see Chapter 7), can indirectly contribute to regularization. By capping the magnitude of gradients, it bounds the size of each parameter update, which can yield a smoother optimization path and discourage overfitting to individual batches. When the gradients of certain parameters are clipped consistently, their updates are persistently scaled down, producing a form of implicit sparsity in which less important parameters are effectively downweighted.
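As a minimal sketch of how this looks in practice, the training loop below clips the global gradient norm before each optimizer step. The clipping call, torch.nn.utils.clip_grad_norm_, is PyTorch's standard utility; the model, synthetic data, and the max_grad_norm value of 1.0 are illustrative stand-ins, not a prescription for any particular LLM.

```python
import torch
import torch.nn as nn

# Toy stand-ins for a real LLM and dataset (illustrative only)
model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
max_grad_norm = 1.0  # a common starting point; tune per model and optimizer

for step in range(100):
    inputs = torch.randn(32, 512)   # synthetic batch
    targets = torch.randn(32, 512)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Rescale all gradients so their global L2 norm is at most max_grad_norm.
    # Returns the pre-clipping norm, which is useful to log as a diagnostic:
    # frequent large values indicate the clip threshold is binding often.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
```

Logging the returned pre-clipping norm is a cheap way to see how often clipping is actually active, which connects the stability view of Chapter 7 to the regularization effect described above.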
Noise injection is a regularization technique commonly used to improve the generalization of machine learning models. By adding a small amount of noise to the input data, weights, or activations, noise injection discourages the model from fitting the training set too precisely. The technique encourages representations that are robust to small perturbations, which in turn tends to improve performance on unseen data.
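The sketch below shows one common variant: injecting zero-mean Gaussian noise into hidden activations during training while leaving inference deterministic. The GaussianNoise module and the std value are illustrative choices, not a standard PyTorch component; the same pattern can be applied to inputs or weights instead.

```python
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to its input during training only."""
    def __init__(self, std: float = 0.1):
        super().__init__()
        self.std = std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # self.training is toggled by model.train() / model.eval(),
        # so the noise disappears automatically at inference time.
        if self.training and self.std > 0:
            return x + torch.randn_like(x) * self.std
        return x

# Illustrative placement: noise on the hidden activations of a small MLP
model = nn.Sequential(
    nn.Linear(512, 512),
    GaussianNoise(std=0.05),  # std is a hyperparameter to tune
    nn.ReLU(),
    nn.Linear(512, 512),
)

model.train()
out = model(torch.randn(32, 512))  # noisy forward pass during training
model.eval()
out = model(torch.randn(32, 512))  # deterministic forward pass at inference
```

Gating the noise on self.training mirrors how dropout behaves in PyTorch: the perturbation regularizes training but does not distort predictions at evaluation time.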