L2 regularization (Ridge regression)
L2 regularization, also known as ridge regression or weight decay, is a technique used to prevent overfitting in machine learning models. It works by adding a penalty term to the loss function that is proportional to the square of the model's weights. This penalty discourages the model from assigning large weights to individual features, yielding a simpler model. By minimizing the combined loss function, which includes both the original loss and the penalty term, the model strikes a balance between fitting the training data well and keeping the weights small, improving its ability to generalize to new, unseen data.
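Concretely, with data loss $L_{\text{data}}$, model weights $w_i$, and regularization strength $\lambda$ (standard notation, assumed here for illustration), the combined objective is:

$$L_{\text{total}} = L_{\text{data}} + \lambda \sum_i w_i^2$$

Larger values of $\lambda$ penalize large weights more heavily, shrinking them toward zero.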
Here’s how to use it with PyTorch’s AdamW optimizer. The loop body is a minimal sketch, assuming batches of keyword tensors and a model whose forward pass returns its loss (as Hugging Face Transformers models do):
from torch.optim import AdamW

def train_with_weight_decay(
    model,
    train_dataloader,
    weight_decay=0.01,
    lr=5e-5,
    epochs=3,
):
    # weight_decay enables AdamW's decoupled weight decay on all parameters
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    model.train()
    for _ in range(epochs):
        for batch in train_dataloader:
            outputs = model(**batch)  # assumes the model computes and returns its own loss
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
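Note that AdamW uses decoupled weight decay (Loshchilov & Hutter, 2019): the decay is applied directly to the weights during the update step rather than added to the loss as an explicit L2 penalty. With plain SGD the two formulations coincide, but with adaptive optimizers they differ slightly; in practice both serve the same regularizing purpose.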