Magnitude-based pruning
Magnitude-based pruning is one of the simplest and most widely used pruning techniques. The idea behind this method is to remove the weights in the neural network that contribute least to the model's output; typically, these are the weights with the smallest magnitude (absolute value). By pruning such weights, the model becomes more compact and faster, with minimal impact on accuracy:
import torch
import torch.nn.utils.prune as prune

# Assume model is an instance of a pre-trained LLM
model = ...  # Load or define your LLM

# Prune 30% of the lowest-magnitude weights in all Linear layers
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name='weight', amount=0.3)

# Remove the pruning reparameterization to make the pruning permanent
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.remove(module, 'weight')
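To see the effect concretely, here is a minimal sketch on a single small Linear layer standing in for one layer of an LLM (the layer size is illustrative, not from the original example):

```python
import torch
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# A small illustrative layer; a real LLM layer would be much larger
layer = torch.nn.Linear(16, 16)

# Zero out the 30% of weights with the smallest absolute value
prune.l1_unstructured(layer, name='weight', amount=0.3)

# Make the pruning permanent by removing the reparameterization
prune.remove(layer, 'weight')

# Roughly 30% of the weights are now exactly zero
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2f}")
```

Note that the zeroed weights are still stored as dense zeros; realizing the speed and memory gains requires sparse storage or hardware/kernels that exploit the sparsity.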