Strategies for freezing and unfreezing layers
The idea behind selectively freezing and unfreezing layers is rooted in how knowledge is structured and distributed across a deep neural network. Lower layers in LLMs tend to capture more general language representations, such as syntax, part-of-speech information, and morphology, while higher layers are more specialized and task-dependent. This hierarchical organization allows us to leverage the general-purpose linguistic knowledge already encoded in the early layers while fine-tuning only the task-specific portions of the network.
By freezing the lower layers, we preserve their pre-trained capabilities and prevent catastrophic forgetting, which can occur if the entire model is updated indiscriminately on a narrow domain dataset. This also drastically reduces the number of trainable parameters, leading to lower memory usage and faster convergence. Meanwhile, selectively unfreezing the upper layers allows the model to adapt its representations to the target task or domain.
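As a concrete illustration, the sketch below freezes every parameter of a pre-trained causal language model and then unfreezes only the top few transformer blocks. It assumes a Hugging Face GPT-2 model; the attribute names `transformer.h` and `transformer.ln_f` are specific to that architecture, and the number of unfrozen blocks here is an arbitrary choice for illustration.

```python
from transformers import AutoModelForCausalLM

# Minimal sketch, assuming a Hugging Face GPT-2 causal LM.
# Attribute names (transformer.h, transformer.ln_f) differ for other
# model families, so adapt them to the architecture you are using.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Freeze every parameter first, preserving the pre-trained weights.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the top N transformer blocks and the final layer norm,
# so gradient updates are confined to the task-specific upper layers.
num_unfrozen_blocks = 2  # illustrative value, tune per task
for block in model.transformer.h[-num_unfrozen_blocks:]:
    for param in block.parameters():
        param.requires_grad = True
for param in model.transformer.ln_f.parameters():
    param.requires_grad = True

# Report how much the trainable parameter count shrinks.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")
```

When building the optimizer, pass only the parameters with `requires_grad=True` so that optimizer state (for example, Adam's moment estimates) is allocated solely for the unfrozen layers, which is where most of the memory savings come from.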