Probing methods
Probing involves training simple supervised models (probes), typically linear classifiers, on the internal representations of an LLM to assess which linguistic properties are captured at different layers: if a lightweight probe can predict a property from a layer's activations, that layer encodes the property.
Different layers in a transformer specialize in different linguistic properties: lower layers capture surface features such as token identity and local syntax; middle layers handle grammatical and sentence-level structure; and higher layers encode semantics, reasoning, and factual recall. This hierarchy emerges naturally during training rather than being designed in. Probing studies confirm the specialization, which in turn informs interpretability work, layer-wise fine-tuning, and model compression for task-specific optimization.
Here’s an example of how to implement a probing task. The function body below is a reconstruction of the truncated original, a minimal sketch of the standard recipe: train one logistic-regression probe per BERT layer on the [CLS] representation and report each layer's held-out accuracy.
```python
import torch
from transformers import BertTokenizer, BertModel
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def probe_bert_layers(sentences, labels, model_name="bert-base-uncased"):
    """Train one logistic-regression probe per BERT layer and return
    a dict mapping layer index to held-out accuracy.

    NOTE: the body was truncated in the original; this is a
    reconstruction of the standard probing recipe.
    """
    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    # Encode all sentences once, keeping the hidden states of every layer.
    inputs = tokenizer(sentences, padding=True, truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # hidden_states is a tuple: (embedding layer, layer 1, ..., layer 12).
    layer_accuracies = {}
    for layer_idx, hidden in enumerate(outputs.hidden_states):
        # Use the [CLS] token vector as a fixed-size sentence representation.
        features = hidden[:, 0, :].numpy()
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.2, random_state=42,
            stratify=labels)

        # A deliberately simple probe: if it succeeds, the information
        # is linearly decodable from this layer's representation.
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X_train, y_train)
        layer_accuracies[layer_idx] = accuracy_score(
            y_test, probe.predict(X_test))

    return layer_accuracies
```
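To run the probe end to end, call the function on a small labeled dataset and compare accuracies across layers; a rise in accuracy toward the upper layers suggests the probed property is encoded there. The toy sentiment data below is purely illustrative, not from the original.

```python
# Hypothetical toy data for illustration; any sentence-level labels
# (e.g., grammaticality judgments) work the same way.
sentences = ["The movie was great.", "The plot made no sense.",
             "I loved every minute.", "The acting was terrible.",
             "A delightful surprise.", "A complete waste of time.",
             "Brilliant and moving.", "Dull from start to finish.",
             "Highly recommended.", "I want my money back."]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

accuracies = probe_bert_layers(sentences, labels)
for layer, acc in sorted(accuracies.items()):
    print(f"layer {layer:2d}: accuracy = {acc:.2f}")
```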