Domain and task generalization
Assessing how well an LLM generalizes across different domains and tasks is crucial for understanding its true capabilities. Let’s explore some techniques for this purpose.
Evaluating domain adaptation
To evaluate domain adaptation, we can test the model on data from a domain other than the one it was trained on. Here's an example:
import torch

def evaluate_domain_adaptation(
    model, tokenizer, source_domain_data, target_domain_data
):
    def predict(text):
        # Tokenize the input text and run a forward pass through the model
        inputs = tokenizer(
            text, return_tensors='pt', truncation=True, padding=True
        )
        outputs = model(**inputs)
        # Return the index of the highest-scoring class
        return torch.argmax(outputs.logits, dim=1).item()
    # ...
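The rest of the function is elided above. A minimal sketch of how it might continue, assuming each dataset is an iterable of (text, label) pairs and that accuracy is the metric of interest (both are assumptions, not stated in the original):

    # Continuation of evaluate_domain_adaptation; assumes each dataset
    # is an iterable of (text, label) pairs -- an assumption, since the
    # data format is not shown above.
    def accuracy(dataset):
        correct = sum(predict(text) == label for text, label in dataset)
        return correct / len(dataset)

    source_accuracy = accuracy(source_domain_data)
    target_accuracy = accuracy(target_domain_data)

    # A large gap between the two scores signals poor domain adaptation.
    return {
        'source_accuracy': source_accuracy,
        'target_accuracy': target_accuracy,
        'adaptation_gap': source_accuracy - target_accuracy,
    }

Returning both scores together, plus their difference, makes the adaptation gap explicit instead of leaving the reader to compare two numbers reported in isolation.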