Evaluating the impact of data augmentation
To assess the effectiveness of our data augmentation techniques, we need to measure their impact on LLM performance.
Perplexity
You can measure a model's perplexity (see Chapter 2) on a held-out test set before and after data augmentation to assess whether the augmentation has improved the model's ability to predict unseen text:
import math
import torch

def evaluate_perplexity(model, tokenizer, test_data):
    model.eval()
    total_loss = 0
    total_tokens = 0
    with torch.no_grad():
        for text in test_data:
            inputs = tokenizer(
                text, return_tensors="pt"
            ).to(model.device)
            # Passing the input IDs as labels makes the model
            # return the mean cross-entropy loss for the sequence
            outputs = model(**inputs, labels=inputs["input_ids"])
            num_tokens = inputs["input_ids"].numel()
            # Weight the mean loss by token count so long and
            # short texts contribute proportionally
            total_loss += outputs.loss.item() * num_tokens
            total_tokens += num_tokens
    # Perplexity is the exponential of the average per-token loss
    return math.exp(total_loss / total_tokens)
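To attribute a change in perplexity to the augmentation itself, evaluate two otherwise identical checkpoints on the same held-out set. As a minimal sketch, assuming hypothetical baseline_model and augmented_model checkpoints fine-tuned on the original and augmented datasets respectively:

# Hypothetical checkpoints: same architecture and training setup,
# differing only in whether the augmented data was used
ppl_before = evaluate_perplexity(baseline_model, tokenizer, test_data)
ppl_after = evaluate_perplexity(augmented_model, tokenizer, test_data)
print(f"Perplexity without augmentation: {ppl_before:.2f}")
print(f"Perplexity with augmentation:    {ppl_after:.2f}")

A lower perplexity for the augmented checkpoint suggests the added data improved the model's fit to unseen text rather than merely to the training set.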