Detecting bias
Detecting bias in LLMs often involves analyzing model outputs across different demographic groups or for different types of inputs. Here are some techniques:
- Word embeddings: The code below measures gender bias in pretrained word embeddings by projecting profession words onto a gender direction, defined as the difference between the averaged male and female word vectors:
```python
from gensim.models import KeyedVectors
import numpy as np

def word_embedding_bias(
    model, male_words, female_words, profession_words
):
    # Average the vectors for each gendered word list
    male_vectors = [model[word] for word in male_words if word in model.key_to_index]
    female_vectors = [model[word] for word in female_words if word in model.key_to_index]
    male_center = np.mean(male_vectors, axis=0)
    female_center = np.mean(female_vectors, axis=0)

    # The gender direction runs from the female centroid to the male centroid;
    # normalizing it makes the projections comparable across words
    gender_direction = male_center - female_center
    gender_direction /= np.linalg.norm(gender_direction)

    # Project each profession word onto the gender direction:
    # positive scores lean "male", negative scores lean "female"
    biases = []
    for word in profession_words:
        if word in model.key_to_index:
            biases.append((word, float(np.dot(model[word], gender_direction))))
    return biases
```
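A quick usage sketch, assuming gensim's downloader and the `glove-wiki-gigaword-100` vectors; the word lists here are illustrative, not canonical:

```python
import gensim.downloader as api

# Load pretrained GloVe vectors via gensim's downloader (fetched on first use)
model = api.load("glove-wiki-gigaword-100")

male_words = ["he", "man", "his", "father", "son"]
female_words = ["she", "woman", "her", "mother", "daughter"]
professions = ["nurse", "engineer", "teacher", "doctor", "homemaker"]

for word, score in word_embedding_bias(model, male_words, female_words, professions):
    # Positive scores lean toward the male centroid, negative toward the female centroid
    print(f"{word}: {score:+.3f}")
```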