Detecting bias
Detecting bias in LLMs often involves analyzing model outputs across different demographic groups or for different types of inputs. Here are some techniques:
- Word embeddings: The code below measures gender bias in word embeddings by projecting profession words onto a gender direction derived from male- and female-associated words:
```python
from gensim.models import KeyedVectors
import numpy as np

def word_embedding_bias(model, male_words, female_words, profession_words):
    # Average the embeddings of each gendered word list to get a per-gender center
    male_vectors = [model[word] for word in male_words if word in model.key_to_index]
    female_vectors = [model[word] for word in female_words if word in model.key_to_index]
    male_center = np.mean(male_vectors, axis=0)
    female_center = np.mean(female_vectors, axis=0)
    # The gender direction points from the female center toward the male center
    gender_direction = male_center - female_center
    gender_direction /= np.linalg.norm(gender_direction)
    # Project each profession word onto the gender direction:
    # positive scores lean male, negative scores lean female
    biases = []
    for word in profession_words:
        if word in model.key_to_index:
            biases.append((word, float(np.dot(model[word], gender_direction))))
    return biases
```
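  A minimal usage sketch is shown below; it assumes the pretrained `glove-wiki-gigaword-100` vectors fetched through `gensim.downloader` and uses small illustrative word lists (a real audit would use larger, curated sets):

```python
import gensim.downloader as api

# Assumption: pretrained GloVe vectors from gensim-data (downloaded on first use)
model = api.load("glove-wiki-gigaword-100")

# Illustrative word lists, not a prescribed benchmark
male_words = ["he", "him", "his", "man", "male"]
female_words = ["she", "her", "hers", "woman", "female"]
professions = ["nurse", "engineer", "doctor", "teacher", "programmer", "librarian"]

for word, score in word_embedding_bias(model, male_words, female_words, professions):
    # Positive scores lean toward the male direction, negative toward the female direction
    print(f"{word}: {score:+.3f}")
```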