RecSys Meeting@Tampere University (online), 18.2.2021
Fairness-aware learning:
From single models to sequential ensemble
learning and learning over data streams
Eirini Ntoutsi
Free University Berlin
(Leibniz University Hannover & L3S Research Center)
Outline
 Introduction
 Batch (single-model) fairness-aware learning
 Fairness-aware sequential ensemble learning (boosting)
 Fairness-aware learning in data streams
 Wrapping up
Successful applications
 Recommendations
 Navigation
 Severe weather alerts
 Automation
Questionable uses / failures
 Google Flu Trends failure
 Microsoft’s bot Tay taken offline after racist tweets
 IBM’s Watson for Oncology cancelled
 Facial recognition works better for white males
Why might AI projects fail?
 Back to basics: How machines learn
 Machine Learning gives computers the ability to learn without being
explicitly programmed (Arthur Samuel, 1959)
 We don’t codify the solution, we don’t even know it!
 DATA & the learning algorithms are the keys
[Diagram: data + learning algorithms → models]
Watch out for (hidden) assumptions
 Assumptions include stationarity, independent & identically distributed
data, balanced class representation ...
 In this talk, I will focus on the assumption/myth of algorithmic objectivity: the common misconception that humans are subjective but data and algorithms are not, and therefore cannot discriminate.
Reality check: Can algorithms discriminate?
 Bloomberg analysts compared Amazon same-day delivery areas with U.S.
Census Bureau data
 They found that in 6 major same-day delivery cities, the service area
excludes predominantly black ZIP codes to varying degrees.
 Shouldn’t this service be based on customers’ spending rather than race?
 Amazon claimed that race was not used in their models.
Source: https://www.bloomberg.com/graphics/2016-amazon-same-day/
Reality check (cont’d): Can algorithms discriminate?
 There have been already plenty of cases of algorithmic discrimination
 State-of-the-art vision systems (used, e.g., in autonomous driving) recognize white males better than black women (racial and gender bias)
 Google’s AdFisher tool for serving personalized ads was found to serve significantly fewer ads for high-paying jobs to women than to men (gender bias)
 COMPAS tool (US) for predicting a defendant’s risk of committing another
crime predicted higher risks of recidivism for black defendants (and lower for
white defendants) than their actual risk (racial-bias)
Don’t blame (only) the AI
 “Bias is as old as human civilization” and “it is human nature for members
of the dominant majority to be oblivious to the experiences of other
groups”
 Human bias: a prejudice in favour of or against one thing, person, or group
compared with another usually in a way that’s considered to be unfair.
 Bias triggers (protected attributes): ethnicity, race, age, gender, religion, sexual
orientation …
 Algorithmic bias: the inclination or prejudice of a decision made by an AI
system which is for or against one person or group, especially in a way
considered to be unfair.
Bias, an overloaded term
 Inductive bias ”refers to a set of (explicit or implicit) assumptions made by
a learning algorithm in order to perform induction, that is, to generalize a
finite set of observations (training data) into a general model of the
domain. Without a bias of that kind, induction would not be possible,
since the observations can normally be generalized in many ways.”
(Hüllermeier, Fober & Mernberger, 2013)
 Bias-free learning is futile: A learner that makes no a priori assumptions
regarding the identity of the target concept has no rational basis for
classifying any unseen instances.
 Some biases are positive and helpful, e.g., making healthy eating choices
 We refer here to bias that might cause discrimination and unfair actions
to an individual or group on the basis of protected attributes like race or
gender
The fairness-aware machine learning domain
 A young, fast evolving, multi-disciplinary research field
 Bias/fairness/discrimination/… have long been studied in philosophy, social sciences, law, …
 Existing approaches can be divided into three categories
 Understanding bias
 How bias is created in society and enters our sociotechnical systems, is manifested in the data used by AI algorithms, and can be formalized.
 Mitigating bias
 Approaches that tackle bias in different stages of AI-decision making.
 Accounting for bias
 Approaches that account for bias proactively or retroactively.
E. Ntoutsi, P. Fafalios, U. Gadiraju, V. Iosifidis, W. Nejdl, M.-E. Vidal, S. Ruggieri, F. Turini, S. Papadopoulos, E. Krasanakis, I. Kompatsiaris, K. Kinder-Kurlanda, C. Wagner, F. Karimi, M. Fernandez, H. Alani, B. Berendt, T. Kruegel, C. Heinze, K. Broelemann, G. Kasneci, T. Tiropanis, S. Staab, "Bias in data-driven artificial intelligence systems—An introductory survey", WIREs Data Mining and Knowledge Discovery, 2020.
Outline
 Introduction
 Batch (single-model) fairness-aware learning
 Fairness-aware sequential ensemble learning (boosting)
 Fairness-aware learning in data streams
 Wrapping up
Fairness-aware batch/static learning setup
 Input: D = training dataset drawn from a joint distribution P(F,S,y)
 F: set of non-protected attributes
 S: (typically: binary, single) protected attribute
 s (s̄): protected (non-protected) group
 y = (typically: binary) class attribute {+,-} (+ for accepted, - for rejected)
 Goal of fairness-aware classification: Learn a mapping from f(F, S) → y
 achieves good predictive performance
 eliminates discrimination
       F1    F2    S    y
User1  f11   f12   s    +
User2  f21               -
User3  f31   f32   s    +
…      …     …     …    …
Usern  fn1               +
We know how to measure this
According to some fairness measure
Measuring (un)fairness
 Types of fairness measures: group fairness, individual fairness
 Group fairness: protected (s) and non-protected (s̄) groups should be treated similarly
 Representative measures: statistical parity, equal opportunity, equalized odds
 Main criticism: when focusing on the group, less qualified members may be chosen
 Individual fairness: similar individuals should be treated similarly
 Representative measures: counterfactual fairness
 Main criticism: it is hard to evaluate the proximity of instances (M. Kim et al, NIPS 2018)
Measuring (un)fairness
 Statistical parity: subjects in both protected and non-protected groups should have equal probability of being assigned to the positive class (Dwork et al, 2012):
P(ŷ = + | S = s) = P(ŷ = + | S = s̄)
 Equalized Odds: there should be no difference in the model’s prediction errors between protected and non-protected groups for both classes (Hardt et al., NIPS 16).
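As a concrete reference point, statistical parity can be computed directly from model predictions. A minimal sketch, assuming NumPy arrays with illustrative names (y_pred holds 0/1 predictions, protected is a boolean group mask):

    import numpy as np

    def statistical_parity_diff(y_pred, protected):
        # Gap in positive-prediction rates between the two groups; 0 means parity.
        return abs(y_pred[protected].mean() - y_pred[~protected].mean())

    # e.g. 0.5 for predictions [1, 0, 1, 1] with mask [True, True, False, False]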
Mitigating bias
 Goal: tackling bias in different stages of AI-decision making
[Diagram: data → algorithms → models → applications (hiring, banking, healthcare, education, autonomous driving, …); pre-processing approaches intervene on the data, in-processing approaches on the algorithms, post-processing approaches on the models]
Mitigating bias: pre-processing approaches
 Intuition: making the data more fair will result in a less unfair model
 Idea: balance the protected and non-protected groups in the dataset
 Design principle: minimal data interventions (to retain data utility for the
learning task)
 Different techniques:
 Instance class modification (massaging), (Kamiran & Calders, 2009),(Luong,
Ruggieri, & Turini, 2011)
 Instance selection (sampling), (Kamiran & Calders, 2010) (Kamiran & Calders,
2012)
 Instance weighting, (Calders, Kamiran, & Pechenizkiy, 2009)
 Synthetic instance generation (Iosifidis & Ntoutsi, 2018)
 …
Mitigating bias: pre-processing approaches: Massaging
 Change the class label of carefully selected instances (Kamiran & Calders, 2009).
 The selection is based on a ranker which ranks the individuals by their probability of receiving the favorable outcome (a sketch follows after this slide).
 The number of massaged instances depends on the fairness measure (group fairness)
Image credit Vasileios Iosifidis
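A minimal sketch of this massaging procedure, assuming a scikit-learn logistic regression as the ranker and 0/1 labels; the number of label pairs to flip, n_flip, would in practice be derived from the chosen group-fairness measure (all names are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def massage(X, y, protected, n_flip):
        # Rank all instances by their probability of the favorable (+) outcome.
        ranker = LogisticRegression(max_iter=1000).fit(X, y)
        scores = ranker.predict_proba(X)[:, 1]
        y_new = y.copy()
        # Promote the negatively labeled protected instances closest to the boundary...
        neg_prot = np.where(protected & (y == 0))[0]
        y_new[neg_prot[np.argsort(-scores[neg_prot])][:n_flip]] = 1
        # ...and demote the same number of borderline positive non-protected instances.
        pos_nonprot = np.where(~protected & (y == 1))[0]
        y_new[pos_nonprot[np.argsort(scores[pos_nonprot])][:n_flip]] = 0
        return y_new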
Mitigating bias: pre-processing approaches: discussion
 Most of the techniques are heuristics and the impact of the interventions
is not well controlled
 Approaches also exist that change the data towards fairness while controlling the per-instance distortion and preserving data utility (Calmon et al, 2017).
Mitigating bias
 Goal: tackling bias in different stages of AI-decision making
[Diagram repeated: pre-, in- and post-processing intervention points along the data → algorithms → models → applications pipeline]
Mitigating bias: in-processing approaches
 Intuition: working directly with the algorithm allows for better control
 Idea: explicitly incorporate the model’s discrimination behavior in the
objective function
 Design principle: “balancing” predictive- and fairness-performance
 Different techniques:
 Regularization (Kamiran, Calders & Pechenizkiy, 2010),(Kamishima, Akaho,
Asoh & Sakuma, 2012), (Dwork, Hardt, Pitassi, Reingold & Zemel, 2012) (Zhang
& Ntoutsi, 2019)
 Constraints (Zafar, Valera, Gomez-Rodriguez & Gummadi, 2017)
 Training on latent target labels (Krasanakis, Xioufis, Papadopoulos & Kompatsiaris, 2018)
 In-training altering of data distribution (Iosifidis & Ntoutsi, 2019)
 …
Mitigating bias
 Goal: tackling bias in different stages of AI-decision making
[Diagram repeated: pre-, in- and post-processing intervention points along the data → algorithms → models → applications pipeline]
Mitigating bias: post-processing approaches
 Intuition: start with predictive performance
 Idea: first optimize the model for predictive performance and then tune
for fairness
 Design principle: minimal interventions (to retain model predictive
performance)
 Different techniques:
 Correct the confidence scores (Pedreschi, Ruggieri, & Turini, 2009), (Calders &
Verwer, 2010)
 Correct the class labels (Kamiran et al., 2010)
 Change the decision boundary (Kamiran, Mansha, Karim, & Zhang, 2018), (Hardt,
Price, & Srebro, 2016)
 Wrap a fair classifier on top of a black-box learner (Agarwal, Beygelzimer, Dudík,
Langford, & Wallach, 2018)
 …
Mitigating bias: post-processing approaches: shift the decision boundary
 An example of decision boundary shift
V. Iosifidis, H.T. Thi Ngoc, E. Ntoutsi, “Fairness-enhancing interventions in stream classification", DEXA 2019.
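As a rough illustration of such a shift (not the method of the cited paper), one can lower the decision threshold for the discriminated group until its positive-prediction rate reaches a target, e.g. the other group’s rate; a sketch with illustrative names:

    import numpy as np

    def shifted_threshold(scores, protected, target_rate, step=0.01):
        # Lower the protected group's threshold until its positive-prediction
        # rate matches the rate observed for the non-protected group.
        threshold = 0.5
        while threshold > 0 and (scores[protected] >= threshold).mean() < target_rate:
            threshold -= step
        return threshold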
Outline
 Introduction
 Batch (single-model) fairness-aware learning
 Fairness-aware sequential ensemble learning (boosting)
 Fairness-aware learning in data streams
 Wrapping up
Fairness with sequential learners (boosting)
 Sequential ensemble methods generate base learners in a sequence
 Generating the base learners sequentially makes each learner depend on the previous ones.
 Each learner learns from the mistakes of the previous predictor
 The weak learners are combined to build a strong learner
 Popular examples: Adaptive Boosting (AdaBoost), Extreme Gradient
Boosting (XGBoost).
AdaBoost
 AdaBoost (Freund and Schapire, 1995) is a sequential ensemble method that, in each round, re-weights the training data to focus on misclassified instances (a code sketch follows below).
 The final strong learner is a weighted combination of the weak learners
Source: https://www.sciencedirect.com/topics/engineering/adaboost
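The round logic in code form, as a minimal sketch (labels in {-1, +1}; a decision stump as the weak learner):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_round(X, y, w):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum() / w.sum()                  # weighted error of this round
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))   # the learner's vote in the ensemble
        w = w * np.exp(-alpha * y * pred)                   # up-weight misclassified instances
        return h, alpha, w / w.sum()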
Intuition behind using boosting for fairness
 It is easier to make “fairness-related interventions” in simpler models
rather than complex ones
 We can use the whole sequence of learners for the interventions instead
of the current one
Still the batch/static fairness-aware learning setup
 Input: D = training dataset drawn from a joint distribution P(F,S,y)
 F: set of non-protected attributes
 S: (typically: binary, single) protected attribute
 s (s̄): protected (non-protected) group
 y = (typically: binary) class attribute {+,-} (+ for accepted, - for rejected)
 Goal of fairness-aware classification: Learn a mapping from f(F, S) → y
 achieves good predictive performance
 eliminates discrimination
       F1    F2    S    y
User1  f11   f12   s    +
User2  f21               -
User3  f31   f32   s    +
…      …     …     …    …
Usern  fn1               +
We know how to measure this
According to some fairness measure
Fairness measure: Equalized Odds
 Our fairness measure is Equalized Odds, which measures the difference in the model’s prediction errors between protected and non-protected groups for both classes (a sketch of one common computation follows below):
 Smaller values are better (ideally Eq.Odds = 0)
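One common way to turn this into a single score is to sum the absolute group gaps in the two per-class error rates; a minimal sketch with illustrative names (labels in {0, 1}):

    import numpy as np

    def equalized_odds(y_true, y_pred, protected):
        def error_rates(mask):
            fnr = (y_pred[mask & (y_true == 1)] == 0).mean()  # errors on the + class
            fpr = (y_pred[mask & (y_true == 0)] == 1).mean()  # errors on the - class
            return fnr, fpr
        fnr_s, fpr_s = error_rates(protected)
        fnr_ns, fpr_ns = error_rates(~protected)
        return abs(fnr_s - fnr_ns) + abs(fpr_s - fpr_ns)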
Limitations of related work
 Existing works evaluate predictive performance in terms of the overall
classification error rate (ER), e.g., [Calders et al’09, Calmon et al’17, Fish et
al’16, Hardt et al’16, Krasanakis et al’18, Zafar et al’17]
 Under class imbalance, ER is misleading
 Most datasets, however, suffer from imbalance
 Moreover, Eq.Odds “is oblivious” to the class imbalance problem
From Adaboost to AdaFair
 We tailor AdaBoost to fairness
 We introduce the notion of cumulative fairness that assesses the fairness of
the model up to the current boosting round (partial ensemble).
 We directly incorporate fairness in the instance weighting process
(traditionally focusing on classification performance).
 We optimize the number of weak learners in the final ensemble based on
balanced error rate thus directly considering class imbalance in the best model
selection.
ER = 1 − (TP + TN) / (TP + FN + TN + FP)
V. Iosifidis, E. Ntoutsi, “AdaFair: Cumulative Fairness Adaptive Boosting", ACM CIKM 2019.
AdaFair: Cumulative boosting fairness
 Let j ∈ 1…T be the current boosting round, where T is user-defined
 Let the partial ensemble be the weighted combination of the weak learners up to the current round j
 The cumulative fairness of the ensemble up to round j is defined based on the parity in the predictions of the partial ensemble between protected and non-protected groups
 ``Forcing’’ the model to consider ``historical’’ fairness over all previous rounds instead of just focusing on the current round’s learner hj() results in better classifier performance and model convergence.
AdaFair: fairness-aware weighting of instances
 Vanilla AdaBoost already boosts misclassified instances for the next round
 Our weighting explicitly targets fairness by additionally boosting discriminated groups for the next round
 The data distribution at boosting round j+1 is updated as follows
 The fairness-related cost ui of instances xi ∈ D that belong to a discriminated group is defined accordingly (a simplified version appears in the sketch after the pseudocode slide below)
AdaFair pseudocode
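The exact pseudocode is on the slide; as a rough companion, here is a heavily simplified sketch of the loop (labels in {-1, +1}; fairness_costs is an illustrative stand-in for the paper’s per-class cost definition, not its exact formula):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fairness_costs(pred, y, protected, eps=0.25):
        # Give extra cost to members of a group/class on which the partial
        # ensemble errs noticeably more than on the other group.
        u = np.zeros(len(y))
        for cls in (1, -1):
            err_s = (pred[protected & (y == cls)] != cls).mean()
            err_ns = (pred[~protected & (y == cls)] != cls).mean()
            if err_s - err_ns > eps:
                u[protected & (y == cls)] = 1.0
            elif err_ns - err_s > eps:
                u[~protected & (y == cls)] = 1.0
        return u

    def adafair_fit(X, y, protected, T):
        w = np.full(len(y), 1.0 / len(y))
        learners, alphas = [], []
        for _ in range(T):
            h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            pred = h.predict(X)
            err = w[pred != y].sum() / w.sum()
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
            learners.append(h)
            alphas.append(alpha)
            # Cumulative fairness: judge the partial ensemble 1..j, not just h_j.
            partial = np.sign(sum(a * g.predict(X) for a, g in zip(alphas, learners)))
            u = fairness_costs(partial, y, protected)
            w = w * np.exp(-alpha * y * pred) * (1 + u)  # AdaBoost update x fairness cost
            w /= w.sum()
        return learners, alphas   # a prefix 1..theta is then chosen via ER/BER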
AdaFair: optimizing the number of weak learners
 Typically, the number of boosting rounds/ weak learners T is user-defined
 We propose to select the optimal subsequence of learners 1 … θ, θ ≤ T
that minimizes the balanced error rate (BER)
 In particular, we consider both ER and BER in the objective function
 The result of this optimization is a final ensemble model with Eq.Odds fairness
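For reference, the balanced error rate averages the per-class error rates and is therefore insensitive to class imbalance; in code form:

    def balanced_error_rate(tp, fn, tn, fp):
        # Mean of the error rates on the + and - classes.
        return 0.5 * (fn / (tp + fn) + fp / (tn + fp))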
Experimental evaluation
 Datasets of varying imbalance
 Baselines
 AdaBoost [Sch99]: vanilla AdaBoost
 SMOTEBoost [CLHB03]: AdaBoost with SMOTE for imbalanced data.
 Krasanakis et al. [KXPK18]: Boosting method which minimizes Equalised Odds by
approximating the underlying distribution of hidden correct labels.
 Zafar et al. [ZVGRG17]: Training a logistic regression model with convex-concave constraints to minimize Equalised Odds.
 AdaFair NoCumul: Variation of AdaFair that computes the fairness weights based on
individual weak learners.
Experiments: Predictive and fairness performance
 Adult census income (ratio 1+:3-)  Bank dataset (ratio 1+:8-)
Larger values are better; for Eq.Odds, lower values are better
 Our method achieves high balanced accuracy and low discrimination (Eq.Odds) while
maintaining high TPRs and TNRs for both groups.
 The methods of Zafar et al. and Krasanakis et al. eliminate discrimination by rejecting more positive instances (lowering TPRs).
Cumulative vs non-cumulative fairness
 Cumulative vs non-cumulative fairness impact on model performance
 Cumulative notion of fairness performs better
 The cumulative model (AdaFair) is more stable than its non-cumulative counterpart (whose standard deviation is higher)
Outline
 Introduction
 Batch (single-model) fairness-aware learning
 Fairness-aware sequential ensemble learning (boosting)
 Fairness-aware learning in data streams
 Wrapping up
[Diagram: a stream of instances x1, x2, x3, …, xn arriving over time]
Fairness-aware stream learning setup
 Input: Stream X of instances x1, x2, …, xt, … arriving at timepoints t1, t2, …, tn, …
 Fixed d-dimensional feature space, xi ∈ Rd
 A (typically: binary, single) protected attribute S = {s, s̄}, with s the protected group
 Prequential evaluation setup: for a new instance xt at time t, predict its class label ŷt using the previously learned model ht-1.
 Later, the true label yt of xt is revealed and the loss L(ŷt, yt) is determined.
 y = (typically: binary) target class {+,-} (+ for accepted, - for rejected)
 The old model ht-1 is updated into ht: ht = train(ht-1, dt), where dt denotes the newly revealed example (xt, yt)
 Goal of stream classification: ht should maintain a good predictive performance
 Goal of fairness-aware stream classification: ht should also maintain fairness performance → online fairness
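The prequential (test-then-train) protocol in code form, a minimal sketch assuming any incremental classifier with illustrative predict/update methods:

    def prequential(stream, model, loss):
        total = 0.0
        for t, (x_t, y_t) in enumerate(stream, start=1):
            y_hat = model.predict(x_t)   # first: predict with h_{t-1}
            total += loss(y_hat, y_t)    # then: the true label y_t is revealed
            model.update(x_t, y_t)       # finally: h_t = train(h_{t-1}, (x_t, y_t))
            yield total / t              # running prequential loss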
Why do we need to update the model?
 The stationarity assumption no longer holds.
 As data evolve with time, the classifier becomes invalid/obsolete
 An example of a population at 2 consecutive timepoints t, t’
 The old classifier is not valid anymore
 Concept drift: the joint distribution P(X,y) might change over the stream:
∃ X: Pt(X,y) ≠ Pt’(X,y)
How is the fairness of the model affected?
 Changes in the decision boundary of the model (due to concept drifts)
affect the fairness of the model.
 An initially fair classifier might become unfair later
 So the update of the model should also consider fairness
V. Iosifidis, H.T. Thi Ngoc, E. Ntoutsi, “Fairness-enhancing interventions in stream classification", DEXA 2019.
Fairness-Aware Hoeffding Tree (FAHT)
 An in-processing approach to fairness
 FAHT extends the Hoeffding tree (HT) classifier for fairness by
directly considering fairness in the splitting criterion
 HT uses the Hoeffding bound to decide when and how to split
 Let G() be the heuristic split attribute selection measure
 After seeing n instances at a node, let the difference in G between the two best attributes a and b be ΔG = G(a) − G(b)
 ΔG is the random variable being estimated by the Hoeffding bound
 If ΔG > ε, the Hoeffding bound guarantees that we can confidently choose attribute a for splitting
 Such decisions are based on information gain to optimize predictive performance and do not consider fairness.
W. Zhang, E. Ntoutsi, “An Adaptive Fairness-aware Decision Tree Classifier", IJCAI 2019.
ε = √( R² ln(1/δ) / (2n) )
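The bound and split test in code form, directly transcribing the formula above (r is the range of G, delta the confidence parameter):

    import math

    def confident_split(delta_g, r, n, delta=1e-7):
        # epsilon shrinks as more instances n are observed at the node.
        epsilon = math.sqrt(r ** 2 * math.log(1.0 / delta) / (2 * n))
        return delta_g > epsilon   # True: confidently split on the best attribute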
Fairness-aware Hoeffding Tree (FAHT)
 We introduce the fairness gain of an attribute (FG)
 Disc(D) corresponds to statistical parity (group fairness)
 We introduce the joint criterion, fair information gain (FIG), which evaluates the suitability of a candidate splitting attribute A in terms of both predictive performance and fairness (a sketch follows below).
[Diagram: a split of dataset D into sub-datasets D1 and D2]
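A sketch of the two quantities; FG follows the description above, while the combination into FIG is shown here simply as the product of information gain and fairness gain, one natural choice (see the paper for the exact definition; labels in {0, 1}, splits a list of boolean masks):

    import numpy as np

    def disc(y, protected):
        # Statistical-parity difference within a (sub)dataset.
        return abs(y[~protected].mean() - y[protected].mean())

    def fairness_gain(y, protected, splits):
        # FG: reduction in discrimination when D is split into D1, D2, ...
        after = sum(m.mean() * disc(y[m], protected[m]) for m in splits)
        return disc(y, protected) - after

    def fair_information_gain(ig, fg):
        return ig * fg   # illustrative joint criterion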
Experiments: Predictive and fairness performance
 FAHT reduces discrimination to a lower level while maintaining fairly comparable accuracy.
 FAHT results in a shorter tree compared to HT, as its splitting criterion FIG is more restrictive than IG.
Adult dataset
Qualitative results:
• HT selects “capital-gain” as the root
attribute, FAHT selects “Age”
• Capital gain is directly related to the annual salary (class) but probably also mirrors intrinsic discrimination in the data
Outline
 Introduction
 Dealing with bias in data-driven AI systems
 Understanding bias
 Mitigating bias
 Accounting for bias
 Wrapping up
Wrapping-up, ongoing work and future directions
 In this talk I focused on the myth of algorithmic objectivity and
 the reality of algorithmic bias and discrimination, and how algorithms can pick up biases existing in the input data and further reinforce them
 A large body of research already exists but
 focuses mainly on fully-supervised batch learning with a single protected (and typically binary) attribute and binary classes
 targets bias in some step of the analysis pipeline, but biases/errors might be propagated and even amplified (unified approaches are needed)
V. Iosifidis, E. Ntoutsi, “FABBOO - Online Fairness-aware Learning under Class Imbalance", DS 2020.
T. Hu, V. Iosifidis, W. Liao, H. Zang, M. Yang, E. Ntoutsi, B. Rosenhahn, "FairNN - Conjoint Learning of Fair Representations for Fair Decisions", DS 2020.
Wrapping-up, ongoing work and future directions
 Moving from single-protected-attribute fairness-aware learning to multi-fairness
 Existing legal studies define multi-fairness as compound, intersectional and overlapping [Makkonen 2002].
 Moving from fully-supervised learning to unsupervised and reinforcement
learning
 Moving from myopic (maximize short-term/immediate performance) solutions
to non-myopic ones (that consider long-term performance) [Zhang et al,2020]
 Actionable approaches (counterfactual generation)
Thank you for your attention!
Questions?
https://nobias-project.eu/
@NoBIAS_ITN
https://www.bias-project.org/
Feel free to contact me:
ntoutsi@l3s.de
@entoutsi