1
XGBoost @ Fyber
From Theory to Production
About Us
Fyber at a Glance
San Francisco | New York | London | Berlin | Tel Aviv | Beijing
Publicly Traded
FBEN Frankfurt
350+ Employees
50% of employees are R&D & Product
7 Offices
Berlin | Tel Aviv | San Francisco | New York | London | Beijing | Korea
How big is our Big Data?
10B Auctions Per Day
150M DAU
250B Bid Requests Per Day
10K+ Apps
300TB Generated Monthly
300 User-Level Dimensions
80 Reported Dimensions (on real-time reporting)
65 Reported Metrics
5
The Goal: Maximize the value of our data
■ Technologies used (Spark, Druid, Presto, and more...)
■ Analysis on new & existing data products and algorithms
■ Implementing product releases and A/B testing on main products (i.e. “groups” by users)
■ Creating dashboards for existing and new products (both business & tech oriented)
■ Product research & POCs (e.g. MLeap)
■ Algorithms Development
A taste of Data Science @ Fyber
6
Our main Use-Cases for this session
Two main use-cases which XGBoost was implemented for:
Audience Vault Reach
(Fyber Marketplace)
CTR Prediction
(full model based on
Criteo’s AI Lab use
case, Offerwall)
7
Why XGBoost?
■ State of the art results on data competitions
■ Works great with tabular data
■ Works great with Spark and big data
■ Combines several ML optimization methodologies (boosting, bagging)
■ Good time doing feature engineering 😎
Decision Trees
9
Decision Trees - Definition
■ A decision tree builds classification or regression models in the form of a tree structure
■ A decision tree breaks down a dataset into smaller and smaller subsets
■ A decision tree can be viewed as a “divide and conquer” algorithm
10
Decision Trees - Steps
■ Take the entire data set as input
■ Search for a split that maximizes the “separation” of the classes, using information gain or the Gini index
■ Apply the split
■ Again, search for a beneficial split
■ Stop when you meet some stopping criterion
■ Evaluate the tree on test data
11
Decision Trees - Information Gain
■ We want to determine which attribute in a given set of training feature vectors is most useful for
discriminating between the classes to be learned
■ Information gain tells us how important a given attribute of the feature vector is
Information gain = Entropy (parent) – Average Entropy (children)
https://homes.cs.washington.edu/~shapiro/EE596/notes/InfoGain.pdf
12
Decision Trees - Entropy (Example#1)
“Lack of order or predictability” (Google)
A common way to measure impurity in a group of samples.
Values range from 0 to 1 - 0: pure, 1: not pure
Entropy = -∑ pi * log2(pi)
Example:
■ 16/30 are green circles
■ 14/30 are pink crosses
■ “Greens” log2 calculation: log2(16/30) ≈ -0.9
■ “Pinks” log2 calculation: log2(14/30) ≈ -1.1
■ Entropy = -(16/30)*(-0.9) - (14/30)*(-1.1) ≈ 0.99
https://homes.cs.washington.edu/~shapiro/EE596/notes/InfoGain.pdf
13
Decision Trees - Entropy (Example #2)
A common way to measure impurity in a group of samples.
Entropy = -∑ pi * log2(pi)
Example:
■ 0/30 are green circles
■ 30/30 are pink crosses
■ “Greens” term: log2(0/30) is undefined, but the 0 * log2(0) term is taken as 0
■ “Pinks” log2 calculation: log2(30/30) = log2(1) = 0
■ Entropy = -(30/30)*(0) = 0
https://homes.cs.washington.edu/~shapiro/EE596/notes/InfoGain.pdf
14
Decision Trees - Information Gain
Information gain =
Entropy (parent) – Average Entropy (children)
https://homes.cs.washington.edu/~shapiro/EE596/notes/InfoGain.pdf
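As a quick illustration (not part of the original deck), here is a minimal Scala sketch that computes entropy and information gain from class counts; the child split counts below are hypothetical:

def entropy(counts: Seq[Int]): Double = {
  val total = counts.sum.toDouble
  counts.filter(_ > 0).map { c =>
    val p = c / total
    -p * (math.log(p) / math.log(2))   // -pi * log2(pi)
  }.sum
}

// Information gain = Entropy(parent) - weighted average Entropy(children)
def infoGain(parent: Seq[Int], children: Seq[Seq[Int]]): Double = {
  val total = parent.sum.toDouble
  val avgChildEntropy = children.map(c => (c.sum / total) * entropy(c)).sum
  entropy(parent) - avgChildEntropy
}

entropy(Seq(16, 14))                                 // ≈ 0.99 (Example #1)
entropy(Seq(0, 30))                                  // = 0.0  (Example #2: a pure node)
infoGain(Seq(16, 14), Seq(Seq(12, 1), Seq(4, 13)))   // gain for a hypothetical split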
15
Decision Trees - Special Parameters
■ maxDepth: Maximum depth of the tree
■ minInfoGain: Minimum gain for a split to be
considered at a tree node
■ minInstancesPerNode: Minimum number of
instances each child must have after split
■ algo: type of decision tree (Classification /
Regression)
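These parameter names match Spark MLlib's decision tree estimators; a minimal sketch, assuming the DataFrame API (the “algo” choice corresponds to picking DecisionTreeClassifier vs. DecisionTreeRegressor):

import org.apache.spark.ml.classification.DecisionTreeClassifier

val dt = new DecisionTreeClassifier()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setImpurity("entropy")        // split criterion: "entropy" (info gain) or "gini"
  .setMaxDepth(5)                // maxDepth
  .setMinInfoGain(0.01)          // minInfoGain
  .setMinInstancesPerNode(10)    // minInstancesPerNode

// val model = dt.fit(trainingDF)   // trainingDF is an assumed DataFrame with "features" / "label"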
16
Decision Tree - main problems
The tree is too complicated -
Variance / Overfitting
The tree is too basic -
Bias / Underfitting
17
The tree is too basic - Underfitting / Bias
The tree is too complicated - Overfitting / Variance
Introducing XGBoost
Introducing XGBoost
18
XGBoost
■ Random Forest
■ Bagging
■ Gradient boosting
19
Solution 1: Random Forest
Aimed to reduce variance
20
Solution 2: Bagging
Aimed to reduce variance
21
Solution 3: Gradient Boosting
Aimed to reduce bias
■ Build a very basic model using the mean
■ Calculate the error (residual) for every data point
■ Use the calculated error as the new label
■ Try to predict the new label using a tree (a learning rate is a must)
■ Use the learning rate in order to avoid overshooting
■ Repeat the process
22
Solution 3:
Gradient
Boosting
Aimed to reduce bias
Height Food Gender Weight Kg
192 Pizza Male 88
166 Pasta Female 76
182 Pasta Male 80
175 Pizza Male 73
160 Pizza Female 77
165 Pizza Female 57
23
Solution 3 : Gradient Boosting
Aimed to reduce bias
Height Food Gender Weight Kg Error
192 Pizza Male 88 88-71.2 =16.8
166 Pasta Female 76 76-71.2 = 4.8
182 Pasta Male 80 80-71.2 = 8.8
175 Pizza Male 73 73-71.2 = 1.8
160 Pizza Female 77 77-71.2 = 5.8
165 Pizza Female 57 57-71.2 =-14.2
Average : 71.2 Kg
This is our basic model
24
Solution 3 : Gradient Boosting
Aimed to reduce bias
The error is our new label, so our
model will try to predict the error
25
Solution 3 : Gradient Boosting
Aimed to reduce bias
Starting point: the basic model (mean = 71.2)
First tree: Height >= 175 → 15, Height < 175 → 2
26
Height | Weight | Old model | Old Error (new label) | New model | New Error
192 | 88 | 71.2 | 16.8 | 71.2 + 15*0.1 = 72.7 | 15.3
166 | 76 | 71.2 | 4.8 | 71.2 + 2*0.1 = 71.4 | 4.6
182 | 80 | 71.2 | 8.8 | 71.2 + 15*0.1 = 72.7 | 7.3
170 | 73 | 71.2 | 1.8 | 71.2 + 2*0.1 = 71.4 | 1.6
160 | 77 | 71.2 | 5.8 | 71.2 + 2*0.1 = 71.4 | 5.6
165 | 57 | 71.2 | -14.2 | 71.2 + 2*0.1 = 71.4 | -14.4
First tree split (starting point: mean = 71.2): Height >= 175 → 15, Height < 175 → 2
Based on Learning Rate = 0.1
Solution 3 : Gradient Boosting
Aimed to reduce bias
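The same update rule can be written as a few lines of Scala; this is a minimal sketch of one boosting step using the toy data and the single split shown above (71.2 is the base prediction used in the slides):

val data = Seq((192.0, 88.0), (166.0, 76.0), (182.0, 80.0),
               (170.0, 73.0), (160.0, 77.0), (165.0, 57.0))   // (height, weight)

val eta  = 0.1    // learning rate
val base = 71.2   // the basic model: the mean prediction used in the slides

// First tree, fitted on the residuals: Height >= 175 -> 15, Height < 175 -> 2
def treePredict(height: Double): Double = if (height >= 175) 15.0 else 2.0

val step = data.map { case (height, weight) =>
  val oldError      = weight - base                        // residual = the new label
  val newPrediction = base + eta * treePredict(height)     // old model + eta * tree
  val newError      = weight - newPrediction
  (height, weight, oldError, newPrediction, newError)
}
// First row ≈ (192.0, 88.0, 16.8, 72.7, 15.3), matching the table above.
// Repeating this step with more small trees keeps shrinking the residuals (reducing bias).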
27
XGBoost - Examples of Hyper Parameters
■ maxDepth: Maximum depth of the tree
■ numRound: the number of boosting rounds (trees) that are built
■ objective: what kind of prediction we want to perform (classification, regression, ranking)
■ eta: Learning Rate
■ colsample_bytree: the ratio of columns (features) that will be used by every tree
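For reference, a hedged sketch of how these hyperparameters map onto XGBoost4J-Spark (names follow the XGBoost parameter convention; values are illustrative, not Fyber's settings):

import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val xgbParams = Map(
  "max_depth"        -> 6,                  // maxDepth
  "num_round"        -> 100,                // numRound: number of boosting rounds (trees)
  "objective"        -> "binary:logistic",  // the learning task (classification here)
  "eta"              -> 0.1,                // learning rate
  "colsample_bytree" -> 0.8                 // ratio of columns sampled per tree
)

val xgb = new XGBoostClassifier(xgbParams)
  .setFeaturesCol("features")
  .setLabelCol("label")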
Fyber’s XGBoost
Use Cases
29
Our main Use-Cases
Two main use-cases which XGBoost was implemented for:
Audience Vault Reach
(Fyber Marketplace)
CTR Prediction
(full model based on
Criteo’s AI Lab use
case, Offerwall)
30
A word about XGBoost with Spark
■ XGBoost4J Latest stable release - May 2019
■ Allows processing of huge amounts of data
■ The project is constantly being updated, stabilized and many features are being added
■ Supports Java, Scala (Spark)
■ Soon: XGBoost with PySpark
■ Spark ML framework (MLlib) functionality integrates smoothly with XGBoost. It contains:
○ String Indexer
○ One Hot Encoding
○ Vector Assembler
○ Tokenizer
○ And many more...
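A minimal sketch of those building blocks (column names are illustrative; depending on your Spark version the one-hot step may be OneHotEncoderEstimator instead):

import org.apache.spark.ml.feature.{StringIndexer, OneHotEncoder, VectorAssembler}

val indexer = new StringIndexer()
  .setInputCol("country")            // string category -> numeric index
  .setOutputCol("countryIndex")

val encoder = new OneHotEncoder()
  .setInputCol("countryIndex")       // numeric index -> one-hot vector
  .setOutputCol("countryVec")

val assembler = new VectorAssembler()
  .setInputCols(Array("countryVec", "requests7Days"))
  .setOutputCol("features")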
31
Audience
Vault
32
Old Situation:
■ The Audience Vault presented only past data; the audience reach estimate was not accurate at all
New Solution:
■ A model that integrates easily into the Audience Vault and presents the estimated audience reach for the next 14 days
■ Audience reach needs to be easily presented regardless of which filter is chosen (countless possibilities)
XGBoost with Spark enabled us to perform predictions on hundreds of millions of users
Audience Vault Reach
33
Account Managers
Need to sell audiences to customers, so they MUST know the relevant audience size
External Clients
Would like to target audiences through the Fyber Marketplace, and therefore MUST know the audience size
Target Audience
34
How was it done? Pipeline Preparation
Data Preparation → Feature Engineering → Vector Assembler → Model Preparation → Training → Transformation
35
Data Pre-processing
Load Relevant Data: Historical Data (30 days) + Label (14 days)
Feature Engineering: “Active per ${X} days” features, “Requests per ${X} days” features
36
How was it done?
Data Pre-processing
Device Id sumRequests30Days sumRequests14Days sumRequests7Days sumActive30Days sumActive14Days sumActive7Days
1 1 0 0 1 0 0
2 102 89 89 7 3 3
3 1 1 0 1 1 0
4 23 7 2 12 4 1
5 26 8 0 6 2 0
6 214 117 15 8 6 3
37
Model preparation
Vector Assembler
VectorAssembler is a Spark transformer that combines a given list of columns into a single vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("hour", "mobile", "userFeatures"))
  .setOutputCol("features")
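Applying it is then a single transform call (sketch; df is an assumed input DataFrame with the columns above):

val assembled = assembler.transform(df)   // adds the "features" vector column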
38
XGBoostRegressor
An instance of the XGBoost object which is used for Regression and Classification tasks
39
XGBoostRegressor
Going through a few of the parameters that were used as part of the “tweaking” (aka “hyperparameter tuning”):
■ eta - learning rate of the model (usually: 0 < eta < 1) - step size shrinkage for gradient boosting
■ max_depth - maximum depth per tree (as part of the whole model). Deeper trees are more prone to overfitting
■ subsample - ratio (0 to 1) of training instances sampled per tree, meant to prevent overfitting (encourages variance between trees)
■ colsample_bytree - subsample ratio (0 to 1) of columns (features) used when building each tree
■ objective - the “learning task” of the model. In our case, logistic regression probability (0 <= P(X) <= 1) was used, with a
label of 0 / 1 (0 - the user wasn’t active during the “next” 14 days, 1 - the user was active during the “next” 14 days)
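A hedged configuration sketch of the above in XGBoost4J-Spark (values are illustrative, not the production settings):

import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

val reachRegressor = new XGBoostRegressor()
  .setEta(0.05)                      // learning rate / shrinkage
  .setMaxDepth(6)                    // deeper trees are more prone to overfitting
  .setSubsample(0.8)                 // fraction of rows sampled per tree
  .setColsampleBytree(0.8)           // fraction of columns sampled per tree
  .setObjective("binary:logistic")   // outputs P(user is active in the next 14 days)
  .setNumRound(200)
  .setFeaturesCol("features")
  .setLabelCol("label")              // 1 = active in the next 14 days, 0 = not active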
40
ML Pipelines
The goal: Combine multiple steps / algorithms into a single pipeline / workflow
■ A Pipeline chains multiple actions together to specify an ML workflow
■ Pipeline stages are specified in an ordered way
■ Persistence - we can save and load entire pipelines for future use
Putting it all together
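A minimal sketch, assuming the assembler and regressor from the earlier slides as stages (paths and DataFrames are placeholders):

import org.apache.spark.ml.{Pipeline, PipelineModel}

val pipeline = new Pipeline().setStages(Array(assembler, reachRegressor))

val pipelineModel = pipeline.fit(trainDF)             // Training
val predictions   = pipelineModel.transform(newDF)    // Transformation

pipelineModel.write.overwrite().save("/models/audience-reach")   // Persistence
val reloaded = PipelineModel.load("/models/audience-reach")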
41
CTR Prediction
(Offer Wall)
42
Old Situation:
● Offer scores and ranks were set by old manual parameters and configurations, and therefore did not reflect the real performance
New Solution:
● An ML model that automatically ranks the relevant offers based on their attributes, and is therefore able to estimate the relevant “target” a lot better (i.e. “best in geo XYZ”, “best in application 123”, etc.)
CTR Prediction
(Offer Wall)
43
Pipeline Preparation
How was it done?
Data Preparation → Feature Engineering → String Indexing → One Hot Encoding → Vector Assembler → Model Preparation → Training → Transformation
44
Data Pre-processing
● Raw data - the raw events are the most important, as they help you become familiar with the data, the trends and the outliers, and they contain tons of “magic”
Timestamp Advertisement_Id Application_Id User_country_code Event
2019-09-03 2:24:59 1296881 40647 US Impression
2019-09-03 2:25:03 1303994 40647 US Impression
2019-09-03 2:25:38 1288117 110226 US Impression
2019-09-03 2:25:39 1303946 119548 SA Impression
2019-09-03 2:25:58 1252617 106241 KG Impression
45
● Insights from the raw data -
○ As with every “live-traffic” product, there are tons of “long tail” applications and advertisements that don’t generate significant data - should focus (“OOV”, for example)
○ Timestamp is our “master” for time-series data, as we can analyze what the KPIs were at a specific point in time - should focus
○ As we’re dealing with CTR (i.e. clicks / imps), we don’t focus on outliers (i.e. when clicks > imps) - should investigate and choose next steps
○ Special cases where dimensions that are not relevant to the product are ingested into the raw data - should investigate and choose next steps
46
● Feature Engineering
○ Window Functions - used to calculate KPIs (specifically impressions and clicks) over several time windows within the training dataset
○ GroupBy queries - in order to have “different” insights on the label (CTR) - CTR by n dimensions is somehow related to CTR by n - 1, n - 2, …, dimensions
○ Normalization - this technique was used because our main features didn’t share the same scale of values. Some values ranged 100-1000, others 0-1, and we had to scale them down. The goal was to have a value between 0 and 1. Normalization was done using MinMax normalization, though there are other techniques as well (using the standard deviation, mean, ...)
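A hedged sketch of these three techniques in Spark (events is an assumed DataFrame with per-day impression / click counts; column names and window sizes are illustrative):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.ml.feature.{VectorAssembler, MinMaxScaler}

// Window functions: rolling KPIs per advertisement over the last 7 "days"
val w = Window.partitionBy("Advertisement_Id").orderBy("day").rowsBetween(-6, 0)
val withKpis = events
  .withColumn("imps7d",   sum("impressions").over(w))
  .withColumn("clicks7d", sum("clicks").over(w))

// GroupBy: a coarser view of the label, e.g. CTR per country
val ctrByGeo = withKpis.groupBy("User_country_code")
  .agg((sum("clicks") / sum("impressions")).as("geoCtr"))

// MinMax normalization: scale unevenly ranged features into [0, 1]
val rawVec = new VectorAssembler()
  .setInputCols(Array("imps7d", "clicks7d"))
  .setOutputCol("rawFeatures")
val scaler = new MinMaxScaler()
  .setInputCol("rawFeatures")
  .setOutputCol("scaledFeatures")
// val assembled = rawVec.transform(withKpis)
// val scaled    = scaler.fit(assembled).transform(assembled)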
47
Model preparation
How was it done?
XGBoostRegressor
48
XGBoostRegressor
● “Missing” - this is a representation for different strategies to handle missing values as part of your
dataset
More info can be found here, and official PDF here
● Ignore missing values
● “setHandleInvalid = skip” in VectorAssembler
● Handle missing values
● Change data accordingly (fill.na = $someNumber)
● “setHandleInvalid = keep” in VectorAssembler
● Add “missing = $someNumber” in XGBoostRegressor
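A minimal sketch of both strategies (featureCols, rawDF and the sentinel value are assumed placeholders):

import org.apache.spark.ml.feature.VectorAssembler
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

val featureCols = Array("imps7d", "clicks7d", "geoCtr")   // illustrative

// Option 1 - ignore rows with missing values
val assemblerSkip = new VectorAssembler()
  .setInputCols(featureCols)
  .setOutputCol("features")
  .setHandleInvalid("skip")            // drop rows containing null / NaN

// Option 2 - keep missing values and tell XGBoost how they are encoded
val filled = rawDF.na.fill(-999.0, featureCols)   // fill nulls with a sentinel value
val assemblerKeep = new VectorAssembler()
  .setInputCols(featureCols)
  .setOutputCol("features")
  .setHandleInvalid("keep")
val regressorWithMissing = new XGBoostRegressor()
  .setMissing(-999.0f)                 // the value XGBoost should treat as "missing"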
49
Model
Scores
50
XGBoost Feature Importance (via Information Gain)
Understand how good or bad your features are, and act accordingly
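A sketch of how the gain-based importance can be pulled out of a trained model; this assumes an xgboost4j version whose native Booster exposes getScore, and that featureCols holds the column names in VectorAssembler order (both are assumptions, not the deck's code):

// xgbModel is an assumed trained XGBoost4J-Spark model (regression or classification)
val gainByIndex = xgbModel.nativeBooster.getScore("", "gain")   // e.g. Map("f0" -> 123.4, ...)

// Map the native "f<index>" names back to the assembled column order
val featureCols = Array("sumRequests30Days", "sumRequests14Days", "sumActive30Days")   // illustrative
val gainByName  = gainByIndex.map { case (f, gain) => featureCols(f.drop(1).toInt) -> gain }

gainByName.toSeq.sortBy(-_._2).foreach { case (name, gain) => println(s"$name -> $gain") }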
Code Session
■ GitHub Repo