Data mining - Machine Learning
Project Bank Marketing
By: Rupa Dutta
Gautam Buddha - A philosopher and a thinker
Legend has it that he attained enlightenment sitting under a tree and taught the world a new philosophy called...
The Middle Path
Figure: underfitting model, optimal model, overfitting model
In analytics, we often face the challenge of avoiding both underfitting and overfitting models and finding a balance between them. A balanced model has a better chance of working on previously unseen data. I would like to present this data mining project as a quest for a reasonable model that fares well across all metrics, i.e. for finding that "middle path".
Critical to avoiding an underfitting model:
• Gather enough, but not too much, data
• Identify and remove noise and outliers
• Remove irrelevant features, which can confuse models
• Massage the data well
• Identify nominal, ordinal and continuous features
• Condition the features before feeding them to models
Critical to avoiding an overfitting model:
• Test, test and test again
• Cross-validate models against different mixes of the data
• Weigh models against multiple performance metrics
• If possible, test against real, previously unseen data
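To make the underfitting/overfitting trade-off concrete, the sketch below compares training and validation accuracy of a decision tree as its depth grows. It is a minimal illustration on synthetic data from scikit-learn's `make_classification`, not the bank data, and the depths tried are arbitrary choices. A shallow tree tends to score similarly on both sets, while an unrestricted tree fits the training set perfectly; a widening train/validation gap is the usual sign of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the bank campaign data)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}
for depth in [1, 3, 5, 10, None]:  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)  # accuracy on data the tree has seen
    val_acc = tree.score(X_val, y_val)        # accuracy on held-back data
    results[depth] = (train_acc, val_acc)
    print(f"max_depth={depth}: train={train_acc:.3f} val={val_acc:.3f} "
          f"gap={train_acc - val_acc:.3f}")
```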
Let’s get started…
Business problem at hand
Feature analysis - interesting observations
Feature selection and transformation
Model building and evaluation
Conclusion
Business problem at hand
What we have
Data gathered from a recent campaign by a bank
The campaign aimed to get people to sign up for term deposits
We have customer information, along with whether each customer signed up for the term deposit
What we want
A machine learning model that can predict whether a new customer is likely to sign up for a term deposit
Feature analysis - interesting observations
Feature Analysis
Will a new customer sign up for a term deposit?
Strong indicator for yes: previous outcome
65% of those who previously said yes said yes again!
Although many previous outcomes were unknown, this is still a good feature.
Chart: % who signed up for a term deposit, previously said yes vs. previously said no
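The sign-up percentages on these feature-analysis slides are per-group conversion rates. A minimal pandas sketch of how such a rate could be computed follows. The toy rows are made up, and the column names `poutcome` and `y` (with `"success"` standing for a previous yes) are assumptions borrowed from the public UCI Bank Marketing dataset, not confirmed by this deck.

```python
import pandas as pd

# Hypothetical toy rows; column names and values are assumptions
df = pd.DataFrame({
    "poutcome": ["success", "success", "success", "failure", "failure",
                 "failure", "failure", "unknown", "unknown", "unknown"],
    "y":        ["yes",     "yes",     "no",      "no",      "yes",
                 "no",      "no",      "no",      "no",      "yes"],
})

# Percentage of customers in each previous-outcome group who signed up
signup_rate = (
    df.assign(signed_up=df["y"].eq("yes"))   # True where the customer said yes
      .groupby("poutcome")["signed_up"]
      .mean()                                # share of True values per group
      .mul(100)
      .round(1)
)
print(signup_rate)
```

The same pattern applies to the housing loan and loan default slides by swapping the grouping column.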
Feature Analysis
Will a new customer sign up for a term deposit?
Strong indicator for yes: housing loan
20% of those without a housing loan said yes!
Chart: % who signed up for a term deposit, no housing loan vs. housing loan
Feature Analysis
Will a new customer sign up for a term deposit?
Strong indicator for yes: loan default
13% of those with no loan default said yes.
Nobody with a loan default said yes, the type of information classification algorithms can use.
Chart: % who signed up for a term deposit, no loan default vs. loan default
Feature Analysis
Will a new customer sign up for a term deposit?
Moderate indicator for yes: age
The percentage is almost constant across a wide age range, so age is not much of a differentiating factor.
Chart: % who signed up for a term deposit by age (ages 21 to 60)
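For a continuous feature such as age, one way to get the percentages behind a chart like this is to bin the feature first. The sketch below, on hypothetical toy data, uses `pd.cut` with assumed 10-year bands and then computes the sign-up percentage per band; a roughly flat result across bands would support the reading that age does not differentiate much.

```python
import pandas as pd

# Hypothetical toy data: age and whether each customer signed up (1) or not (0)
df = pd.DataFrame({
    "age":       [22, 25, 31, 34, 38, 41, 45, 52, 58, 61],
    "signed_up": [1,  0,  0,  1,  0,  0,  1,  0,  0,  1],
})

# Bucket ages into assumed 10-year bands, e.g. [20, 30) and [30, 40)
bands = pd.cut(df["age"], bins=[20, 30, 40, 50, 60, 70], right=False)

# Sign-up percentage per age band
rate_by_band = df.groupby(bands, observed=True)["signed_up"].mean().mul(100)
print(rate_by_band)
```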
Feature selection and
transformation
Just as a country is only as good as its people, a model is only as good as the quality of its input data
Feature selection and transformation
Feature Selection table
Feature | Type | Pre-processing
age | Continuous | None
job | Categorical | Converted to binary matrix (one-hot)
marital status | Categorical | Converted to binary matrix (one-hot)
education | Categorical | Converted to ordinal: 1 = primary, 2 = secondary, 3 = tertiary
average yearly balance | Continuous | Numerically scaled
contact communication mode | Categorical | Discarded, feature irrelevant
last contact day of the month | Categorical | Discarded, feature irrelevant
last contact month of year | Categorical | Discarded, feature irrelevant
last contact duration | Continuous | Numerically scaled
Feature selection and transformation
Feature Selection table (continued)
Feature | Type | Pre-processing
number of contacts performed during this campaign | Continuous | Numerically scaled
number of days since the client was last contacted | Continuous | Weak feature, discarded
outcome of the previous marketing campaign | YES/NO | Converted to binary
has credit in default? | YES/NO | Converted to binary
has housing loan? | YES/NO | Converted to binary
has personal loan? | YES/NO | Converted to binary
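The two feature tables could be expressed as a scikit-learn preprocessing step roughly as follows. This is a sketch under assumptions: the column names (`job`, `marital`, `balance`, `duration`, `campaign`, `default`, `housing`, `loan`) follow the public UCI Bank Marketing dataset, the rows are made up, `StandardScaler` stands in for the unspecified "numerically scaled" step, and education is left out because it gets its own ordinal treatment. Discarded features are simply not listed, and `ColumnTransformer` drops unlisted columns by default.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows; column names are assumptions (UCI Bank Marketing style)
df = pd.DataFrame({
    "age": [30, 45, 52],
    "job": ["admin.", "technician", "admin."],
    "marital": ["single", "married", "married"],
    "balance": [1200, -50, 3400],
    "duration": [180, 420, 95],
    "campaign": [1, 3, 2],
    "default": ["no", "no", "yes"],
    "housing": ["yes", "no", "yes"],
    "loan": ["no", "no", "yes"],
})

# YES/NO features converted to binary 1/0
binary_cols = ["default", "housing", "loan"]
df[binary_cols] = df[binary_cols].eq("yes").astype(int)

preprocess = ColumnTransformer([
    # Categorical features -> binary (one-hot) matrix
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["job", "marital"]),
    # Continuous features -> standardized (zero mean, unit variance)
    ("scale", StandardScaler(), ["balance", "duration", "campaign"]),
    # Features used as-is (age, plus the already-binary columns)
    ("keep", "passthrough", ["age"] + binary_cols),
])
X = preprocess.fit_transform(df)
print(X.shape)  # 3 rows; 4 one-hot + 3 scaled + 4 passthrough columns
```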
Feature selection and transformation
Special mention: pre-processing of education
Analysis showed that the higher the education level, the more likely a person is to sign up for a term deposit. Converting education to a binary matrix would have lost this ordering. Therefore, the categories were manually mapped to a numerical scale of 1, 2 and 3, with 1 = primary and 3 = tertiary.
Chart: % who signed up for a term deposit by education (primary, secondary, tertiary)
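A minimal sketch of the difference, assuming education values spelled `primary`, `secondary` and `tertiary`: the ordinal mapping keeps the order in a single column, while `pd.get_dummies` produces three unordered 0/1 columns. Values outside the mapping (the public UCI data, for instance, also contains `unknown`) would become `NaN` and need separate handling.

```python
import pandas as pd

education = pd.Series(["primary", "tertiary", "secondary", "tertiary"])

# Ordinal encoding: one column that preserves primary < secondary < tertiary
order = {"primary": 1, "secondary": 2, "tertiary": 3}
education_ordinal = education.map(order)

# One-hot encoding: three 0/1 columns with no notion of order between them
education_onehot = pd.get_dummies(education, dtype=int)

print(education_ordinal.tolist())        # [1, 3, 2, 3]
print(education_onehot.columns.tolist())
```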
Feature selection and transformation
The special processing of the education feature improved the MCC score of several algorithms, especially gradient descent and AdaBoost, both of which learn iteratively from earlier errors.
Chart: MCC scores of Gradient Descent and AdaBoost
Model building and evaluation
Model building and evaluation
Choice of models - ensemble models
Random Forest - everyone's favourite
An ensemble model that combines many decision trees
Parameters used:
Maximum depth = 5
Number of trees = 100
AdaBoost - widely acclaimed
Introduced by Freund and Schapire in 1995 (the work later earned the 2003 Gödel Prize), it is often considered one of the best out-of-the-box classifiers. It combines many weak learners, each new one focusing on the examples its predecessors got wrong.
In practice it is often found to be relatively resistant to overfitting
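A sketch of the two ensemble models in scikit-learn. The Random Forest parameters are the ones listed on the slide; everything else is an assumption: the synthetic data, the random seeds, and `n_estimators=100` for AdaBoost, which the slide does not specify. In scikit-learn, `AdaBoostClassifier` boosts one-level decision trees (stumps) by default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Synthetic stand-in for the prepared bank features (not the real data)
X, y = make_classification(n_samples=1000, n_features=12, random_state=42)

# Random Forest with the slide's parameters: maximum depth 5, 100 trees
forest = RandomForestClassifier(max_depth=5, n_estimators=100, random_state=42)

# AdaBoost: n_estimators=100 is an assumed value; the default weak learner
# is a depth-1 decision tree
boost = AdaBoostClassifier(n_estimators=100, random_state=42)

forest.fit(X, y)
boost.fit(X, y)

# Inspect what was built (training accuracy would not estimate generalization)
print("trees in forest:", len(forest.estimators_))
print("deepest tree:", max(tree.get_depth() for tree in forest.estimators_))
print("weak learners in AdaBoost:", len(boost.estimators_))
```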
Model building and evaluation
Choice of models - non-ensemble models: linear models worked well on the data!
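The deck does not say which implementations were used for the non-ensemble models. One plausible scikit-learn mapping, labeled here as an assumption, is `SGDClassifier` for "Gradient Descent" (a linear classifier fitted with stochastic gradient descent; its default hinge loss makes it a linear SVM), `LogisticRegression` for "Regression", and `MLPClassifier` for "Neural Net". The hidden-layer size and iteration limits are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the bank data)
X, y = make_classification(n_samples=1000, n_features=12, random_state=7)

# Hypothetical mapping of the deck's model names to scikit-learn estimators.
# Each pipeline standardizes features first, because gradient-based training
# is sensitive to feature scale.
non_ensemble_models = {
    "Gradient Descent": make_pipeline(StandardScaler(), SGDClassifier(random_state=7)),
    "Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Neural Net": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=7),
    ),
}

for name, model in non_ensemble_models.items():
    model.fit(X, y)
    print(name, "fitted; first predictions:", model.predict(X[:5]))
```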
Model building and evaluation
Matthews Correlation Coefficient (MCC) scores of each model
Chart: MCC score per model (Gradient Descent, AdaBoost, Regression, Neural Net), banded as Moderate or Strong
For this project, any model with an MCC score greater than 0.40 is considered strong. By that measure, 4 different models qualify, with gradient descent scoring highest. Does that mean Gradient Descent is the right choice? Is it a good fit?
The real question is: does it overfit?
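MCC uses all four cells of the confusion matrix, which is why it is more informative than accuracy when most customers say no. The sketch below implements the standard formula in plain Python. The counts are hypothetical, not the project's, and the 0.40 cut-off for "strong" is this deck's convention rather than a universal rule.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from
    -1 (total disagreement) through 0 (no better than chance) to +1 (perfect).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column total is zero
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for 1,000 customers, 120 of whom actually signed up
model_counts = dict(tp=60, tn=840, fp=40, fn=60)  # finds half of the sign-ups
always_no = dict(tp=0, tn=880, fp=0, fn=120)      # always predicts "no"

for name, counts in [("model", model_counts), ("always no", always_no)]:
    print(f"{name}: accuracy={accuracy(**counts):.2f} MCC={mcc(**counts):.3f}")
```

With these counts, the first model reaches 90% accuracy and an MCC of about 0.49, while the always-"no" model still reaches 88% accuracy but has an MCC of 0.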
Model building and evaluation
Let's seek the answer using evaluation metrics from 5-fold cross-validation
Chart: 5-fold cross-validation, metric: accuracy (Gradient Descent, AdaBoost, Regression, Neural Net)
Chart: 5-fold cross-validation, metric: ROC score (Gradient Descent, AdaBoost, Regression, Neural Net)
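The question "does it overfit?" can be probed with scikit-learn's `cross_validate`, which scores each fold on several metrics at once and, with `return_train_score=True`, also reports training scores for comparison. This sketch assumes scikit-learn, synthetic imbalanced data in place of the bank data, and stratified shuffled folds; the deck only states that 5-fold cross-validation was used. MCC is wrapped with `make_scorer` here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in for the bank data (roughly 88% "no")
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.88],
                           random_state=0)

scoring = {
    "accuracy": "accuracy",
    "roc_auc": "roc_auc",
    "mcc": make_scorer(matthews_corrcoef),
}
# Stratified folds keep the yes/no ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(AdaBoostClassifier(random_state=0), X, y,
                        scoring=scoring, cv=cv, return_train_score=True)

for metric in scoring:
    train, test = scores[f"train_{metric}"], scores[f"test_{metric}"]
    # A large train-test gap, or a large spread across folds, hints at overfitting
    print(f"{metric}: test mean={test.mean():.3f} std={test.std():.3f} "
          f"train-test gap={train.mean() - test.mean():.3f}")
```

The same loop can be run for each candidate model to compare them on identical folds.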
Model building and evaluation
Preferred Model
AdaBoost
MCC score = 0.41 | Accuracy = 90% | ROC score = 0.88
Conclusion
Conclusion
Linear and ensemble models fitted well
With more effort, better relationships between the features can be gleaned. For example, marital status is strongly related to financial position; such information can help improve the models further.
The quest for an optimal model demonstrated that cross-validation is a very useful strategy: it not only saves testing time but also helps in choosing a better model.
In a real-world scenario, it would not hurt to test all four top models on unseen data.
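A common way to act on this is to lock away a hold-out set before any model selection and score each finalist on it only once. The sketch assumes scikit-learn, synthetic data, and a 20% stratified hold-out, and shows only two of the four finalists for brevity; Gradient Descent and Neural Net would be added to `candidates` the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (not the bank data)
X, y = make_classification(n_samples=3000, n_features=12, weights=[0.88],
                           random_state=1)

# Set aside 20% as "unseen" data that no model touches during selection
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

candidates = {
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Regression": LogisticRegression(max_iter=1000),
}
holdout_mcc = {}
for name, model in candidates.items():
    model.fit(X_dev, y_dev)  # train only on the development portion
    holdout_mcc[name] = matthews_corrcoef(y_holdout, model.predict(X_holdout))
    print(f"{name}: hold-out MCC = {holdout_mcc[name]:.3f}")
```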
May the light of Buddha's wisdom shine on all of us and guide us towards good-fitting models.
Final Thoughts...
