IDENTIFYING CUSTOMER POTENTIALS – PERFORMING UNSUPERVISED LEARNING
Conrad Kleinn 05.02.2019
AGENDA
07.02.2019
• DATA SCIENCE – CROSS INDUSTRY STANDARD PROCESS (CRISP)
• BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
• DATA UNDERSTANDING – REALITY AND DATA MODEL
• DATA PREPARATION – ENGINEERING FEATURES
• MODELING – APPLYING USEFUL ALGORITHM
• EVALUATION – DOES IT MAKE SENSE?
• DEPLOYMENT – UTILIZING THE RESULTS
• OUTLOOK
DATA SCIENCE – CROSS INDUSTRY STANDARD PROCESS (CRISP)
BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
Customers differ, and so do their needs and expectations. A core task of marketing & sales is therefore to segment the customer base meaningfully, so that advertising, campaigns and sales activities can be tailored to specific customer groups.
We want to identify useful clusters in our data. Since we have no labels to learn from, we need an unsupervised approach.
BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
Criteria given and agreed by the business:
› Revenue and employee figures are available for at least 2017
› The companies considered have at least 200 employees
› "Company" means a consolidation circle ("Konzern", i.e. a corporate group)
› About 14k observations meet these criteria
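A minimal sketch of how the agreed selection criteria could be applied, assuming each consolidation circle is available as a record with per-year employee and revenue figures. The `Company` record, its field names and the choice to check the 200-employee threshold against the 2017 figure are hypothetical, not taken from the original data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Company:
    # Hypothetical record for one consolidation circle ("Konzern")
    name: str
    employees_by_year: Dict[int, int] = field(default_factory=dict)
    revenue_by_year: Dict[int, float] = field(default_factory=dict)


def meets_criteria(c: Company, year: int = 2017, min_employees: int = 200) -> bool:
    """Figures must exist for `year`, and the employee count (assumed to be
    the one for `year`) must reach `min_employees`."""
    employees = c.employees_by_year.get(year)
    revenue = c.revenue_by_year.get(year)
    if employees is None or revenue is None:
        return False  # no revenue or employee information for the required year
    return employees >= min_employees


companies = [
    Company("A", {2017: 250}, {2017: 1.2e6}),
    Company("B", {2017: 150}, {2017: 3.0e5}),
    Company("C", {2018: 400}, {2018: 2.0e6}),  # no 2017 figures
]
selected = [c.name for c in companies if meets_criteria(c)]
print(selected)  # ['A']
```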
DATA UNDERSTANDING – REALITY AND DATA MODEL
• Companies and "Konzerne" are represented by customer master data -> golden record from SAP & Bedirect
• Customer master data include hierarchical information representing "Konzern" structures
• Employee and revenue figures can be aggregated within the hierarchy
• Corresponding contract and billing data are available from SAP; they represent the business relationship with Haufe.
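Aggregating employee and revenue figures within the hierarchy could look like the sketch below. It assumes the hierarchy is given as a child-to-parent mapping (a hypothetical format) and that each node reports only its own figures; if parent records already contain consolidated figures, summing them would double count.

```python
from collections import defaultdict
from typing import Dict, Optional


def find_root(node: str, parent: Dict[str, Optional[str]]) -> str:
    """Walk up the hierarchy to the top-level group ("Konzern") of a node."""
    seen = set()
    while parent.get(node) is not None:
        if node in seen:
            raise ValueError(f"cycle in hierarchy at {node!r}")
        seen.add(node)
        node = parent[node]
    return node


def aggregate_to_groups(values: Dict[str, float],
                        parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Sum per-node values up to their top-level group."""
    totals: Dict[str, float] = defaultdict(float)
    for node, value in values.items():
        totals[find_root(node, parent)] += value
    return dict(totals)


# Hypothetical hierarchy: a holding with nested subsidiaries, plus one stand-alone company
parent = {"holding": None, "sub1": "holding", "sub2": "holding", "sub1a": "sub1", "solo": None}
employees = {"holding": 50, "sub1": 120, "sub2": 80, "sub1a": 30, "solo": 400}
print(aggregate_to_groups(employees, parent))  # {'holding': 280.0, 'solo': 400.0}
```

Summing is only one of the aggregation methods named under data preparation; the same traversal works for mean, min or max by collecting the values per root instead of adding them.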
DATA PREPARATION – ENGINEERING FEATURES
General notes on data preparation
› Imputation -> "filling gaps"
› Aggregation level
› Aggregation method (mean, standard deviation, sum, min, max…)
› Treating outliers -> boxplots
› Standardization (z-transformation, min-max scaling)
› Creating patterns (e.g. binary)
› Categorical variables are transformed into numerical dummy variables
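Several of the preparation steps listed above can be sketched with the Python standard library alone. This is an illustration, not the pipeline actually used: median imputation, clipping at the 1.5 * IQR boxplot whiskers and the "inclusive" quartile method are assumptions.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Imputation: fill gaps (None) with the median of the observed values."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]


def clip_outliers_iqr(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Outliers: clip to the boxplot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def z_transform(values: Sequence[float]) -> List[float]:
    """Standardization: mean 0, (population) standard deviation 1."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def min_max(values: Sequence[float]) -> List[float]:
    """Standardization: rescale linearly to [0, 1]."""
    low, high = min(values), max(values)
    return [0.0 if high == low else (v - low) / (high - low) for v in values]


def dummies(categories: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Categorical -> numerical dummy (one-hot) variables, one column per level."""
    levels = sorted(set(categories))
    return levels, [[int(c == level) for level in levels] for c in categories]


# Hypothetical employee counts with one gap and one extreme value
filled = impute_median([250.0, None, 900.0, 400.0, 120000.0])
clipped = clip_outliers_iqr(filled)
print(clipped)  # [250.0, 650.0, 900.0, 400.0, 1650.0]
print(dummies(["retail", "industry", "retail"]))  # (['industry', 'retail'], [[0, 1], [1, 0], [0, 1]])
```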
DATA PREPARATION – ENGINEERING FEATURES
Required mathematical expression (in case of unsupervised learning)
DATA PREPARATION – ENGINEERING FEATURES
Creating a pattern variable representing revenue development over the last 3 years
› Using a ternary (three-valued) system, we create a vector that records for each year whether revenue dropped, remained stable or rose.
› Converting this base-3 vector to a decimal number condenses the years into a single "handy" variable representing revenue development.
› We do the same for the number of employees and for billing information.
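A minimal sketch of the ternary pattern variable, assuming the digit assignment 0 = dropped, 1 = remained, 2 = rose and an optional relative tolerance band for "remained" (both hypothetical choices). With three year-over-year changes there are 3^3 = 27 possible codes, 0 to 26.

```python
from typing import List, Sequence

# Hypothetical digit assignment for each year-over-year change
DROPPED, REMAINED, ROSE = 0, 1, 2


def trend_digits(series: Sequence[float], tol: float = 0.0) -> List[int]:
    """One ternary digit per year-over-year change; changes within
    +/- tol (relative to the previous year) count as 'remained'."""
    digits = []
    for prev, curr in zip(series, series[1:]):
        band = tol * abs(prev)
        if curr > prev + band:
            digits.append(ROSE)
        elif curr < prev - band:
            digits.append(DROPPED)
        else:
            digits.append(REMAINED)
    return digits


def ternary_to_decimal(digits: Sequence[int]) -> int:
    """Read the digit vector as a base-3 number (first change = most significant)."""
    value = 0
    for d in digits:
        value = value * 3 + d
    return value


revenue = [100.0, 120.0, 120.0, 90.0]  # hypothetical revenue, four consecutive years
digits = trend_digits(revenue)
print(digits, ternary_to_decimal(digits))  # [2, 1, 0] 21
```

The resulting code is a label rather than a magnitude: numerically adjacent codes can describe quite different developments (8 is [0, 2, 2], 9 is [1, 0, 0]), which is worth keeping in mind when the variable is fed into a distance-based method such as k-means.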
DATA PREPARATION – ENGINEERING FEATURES
Creating pattern variables representing the value of the subscription portfolios
› We apply a binary system based on hierarchical product information from the business.
› We summarize the resulting binary vector as a decimal number.
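The binary portfolio variable can be sketched as one bit per product category. The category names and their ordering from most to least valuable are hypothetical stand-ins for the business's product hierarchy.

```python
from typing import Iterable, List, Sequence

# Hypothetical product categories, ordered from most to least valuable
CATEGORIES = ["premium_suite", "professional", "standard", "basic"]


def portfolio_bits(owned: Iterable[str], categories: Sequence[str] = CATEGORIES) -> List[int]:
    """One bit per category: 1 if the customer holds a subscription in it."""
    owned_set = set(owned)
    unknown = owned_set - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [int(c in owned_set) for c in categories]


def bits_to_decimal(bits: Sequence[int]) -> int:
    """Read the bit vector as a binary number (first category = most significant bit)."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


print(bits_to_decimal(portfolio_bits(["professional", "basic"])))  # 5
print(bits_to_decimal(portfolio_bits(["premium_suite"])))          # 8
```

Placing the most valuable category in the most significant position means any portfolio containing it scores higher than any portfolio without it (8 versus at most 7 here), because each bit outweighs all lower bits combined.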
MODELING – APPLYING USEFUL ALGORITHM
Since we don't have any labels to learn from, we need to use unsupervised learning.
We choose k-means as the algorithm.
MODELING – APPLYING USEFUL ALGORITHM
We choose k = 4 starting points (initial cluster centroids).
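The modeling step can be illustrated with a compact, pure-Python version of Lloyd's k-means algorithm with k = 4. The k-means++ seeding used to pick the starting points, the `seed` and `max_iter` parameters and the toy data are assumptions for illustration; the deck does not say how the starting points were chosen. Features are assumed to be standardized beforehand, since k-means relies on Euclidean distances and would otherwise be dominated by large-scale variables such as revenue.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_plus_plus(points: List[Point], k: int, rng: random.Random) -> List[List[float]]:
    """k-means++ seeding: each further starting point is drawn with probability
    proportional to its squared distance from the nearest one already chosen."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            raise ValueError("fewer distinct points than k")
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def nearest(p: Point, centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points: List[Point], k: int = 4, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = init_plus_plus(points, k, rng)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy data: four well-separated groups of three points each
corners = [(0, 0), (100, 0), (0, 100), (100, 100)]
blobs = [(cx + dx, cy + dy) for cx, cy in corners for dx, dy in [(0, 0), (0, 1), (1, 0)]]
centroids, labels = kmeans(blobs, k=4)
print(labels)
```

k-means only finds a local optimum that depends on the starting points, so in practice it is common to run it with several seeds and keep the solution with the lowest within-cluster sum of squares.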
MODELING – APPLYING USEFUL ALGORITHM
EVALUATION – DOES IT MAKE SENSE?
Describing the calculated clusters
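Describing the clusters typically starts with their size and the average of each feature per cluster. A minimal sketch, assuming the rows are kept in their original (unstandardized) units so the averages stay interpretable; the feature names and values in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_profiles(rows: Sequence[Sequence[float]], labels: Sequence[int],
                     feature_names: Sequence[str]) -> Dict[int, Dict[str, float]]:
    """Size and per-feature mean of every cluster, keyed by cluster label."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    profiles = {}
    for label, members in sorted(groups.items()):
        means = [sum(col) / len(members) for col in zip(*members)]
        profiles[label] = {"size": len(members), **dict(zip(feature_names, means))}
    return profiles


rows = [[250, 1.2e6], [300, 1.8e6], [5000, 9.0e8]]  # hypothetical employees, revenue
print(cluster_profiles(rows, [0, 0, 1], ["employees", "revenue"]))
```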
EVALUATION – DOES IT MAKE SENSE?
Naming the calculated clusters
DEPLOYMENT – UTILIZING THE RESULTS
Sales lists -> already tested successfully in channel sales (Michel Lason)
FKS (Firmen-Konzern-Sicht, "company/corporate-group view")
Chordiant
Other campaign tools
Customer Service
OUTLOOK
Using the calculated clusters as target variables for discriminant analysis and/or decision trees
> Predict clusters
> "Handy" formula
> Better understanding of the clusters (significance, exogenous variables, separation points etc.)
Using cosine distance to identify similarities between vectors
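Cosine distance compares the direction of two vectors and ignores their length: it is 1 minus the cosine of the angle between them, ranging from 0 (same direction) to 2 (opposite directions). A minimal sketch for plain feature vectors; applying it to, for example, the raw portfolio or trend digit vectors rather than their decimal codes is an assumption about how it might be used.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(angle between a and b); undefined for zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    sq_a = sum(x * x for x in a)
    sq_b = sum(y * y for y in b)
    if sq_a == 0 or sq_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / math.sqrt(sq_a * sq_b)


print(cosine_distance([1, 0, 1, 0], [1, 0, 1, 0]))  # 0.0 (identical portfolios)
print(cosine_distance([1, 0, 0, 0], [0, 1, 0, 0]))  # 1.0 (no overlap)
```

Because length is ignored, [1, 2] and [2, 4] have distance 0; this suits comparisons of patterns or proportions, but not of absolute size.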
