STUDY AND TAKE OFF. (Studieren und durchstarten.)
Data Mining Applied
Author I: Dipl.-Inf. (FH) Johannes Hoppe
Author II: M.Sc. Johannes Hofmeister
Author III: Prof. Dr. Dieter Homeister
Dates: 01.04.2011, 08.04.2011, 15.04.2011
01 Applications of Data Mining
Applications of Data Mining
Applications of Data Mining: Overview
• Database marketing
• Time-series prediction, detecting "trends"
• Detection (of whatever is detectable)
• Probability estimation
• Information compression
• Sensitivity analysis
Applications of Data Mining: Database Marketing (1/2)
Response modeling
• Model how specific customers respond.
• Systematically select (existing and potential) customers.
• Base advertising and promotion on these results (→ CRM).
• Visualization: a "lift chart" shows how successful the selection should be (later topic: DM validation).
Lift Chart Example
"When contacting 10% of the customers, we should reach 10% of all responders without a model, and 30% of all responders with the given model."
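The slide's numbers come from a simple calculation: rank the customers by model score, contact the top share, and compare the share of all responders reached with the share of customers contacted. A minimal Python sketch, assuming each customer has a score from some response model and a 0/1 flag for whether they actually responded (the data below is made up so that it reproduces the 10% / 30% example):

```python
def cumulative_gain(scores, responded, fraction):
    """Share of all responders reached when contacting the top `fraction` of customers by score."""
    # Sort customers by model score, best prospects first.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_contacted = round(len(ranked) * fraction)
    reached = sum(flag for _, flag in ranked[:n_contacted])
    return reached / sum(responded)


def lift(scores, responded, fraction):
    """How many times better the model is than contacting customers at random."""
    # Without a model, contacting x% of customers reaches about x% of all responders.
    return cumulative_gain(scores, responded, fraction) / fraction


# Hypothetical data: 100 customers, 10 responders; 3 of the 10 best-scored customers responded.
scores = list(range(100, 0, -1))
responded = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 83
print(cumulative_gain(scores, responded, 0.10))  # 0.3
print(round(lift(scores, responded, 0.10), 2))   # 3.0
```

A lift of 3 at 10% is the slide's statement in numbers: three times as many responders as a random 10% of the customers would yield.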
Applications of Data Mining: Database Marketing (2/2)
Cross selling: selling additional products to existing customers
• Question: which customer might buy which other product?
• Uses historical purchase data.
• Uses credit card information, lifestyle data, demographic data, etc.
• Other possible information: Did the customer request specific information? How did the customer hear about the company?
• The results are used for direct marketing, mailing lists and direct advertising.
• Example Amazon: "Customers who bought this item also bought" and "personalized recommendations".
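Amazon's real recommendation system is far more elaborate, but the simplest form of "customers who bought this item also bought" is counting how often products share a basket in the purchase history. A Python sketch of that idea, assuming the history is available as a list of baskets (the products are hypothetical):

```python
from collections import Counter


def also_bought(baskets, item, top_n=3):
    """Products that most often appear in the same basket as `item`."""
    counts = Counter()
    for basket in baskets:
        products = set(basket)  # count each product at most once per basket
        if item in products:
            counts.update(products - {item})
    return [product for product, _ in counts.most_common(top_n)]


baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "pen"],
    ["book", "lamp"],
]
print(also_bought(baskets, "book"))  # ['lamp', 'pen']
```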
Applications of Data Mining: Time-Series Prediction
• Time series: stock prices, market shares, …
• Extrapolation of future values
• Detection of newly emerging trends, such as customers moving to other products
• Own experience: German print magazines
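In its simplest form, extrapolating future values means fitting a straight-line trend and extending it. This Python sketch uses ordinary least squares on evenly spaced observations; it is not the Microsoft Time Series algorithm, which is considerably more elaborate, and the sales figures are made up:

```python
def linear_trend_forecast(values, steps_ahead):
    """Fit y = intercept + slope * t by least squares (t = 0, 1, ...) and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    # Future time points continue after the last observed index n - 1.
    return [intercept + slope * t for t in range(n, n + steps_ahead)]


monthly_sales = [10, 12, 14, 16]
print(linear_trend_forecast(monthly_sales, 2))  # [18.0, 20.0]
```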
Applications of Data Mining: Detection
Identifying the existence or occurrence of a condition
• Fraud detection: identifying patterns/criteria that reveal credit card fraud
• Estimating creditworthiness (in Germany: Schufa)
• Predicting mail orders that will not be paid
Applications of Data Mining: Detection
• Intrusion detection (in computer networks): find patterns that indicate an attack on a network.
• E.g. clustering: small clusters are of high interest, because they point to unusual cases.
• Defining classes may be useful, e.g. harmless, possibly harmful, harmful, close the LAN immediately.
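Once cluster labels exist, flagging the members of small clusters is straightforward. A Python sketch, assuming the labels come from an earlier clustering run over connection records; the 5% cut-off and the label names are hypothetical:

```python
from collections import Counter


def records_in_small_clusters(labels, max_share=0.05):
    """Indices of records whose cluster holds at most `max_share` of all records."""
    sizes = Counter(labels)
    total = len(labels)
    return [i for i, label in enumerate(labels) if sizes[label] / total <= max_share]


# Hypothetical cluster labels, one per connection record.
labels = ["normal"] * 97 + ["port_scan"] * 3
print(records_in_small_clusters(labels))  # [97, 98, 99]
```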
Applications of Data Mining: Detection
Typical difficulties
• Requires domain knowledge
• Costs of data mining
• Cost of missing a fraud
• Cost of false positives (e.g. falsely accusing someone of fraud, damage to the company's image)
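The trade-off between missed fraud and false positives can be made explicit by attaching a cost to each kind of error and choosing the alert threshold with the lowest total cost. A Python sketch, assuming a model already assigns each transaction a fraud score; scores, labels and cost figures are hypothetical:

```python
def total_cost(scores, is_fraud, threshold, cost_missed, cost_false_alarm):
    """Cost of flagging every transaction whose fraud score is >= threshold."""
    cost = 0.0
    for score, fraud in zip(scores, is_fraud):
        flagged = score >= threshold
        if fraud and not flagged:
            cost += cost_missed        # missed fraud
        elif flagged and not fraud:
            cost += cost_false_alarm   # false positive
    return cost


def cheapest_threshold(scores, is_fraud, cost_missed, cost_false_alarm):
    # Candidate thresholds: every observed score, plus "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(scores, is_fraud, t, cost_missed, cost_false_alarm))


scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
is_fraud = [False, False, True, False, True, True]
print(cheapest_threshold(scores, is_fraud, cost_missed=100, cost_false_alarm=10))  # 0.4
```

Raising the cost of a false alarm to 1000 moves the cheapest threshold up to 0.8: fewer alerts, at the price of one missed fraud.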
Applications of Data Mining: Probability Estimation
• Approximate the likelihood of an event given an observation.
• E.g. classify a potential customer into an A, B or C range before doing any business.
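One simple way to approximate such a likelihood is the relative frequency among similar past customers, smoothed so that small segments do not get estimates of exactly 0 or 1. A Python sketch; the segment counts and the A/B/C thresholds are hypothetical:

```python
def response_probability(successes, trials):
    """Laplace-smoothed (add-one) relative frequency for a customer segment."""
    # Smoothing keeps tiny segments from being estimated as exactly 0 or 1.
    return (successes + 1) / (trials + 2)


def abc_class(probability, a_min=0.5, b_min=0.2):
    """Map an estimated probability to an A/B/C range (illustrative thresholds)."""
    if probability >= a_min:
        return "A"
    if probability >= b_min:
        return "B"
    return "C"


# Hypothetical history of similar customers: (became customers, were approached)
history = {"retail": (8, 10), "students": (2, 10), "other": (0, 10)}
for segment, (hits, tries) in history.items():
    p = response_probability(hits, tries)
    print(segment, round(p, 3), abc_class(p))
```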
Applications of Data Mining: Information Compression
• Can be viewed as a special type of estimation problem.
• For a given set of data, estimate the key components that can be used to reconstruct the data.
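Principal component analysis (PCA) is one common instance of this idea: describe each data point by its coordinates along a few key components and rebuild the data from those. A stdlib-only Python sketch for 2-D points that keeps only the first component, so each point shrinks from two numbers to one (plus a shared mean and direction); the reconstruction is exact only when the points lie on a line:

```python
import math


def principal_direction(points):
    """Mean and unit direction of largest variance for 2-D points (first principal component)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if sxy != 0:
        vx, vy = sxy, lam - sxx          # eigenvector belonging to lam
    elif sxx >= syy:
        vx, vy = 1.0, 0.0                # uncorrelated: pick the wider axis
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm)


def compress(points):
    """Keep one number per point: its coordinate along the principal direction."""
    mean, (ux, uy) = principal_direction(points)
    codes = [(x - mean[0]) * ux + (y - mean[1]) * uy for x, y in points]
    return mean, (ux, uy), codes


def reconstruct(mean, direction, codes):
    return [(mean[0] + c * direction[0], mean[1] + c * direction[1]) for c in codes]


points = [(0, 0), (1, 2), (2, 4), (3, 6)]
mean, direction, codes = compress(points)
restored = reconstruct(mean, direction, codes)
```

For real data the reconstruction is only approximate; the variance along the discarded direction is the information lost by the compression.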
Applications of Data Mining: Sensitivity Analysis
• Understand how changes in one variable affect others.
• Identify how sensitive one variable is to another (find out whether dependencies exist).
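A basic one-at-a-time sensitivity check nudges a single input and measures how far the output moves. A Python sketch; `expected_spend` is a hypothetical model standing in for any trained predictor:

```python
def sensitivity(model, inputs, name, delta):
    """Approximate change in model output per unit change of one input (finite difference)."""
    changed = dict(inputs)
    changed[name] = inputs[name] + delta
    return (model(changed) - model(inputs)) / delta


def expected_spend(customer):
    # Hypothetical model, for illustration only.
    return 0.02 * customer["income"] + 150 * customer["children"]


customer = {"income": 40000, "children": 2}
for variable, step in [("income", 1000), ("children", 1)]:
    print(variable, sensitivity(expected_spend, customer, variable, step))
```

This ignores interactions: changing several variables at once may affect the output differently than the sum of the individual effects.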
02 Data Mining Algorithms
Data Mining Algorithms
• Different algorithms serve different uses, and they can be combined.
• The choice of algorithm depends on what you want to do.
• Not every algorithm is suited to every task.
Data Mining Algorithms: Algorithm Groups in SSAS (SQL Server Analysis Services)
• Classification algorithms
• Regression algorithms
• Association algorithms
• Segmentation algorithms
• Sequence analysis algorithms
• Plug-in algorithms
Data Mining Algorithms: Classification Algorithms
• Predict discrete attributes
• Based on historical (experience) values
• Algorithms in SSAS: Naive Bayes, Decision Trees, Neural Networks
Data Mining Algorithms: Regression Algorithms
• Predict continuous attributes
• Otherwise work the same way as classification algorithms
• Algorithms in SSAS: Linear Regression (line), Logistic Regression (curve), MS Time Series
Data Mining Algorithms: Association Algorithms
• Predict likely combinations
• Find elements that occur together
• Algorithm in SSAS: MS Association Algorithm (Apriori)
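The core of Apriori is a level-wise search: an itemset can only be frequent if all of its subsets are frequent. The following Python sketch illustrates that idea together with the confidence of a rule; it is not the SSAS implementation, and the baskets and the support threshold are hypothetical:

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori: level-wise search for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in level if support(s) >= min_support}
    result, k = {}, 1
    while level:
        result.update({s: support(s) for s in level})
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent itemset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return itemsets[both] / itemsets[frozenset(antecedent)]


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
itemsets = frequent_itemsets(baskets, min_support=0.5)
print(confidence(itemsets, {"butter"}, {"bread"}))  # 1.0
```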
Data Mining Algorithms: Segmentation Algorithms
• Also called "clustering algorithms"
• Group data with similar properties
• Algorithm in SSAS: MS Clustering Algorithm (e.g. K-Means)
Data Mining Algorithms: Sequence Analysis Algorithms
• … are clustering algorithms
• Take the order (sequence) of values into account while clustering
• Do not group by similar properties, but by similar sequences
• Algorithm in SSAS: MS Sequence Clustering
Data Mining Algorithms: Plug-In Algorithms
• .NET wrapper for COM objects
• Use ANY algorithm
• Provided as an assembly (possible workshop to create one)
03 Review: Data Types, Content Types
Review: Data Types, Content Types
Applying an algorithm:
• Data types
• Content types
Review: Data Types
Data types define the structure of the values. Available data types:
• Text
• Long
• Boolean
• Double
• Date
Review: Content Types
Content types define the behavior of the values:
• Discrete
• Continuous
• Discretized
• Key
• Key Sequence
• Key Time
• Ordered
• Cyclical
Review: Content Type Discrete
A fixed set of values. Examples:
• Commute distance: 1-2, 2-5, 5-10
• Region: Pacific, North America, Europe
• Name: … … …
Boolean values are always discrete; text is most likely discrete.
Review: Content Type Continuous
• Unlimited set of values; infinitely many values are possible
• Examples: income, age
The distinction between continuous and discrete is the most important one.
Review: Content Type Discretized
Continuous values converted into discrete values. Examples:
• Income to categories: A, B, C, …
• Age to groups: 0-20, 21-30, 31-40, …
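Discretizing by fixed boundaries amounts to a binary search over the bucket limits. A Python sketch assuming integer ages and the age groups from the slide; the open-ended "41+" bucket is an added assumption:

```python
import bisect


def discretize(value, upper_bounds, labels):
    """Map a continuous value to a bucket; upper_bounds are inclusive and sorted ascending."""
    # labels needs one more entry than upper_bounds (the open-ended top bucket).
    return labels[bisect.bisect_left(upper_bounds, value)]


age_bounds = [20, 30, 40]
age_labels = ["0-20", "21-30", "31-40", "41+"]
print([discretize(age, age_bounds, age_labels) for age in (18, 20, 21, 35, 67)])
# ['0-20', '0-20', '21-30', '31-40', '41+']
```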
Review: Content Types Key, Key Sequence, Key Time
• Key: uniquely identifies a row
• Key Sequence (sequence clustering models): a series of events, sorted
• Key Time (time series models): identifies values on a time scale
Review: Content Type Ordered
Discrete values that have a sort order:
• The distances between the values are not defined
• No other relations between the values are defined
• Example: "one star" to "five stars"
Review: Content Type Cyclical
Discrete values that have a cyclical sort order. Examples:
• Weekdays: Monday, Tuesday, …, Sunday, Monday, … (1, 2, 3, …, 7, 1, …)
• Months: Jan, Feb, Mar, …, Dec, Jan, … (1, 2, 3, …, 12, 1, …)
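What sets a cyclical attribute apart is that distances wrap around: Sunday is next to Monday. A Python sketch of that distance, using the 1-7 and 1-12 numbering from the slide:

```python
def cyclic_distance(a, b, period):
    """Number of steps between two positions on a cycle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)


SUNDAY, MONDAY = 7, 1
print(abs(SUNDAY - MONDAY))                # 6: the naive distance ignores the wrap-around
print(cyclic_distance(SUNDAY, MONDAY, 7))  # 1
print(cyclic_distance(12, 1, 12))          # 1 (December to January)
```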
Available Combinations
04 Data Mining Algorithms - Decision Trees
Applied Data Mining - Decision Trees
Applied Data Mining - Decision Trees: In General
• Also known as classification trees
• Goal: sequentially partition the data
• Can detect non-linear relationships
• Machine learning technique
• Separate the data into a training set and a test set:
    • The training set is used to build the model based on certain criteria.
    • The test set is used to verify the model.
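Both ideas on this slide, holding back a test set and partitioning the data step by step, fit in a few lines of Python. The split criterion below is Gini impurity as used in CART; Microsoft Decision Trees in SSAS may score splits differently, and the income data is hypothetical. A full tree would apply `best_split` recursively to each resulting partition:

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle and split into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def gini(labels):
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels, feature):
    """Threshold on a numeric feature that minimises the weighted Gini impurity."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


rows = [{"income": 20000}, {"income": 25000}, {"income": 35000}, {"income": 40000}]
labels = ["no", "no", "yes", "yes"]
print(best_split(rows, labels, "income"))  # (25000, 0.0)
```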
Applied Data Mining - Decision Trees: Tree for the Response to a Mailing Campaign
Root: 2.6% response rate (total: 10,000 persons)
• Male: 3.2% (total: 4,677)
    • Income > $30,000: 3.6%
    • Income < $30,000: 2.3%
• Female: 2.1% (total: 5,323)
    • Age > 40: 3.8%
    • Age < 40: 3.2%
Applied Data Mining - Decision Trees: Using the Trained Tree
Example: management decides to mail only to groups with a response rate > 3.5%.
Groups selected from the trained tree (response rate > 3.5%):
• Males with income > $30,000
• Females aged 40+
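Applying the trained tree to new customers is a chain of conditions. A Python sketch using the leaf rates from the example mailing tree; the customer records are hypothetical, and how values exactly at $30,000 or age 40 are handled is an assumption:

```python
def predicted_response_rate(customer):
    """Leaf response rates copied from the example mailing tree."""
    # Assumption: exactly $30,000 counts as low income, exactly 40 as under 40.
    if customer["gender"] == "male":
        return 0.036 if customer["income"] > 30000 else 0.023
    return 0.038 if customer["age"] > 40 else 0.032


def mailing_list(customers, min_rate=0.035):
    """Names of customers whose predicted response rate exceeds min_rate."""
    return [c["name"] for c in customers if predicted_response_rate(c) > min_rate]


customers = [
    {"name": "A", "gender": "male", "income": 45000, "age": 30},
    {"name": "B", "gender": "male", "income": 25000, "age": 50},
    {"name": "C", "gender": "female", "income": 20000, "age": 52},
    {"name": "D", "gender": "female", "income": 60000, "age": 35},
]
print(mailing_list(customers))  # ['A', 'C']
```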
Applied Data Mining - Decision Trees: Pros and Cons
Pros
• Very flexible, white-box model
• KISS: keep it simple, stupid!
• Little preparation and few resources needed
Cons
• Can be tuned endlessly
• Takes a long time to build
• Requires wisely selected training data: wrong training data yields wrong results
• A big tree might require disk swapping (computation may be difficult if the tree does not fit into main memory)
Project: “DMDW Mining Test”
Project: “DMDW Mining Test” (explanation of one node)
Project: “DMDW Mining Test”(shows connections, more useful if there are more predictable values)
Project: “DMDW Mining Test” (Generic Content Tree Viewer, DMX (Data Mining Extensions))
References for Decision Trees
• Olivia Parr Rud et al.: Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management. Wiley, 2001.
• David A. Grossman, Ophir Frieder: Introduction to Data Mining. Illinois Institute of Technology, 2005.
• Andrew W. Moore: Decision Trees. Carnegie Mellon University. http://www.autonlab.org/tutorials/dtree16.pdf
• Nong Ye (ed.): The Handbook of Data Mining. Lawrence Erlbaum Associates, 2003.
• Sushmita Mitra, Tinku Acharya: Data Mining: Multimedia, Soft Computing, and Bioinformatics. Wiley, 2003.
• http://en.wikipedia.org/wiki/Classification_tree
05 Data Mining Algorithms - Clustering
Data Mining Algorithms - Clustering
[Figure: data points plotted over two variables, X1 and X2]
Data Mining Algorithms - Clustering
Clustering
• Segmentation algorithm
• Find homogeneous groups within a set
• Find similar variables for different cases
• Identify new relationships that were unclear before (heuristics)
• E.g. "A person who rides a bike to work doesn't live far from their workplace" (this is not obvious)
[Figure: cases plotted over the independent variables X1 and X2; labels: homogeneous subsets, description of class, identify, classify; steps: 1. Clustering, 2. Classification]
Clustering
1. Clustering
• Reduces the data to classes of similar cases
• Become friends with the data
• Iterative algorithm: clustering → validate → classify → apply
Source: http://msdn.microsoft.com/en-us/library/ms174879.aspx
Data Mining Algorithms - Clustering
2. Classification
• Create a description of a group
• Give it a "name"
• Also called: characterization
Clustering: Process
• Start with random values.
• Rerunning the algorithm creates different sets and different groups.
• A different clustering technique or algorithm creates different groups.
• Rerun on the same data set with a new seed (reseed).
• An expert evaluates the classes found and checks their plausibility.
• Good classes are used for predictions.
Flow: 1. Clustering → evaluate, check: good? → 2. Classify → apply (predict)
Clustering: MS Clustering Algorithm
Provides two methods:
• K-Means (hard clustering): each data point belongs to exactly one cluster.
• Expectation Maximization (soft clustering): a data point can belong to several clusters; a probability is calculated for each membership.
Source: http://msdn.microsoft.com/en-us/library/cc280445.aspx
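The difference between hard and soft assignment can be sketched in Python for 2-D points. This is not the SSAS implementation: `kmeans` is plain K-Means, and `soft_memberships` is a simplified EM-style assignment step assuming equal cluster weights and one shared spherical variance. The `seed` parameter mirrors the "start with random values / reseed" step of the clustering process; the points are hypothetical:

```python
import math
import random


def kmeans(points, k, seed=0, max_iterations=100):
    """Hard clustering: every point is assigned to exactly one cluster."""
    centroids = random.Random(seed).sample(points, k)  # random start; another seed may give other groups
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


def soft_memberships(point, centroids, variance=1.0):
    """EM-style soft assignment: probability of belonging to each cluster."""
    logits = [-math.dist(point, c) ** 2 / (2 * variance) for c in centroids]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(soft_memberships((5, 5), centroids))
```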
Clustering: Pros and "Cons"
Pros
• No predictable variable to choose
• Trains itself without much effort
• Easy to configure
"Cons"
• Interpretation is everything
• A good eye is needed
• An expert has to check the results for plausibility
Project: “DMDW Mining Test” (strongest relations only; number of matching cases for Region Europe)
Project: “DMDW Mining Test” (good to know: continuous attributes are shown by their arithmetic mean)
Project: “DMDW Mining Test”(comparing two clusters)
THANK YOU FOR YOUR ATTENTION

DMDW Lesson 05 + 06 + 07 - Data Mining Applied
