Introduction to Data Mining
Mahmoud Rafeek Alfarra
http://mfarra.cst.ps
University College of Science & Technology- Khan yonis
Development of computer systems
2016
Chapter 1 – Lecture 3
Outline
• Definition of Data Mining
• Data Mining as an Interdisciplinary Field
• Process of Data Mining
• Data Mining Tasks
• Challenges of Data Mining
• Data Mining Application Examples
• Introduction to RapidMiner
Data Mining Tasks
• Data mining tasks are the kinds of patterns that can be mined from data.
• Data mining functionalities specify which kinds of patterns a data mining task should find.
• In general, data mining tasks fall into two categories:
  - Descriptive mining tasks characterize the general properties of the data.
  - Predictive mining tasks perform inference on the current data in order to make predictions.
• The best-known data mining tasks:
  - Classification [Predictive]
  - Prediction [Predictive]
  - Association Rules [Descriptive]
  - Clustering [Descriptive]
  - Outlier Analysis [Descriptive]
Classification
• Classification is a predictive mining task.
• The input data for predictive modeling consists of two types of variables:
  - Explanatory variables, which describe the essential properties of the data.
  - Target variables, whose values are to be predicted.
• Classification is used to predict the value of a discrete target variable.
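
To make the split between explanatory and target variables concrete, here is a minimal sketch of a classifier. It uses a 1-nearest-neighbour rule, which is only one of many possible classification methods and is not named in the slides. The customer records, the two explanatory variables (age and monthly visits) and the class labels are hypothetical.

```python
import math

# Hypothetical training data: explanatory variables (age, monthly_visits)
# paired with a discrete target variable (the class label).
training = [
    ((25, 2), "non-purchaser"),
    ((30, 3), "non-purchaser"),
    ((45, 12), "purchaser"),
    ((50, 15), "purchaser"),
]


def classify(point, examples):
    """Return the class label of the training example closest to `point`."""
    nearest = min(examples, key=lambda example: math.dist(example[0], point))
    return nearest[1]


print(classify((48, 11), training))  # purchaser
print(classify((27, 1), training))   # non-purchaser
```

The prediction is always one of the labels seen in the training data, which is what makes the target variable discrete.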
Prediction
• Similar to classification, except that the target is a numeric value (e.g. the amount of a purchase) rather than a class (e.g. purchaser or non-purchaser).
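
As a rough illustration of predicting a numeric value instead of a class, the sketch below fits a straight line by ordinary least squares. Linear regression is an assumed choice here, since the slides do not name a method, and the visit counts and purchase amounts are made-up numbers.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of store visits and amount of purchase.
visits = [1, 2, 3, 4, 5]
amount = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(visits, amount)
predicted = slope * 6 + intercept  # predicted amount for a customer with 6 visits
print(slope, intercept, predicted)
```

Unlike the classifier case, the output can be a value that never appeared in the training data.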
Association
• Association rule mining aims to discover relationships among variables in a database, resulting in different types of rules.
• It seeks to produce a set of rules describing sets of features that are strongly related to each other.
Original data from a study on heart disease:

Gender  Age  Smoker  LAD%  RCA%
F       52   Y       85    100
M       62   N       80    0
M       75   Y       70    80
M       73   Y       40    99
M       66   N       50    45
…       …    …       …     …

• LAD%: the percentage of heart disease caused by the left anterior descending coronary artery.
• RCA%: the percentage of heart disease caused by the right coronary artery.
Medical Association Rules

No.  Rule
1    Gender=M ∩ Age≥70 ∩ Smoker=Y → RCA%≥50  (40%, 100%)
2    Gender=F ∩ Age<70 ∩ Smoker=Y → LAD%≥70  (20%, 100%)

• Rule 1 indicates that 40% of the cases are male smokers aged 70 or over, and that for these cases the probability that RCA% ≥ 50 is 100%.
• Rule 2 indicates that 20% of the cases are female smokers under 70, and that for these cases the probability that LAD% ≥ 70 is 100%.
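
The two percentages attached to each rule can be checked against the five rows shown in the heart-disease table. The sketch below is a minimal illustration, assuming those five visible rows are the whole data set (the elided rows are not available). The function name `rule_stats` and the dictionary keys are hypothetical.

```python
# The five visible rows of the heart-disease table.
rows = [
    {"gender": "F", "age": 52, "smoker": "Y", "lad": 85, "rca": 100},
    {"gender": "M", "age": 62, "smoker": "N", "lad": 80, "rca": 0},
    {"gender": "M", "age": 75, "smoker": "Y", "lad": 70, "rca": 80},
    {"gender": "M", "age": 73, "smoker": "Y", "lad": 40, "rca": 99},
    {"gender": "M", "age": 66, "smoker": "N", "lad": 50, "rca": 45},
]


def rule_stats(rows, antecedent, consequent):
    """Return (share of rows matching the antecedent,
    share of those rows that also satisfy the consequent)."""
    matching = [row for row in rows if antecedent(row)]
    share = len(matching) / len(rows)
    if not matching:
        return share, 0.0
    confidence = sum(1 for row in matching if consequent(row)) / len(matching)
    return share, confidence


rule1 = rule_stats(
    rows,
    lambda r: r["gender"] == "M" and r["age"] >= 70 and r["smoker"] == "Y",
    lambda r: r["rca"] >= 50,
)
rule2 = rule_stats(
    rows,
    lambda r: r["gender"] == "F" and r["age"] < 70 and r["smoker"] == "Y",
    lambda r: r["lad"] >= 70,
)
print(rule1)  # (0.4, 1.0)
print(rule2)  # (0.2, 1.0)
```

On these five rows the results match the (40%, 100%) and (20%, 100%) pairs listed with the rules. Because both rules hold in every matching case, the share of rows matching the left-hand side equals the share matching the whole rule.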
Clustering
• Finds groups of data points (clusters) such that points in the same cluster are more similar to each other than to points in different clusters.
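
One common way to find such groups is k-means, sketched below. The slides do not name a clustering algorithm, so k-means is an assumed choice, and the points and starting centroids are hypothetical. Similarity is measured here by Euclidean distance.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster
            else centroids[i]  # keep an empty cluster's centroid where it was
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

Fixed starting centroids keep the run deterministic. In practice they are usually chosen at random, and the result can depend on that choice.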
Document Clustering:
• Goal: to find groups of documents that are similar to each other based on the important terms appearing in them.
• Approach: identify frequently occurring terms in each document, form a similarity measure based on the frequencies of the different terms, and use it to cluster.
• Gain: information retrieval can use the clusters to relate a new document or search term to the clustered documents.
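
The approach above can be sketched as follows. This is a minimal illustration under several assumptions: terms are split on whitespace, similarity is the cosine of raw term-frequency vectors, and grouping is a simple greedy pass with a hypothetical threshold of 0.5. The four sample documents are invented.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase term occurs in the text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.5):
    """Greedy grouping: a document joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    vectors = {name: term_frequencies(text) for name, text in docs.items()}
    clusters = []
    for name in docs:
        for cluster in clusters:
            if cosine_similarity(vectors[name], vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


docs = {
    "d1": "stars galaxy telescope stars",
    "d2": "galaxy telescope orbit stars",
    "d3": "market stock price market",
    "d4": "stock price investor market",
}
groups = cluster_documents(docs)
print(groups)  # [['d1', 'd2'], ['d3', 'd4']]
```

A new document or search term can be related to the clusters in the same way, by computing its similarity to a member of each cluster.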
Outlier Analysis
• Discovers data points that are significantly different from the rest of the data. Such points are known as anomalies or outliers.
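
One simple way to make "significantly different" operational is a z-score test, sketched below. Both the method and the threshold of 2 standard deviations are assumptions for illustration, not something the slides prescribe, and the readings are hypothetical.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population
    standard deviations away from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
print(outliers)  # [95]
```

The mean and standard deviation are themselves pulled towards extreme values, so on small samples a more robust rule (for example one based on the median) may be preferable.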
Challenges of Data Mining
Scalability: scalable techniques are needed to handle the massive scale of data.
Dimensionality: many applications involve a large number of dimensions (e.g. features or attributes of the data).
Heterogeneous and Complex Data: in recent years, complicated data types such as graph-based, free-text and structured data have been introduced. Techniques developed for data mining must be able to handle the heterogeneity of the data.
Data Quality: many data sets are imperfect due to the presence of missing values and noise in the data. To handle these imperfections, robust data mining algorithms must be developed.
Data Distribution: as the volume of data increases, it is no longer possible or safe to keep all the data in the same place. As a result, the need for distributed data mining techniques has increased over the years.
Privacy Preservation: while privacy aims to prevent the disclosure of information, data mining attempts to reveal interesting knowledge about data. As a result, there is growing interest in developing privacy-preserving data mining algorithms.
Data Mining Applications
Science: astronomy, bioinformatics, drug discovery, …
Business: advertising, CRM (customer relationship management), investments, manufacturing, sports/entertainment, telecom, e-commerce, targeted marketing, health care, …
Web: search engines, web mining, …
Government: law enforcement, profiling tax cheaters, …
