Data Mining
ITC GROUP PRESENTATION
Group Members:
• Muhammad Seerat Nawaz
• Muhammad Saad Hussain
• Tayyab Hussain
• Syed Daud Wasti
• Mirza Waqar Baig
INTRODUCTION
Data Mining: What is Data Mining?
How does data mining work?
Data Mining: What is Data Mining?
• Data mining means mining data to extract useful information from it.
• Data mining is a process used by companies to turn raw data into useful information.
• (Data are any facts, numbers, or text that can be processed by a computer, or any information that can be converted into knowledge about future trends.)
• This information can be used to increase a company’s earnings, cut costs, or both.
• Data mining tools can answer business questions that traditionally took too much time to resolve. Most companies already collect and refine massive quantities of data.
• Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.
• When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as: "Which clients are most likely to respond to my next mailing, and why?"
How does data mining work?
• Collecting data (from users)
• Storing data (database: data warehousing)
• Applying business rules and analytics (organizing data)
• Visualization (data mining software)
• Finally, using the results in any way that helps improve profit.
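As a rough illustration of these steps, the Python sketch below collects a few hypothetical purchase records, stores them in an in-memory SQLite table that stands in for a data warehouse, applies one simple business rule (total revenue per product) and prints a text bar chart in place of real visualization software. The table name `sales`, the sample records and the function names are assumptions made for this example only.

```python
import sqlite3

# Collect: hypothetical purchase records (customer, product, amount).
collected = [
    ("alice", "beer", 6.0),
    ("alice", "crisps", 2.5),
    ("bob", "beer", 4.0),
    ("carol", "bread", 3.0),
    ("bob", "crisps", 2.5),
]


def run_pipeline(records):
    """Store the records, then organize them with a simple business rule."""
    # Store: an in-memory SQLite table stands in for the data warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    # Organize: total revenue per product, largest first.
    rows = conn.execute(
        "SELECT product, SUM(amount) FROM sales "
        "GROUP BY product ORDER BY SUM(amount) DESC"
    ).fetchall()
    conn.close()
    return rows


def visualize(rows):
    """Visualize: a crude text bar chart, one '#' per (rounded) currency unit."""
    for product, total in rows:
        print(f"{product:<8} {'#' * round(total)} {total:.1f}")


visualize(run_pipeline(collected))
```

A real system would use a proper warehouse and charting tool; the point here is only the order of the steps: collect, store, organize, visualize.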
Data Mining Techniques
Association, Classification, Clustering, Prediction, Sequential Patterns and Decision Trees
Data Mining Techniques
Association
Association is one of the best-known data mining techniques. It is used in market basket analysis to identify sets of products that customers frequently purchase together.
E.g.: Retailers might find that customers who buy beer very often also buy crisps, so they can place beer and crisps next to each other to save customers time and increase sales.
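A minimal sketch of association analysis for the beer-and-crisps example, assuming hypothetical baskets and thresholds. It only looks at pairs of items and scores each rule A -> B by support (the share of all baskets containing both items) and confidence (the share of baskets containing A that also contain B). Full algorithms such as Apriori extend the same idea to larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer transaction.
baskets = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"beer", "crisps", "milk"},
    {"bread", "milk"},
    {"beer", "bread"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pair rules lhs -> rhs.

    support    = baskets containing both items / all baskets
    confidence = baskets containing both items / baskets containing lhs
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can give a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With this data, "crisps -> beer" has confidence 1.0: every basket with crisps also contains beer, which is the kind of pattern a retailer would act on.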
Classification
Classification is a classic data mining technique based on machine learning.
Basically, classification is used to assign each item in a data set to one of a predefined set of classes or groups.
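One simple way to classify items into predefined classes is k-nearest neighbours (k-NN), shown below as a sketch; it is only one of many classification methods. The customer features, the class names "occasional" and "loyal", and k = 3 are hypothetical choices for this example.

```python
import math
from collections import Counter

# Hypothetical labelled training data: ((monthly visits, average basket value), class).
training = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((1, 10.0), "occasional"),
    ((12, 60.0), "loyal"),
    ((15, 55.0), "loyal"),
    ((10, 70.0), "loyal"),
]


def knn_classify(point, data, k=3):
    """Assign `point` to the majority class among its k nearest training items."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Classify a new, unlabelled customer.
print(knn_classify((11, 65.0), training))
```

In practice the features would usually be rescaled first, because here the basket value dominates the distance simply by having larger numbers.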
Clustering
Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters.
The clustering technique defines the classes itself and puts objects into each class, whereas in classification, objects are assigned to predefined classes.
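To show how clustering forms groups without predefined classes, here is a sketch of k-means, a common clustering algorithm. The points and k = 2 are hypothetical, and starting from the first k points is a simplification for reproducibility (libraries usually use random or k-means++ initialization).

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: no labels are given; the algorithm forms the groups itself."""
    # Simplified, deterministic start: use the first k points as centroids.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical customers as (visits per month, spend per visit).
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, 2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```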
Data Mining Techniques
Prediction
Prediction, as its name implies, is a data mining technique that discovers the relationship between dependent and independent variables.
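The simplest example of such a relationship is a straight line fitted by ordinary least squares, with one independent variable x and one dependent variable y. The sketch below assumes hypothetical advertising-spend and sales figures; the slope is the covariance of x and y divided by the variance of x.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x); the 1/n factors cancel.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: advertising spend (independent) vs. sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = fit_line(ad_spend, sales)
print(f"sales = {a:.1f} + {b:.1f} * spend; predicted at spend 6: {a + b * 6:.1f}")
```

Once the relationship is known, predicting the dependent variable for a new value of the independent variable is just evaluating the line.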
Sequential Patterns
In sales, with historical transaction data, businesses can identify sets of items that customers buy together at different times of the year.
Businesses can then use this information to recommend these items to customers, with better deals, based on how often they purchased them in the past.
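A minimal sketch of the idea, assuming hypothetical date-ordered purchase histories: it counts ordered pairs "A, later B" and keeps those seen in at least two customers' histories. This is a deliberately simplified form of sequential pattern mining (sequences of length two only), not a full algorithm such as GSP or PrefixSpan.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each ordered by date (earliest first).
histories = {
    "alice": ["tent", "sleeping bag", "stove"],
    "bob": ["tent", "stove"],
    "carol": ["sleeping bag", "tent", "stove"],
    "dan": ["tent", "sleeping bag"],
}


def frequent_sequences(seqs, min_customers=2):
    """Count ordered pairs (earlier item, later item) across customers."""
    counts = Counter()
    for seq in seqs.values():
        # combinations() keeps the original order, so pairs are (earlier, later);
        # the set makes each customer count a given pair at most once.
        counts.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_customers}


for (first, then), n in sorted(frequent_sequences(histories).items()):
    print(f"{first} -> later {then}: {n} customers")
```

A shop could use a pattern like "tent, then stove" to offer a stove deal to customers who recently bought a tent.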
Decision trees
The decision tree is one of the most widely used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers. Each answer leads to a further question or condition, until a leaf gives the final decision.
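The structure described above can be sketched as nested Python dictionaries: each internal node holds a question and a mapping from answers to child nodes, and each leaf is a decision. The tree below is hand-written and hypothetical (a toy loan decision); real data mining tools learn the questions from historical data instead.

```python
# Hypothetical, hand-written decision tree for a toy loan decision.
# Internal nodes ask a question; each answer leads to another node or a leaf.
tree = {
    "question": "income",  # root: a simple question with several answers
    "answers": {
        "high": "approve",
        "medium": {
            "question": "has_existing_debt",
            "answers": {True: "review", False: "approve"},
        },
        "low": "decline",
    },
}


def decide(node, record):
    """Walk from the root, following the branch that matches the record."""
    while isinstance(node, dict):
        node = node["answers"][record[node["question"]]]
    return node  # a leaf: the final decision


print(decide(tree, {"income": "medium", "has_existing_debt": False}))
```

Because every decision is just a path of readable questions, users can follow exactly why a record ended up in a given leaf.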
Data Mining Process
Data mining is a promising and relatively new technology. It is the process of discovering hidden, valuable knowledge by analyzing large amounts of data stored in databases or data warehouses, using various techniques such as machine learning, artificial intelligence and statistics.
The Cross-Industry Standard Process for Data Mining (CRISP-DM)
CRISP-DM consists of six phases intended as a cyclical process, as shown in the following figure:
Phase 1 : Business understanding
• It is necessary to clearly understand the business objectives and find out what the business needs.
• A good data mining plan has to be established to achieve both the business goals and the data mining goals.
• The plan should be as detailed as possible.
Phase 2 : Data understanding
• The data understanding phase starts with initial data collection from the available data sources, to help us become familiar with the data.
• Some important activities, including data loading and data integration, must be performed in order to collect the data successfully.
• If the data is not understood correctly in this phase, or does not fulfil the business objectives, the process goes back to Phase 1.
Phase 3 : Data preparation
• Data preparation typically consumes about 90% of the project’s time.
• The outcome of the data preparation phase is the final data set.
• Once the available data sources are identified, the data needs to be selected, cleaned, constructed and formatted into the desired form.
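The four preparation activities can be sketched on a few hypothetical raw records: select the needed fields, clean out duplicates and invalid values, construct a derived attribute, and format text consistently. Dropping incomplete records is only one simple choice (filling in missing values is another), and the field names and the "age_group" attribute are assumptions for this example.

```python
# Hypothetical raw records, as they might arrive from several sources.
raw = [
    {"id": "1", "name": " alice ", "age": "34", "income": "52000", "notes": "vip"},
    {"id": "2", "name": "Bob", "age": "", "income": "48000", "notes": ""},
    {"id": "3", "name": "CAROL", "age": "29", "income": "61000", "notes": ""},
    {"id": "1", "name": " alice ", "age": "34", "income": "52000", "notes": "vip"},
]


def prepare(records):
    """Select, clean, construct and format records into the final data set."""
    seen_ids = set()
    final = []
    for r in records:
        # Clean: drop records with missing or non-numeric values.
        if not (r["age"].isdigit() and r["income"].isdigit()):
            continue
        # Clean: drop duplicates (an id that was already accepted).
        if r["id"] in seen_ids:
            continue
        seen_ids.add(r["id"])
        age = int(r["age"])
        final.append({
            # Select: keep only the fields the analysis needs ("notes" is dropped).
            "id": int(r["id"]),
            # Format: trim whitespace and normalise capitalisation.
            "name": r["name"].strip().title(),
            "age": age,
            "income": int(r["income"]),
            # Construct: derive a new attribute from an existing one.
            "age_group": "under 30" if age < 30 else "30 and over",
        })
    return final


for row in prepare(raw):
    print(row)
```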
Phase 4 : Modeling
• Modeling techniques have to be selected for the prepared data set.
• Models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business requirements.
Phase 5 : Evaluation
• In the evaluation phase, the model results must be evaluated against the business objectives defined in the first phase.
• In this phase, new business requirements may be raised due to new patterns discovered in the model results or due to other factors.
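Evaluating results "in the context of business objectives" usually starts from comparing predictions with what actually happened on data the model did not see. The sketch below assumes a hypothetical "who will respond to the mailing?" model and computes accuracy, precision (of the customers we would mail, how many respond) and recall (of the customers who respond, how many we would reach).

```python
def evaluate(actual, predicted, positive="respond"):
    """Accuracy, precision and recall from confusion-matrix counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical hold-out results for a mailing-response model.
actual = ["respond", "ignore", "respond", "ignore", "ignore", "respond"]
predicted = ["respond", "respond", "respond", "ignore", "ignore", "ignore"]
print(evaluate(actual, predicted))
```

Which number matters most is a business question: if mailings are expensive, precision may matter more than overall accuracy.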
Phase 6 : Deployment
• The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they need it.
• Plans for deployment, maintenance and monitoring have to be created for the implementation and for future support.
• It is the final phase of the process.
Data Mining: Applications
Data mining is a process that analyzes large amounts of data to find new and hidden information that improves business efficiency.
Data Mining Applications in Sales/Marketing
• Data mining enables businesses to understand the hidden patterns in historical purchasing transaction data, which helps them plan and launch new marketing campaigns in a prompt and cost-effective way.
• Data mining is used for market basket analysis to provide information on which product combinations were purchased together, when they were bought and in what sequence.
• Retail companies use data mining to identify customers’ buying behavior patterns.
Data Mining Applications in Health Care and Insurance
• Data mining is applied in claims analysis, such as identifying which medical procedures are claimed together.
• Data mining makes it possible to forecast which customers will potentially purchase new policies.
• Data mining allows insurance companies to detect risky customer behavior patterns.
• Data mining helps detect fraudulent behavior.
Data Mining Applications in Banking / Finance
• Several data mining techniques, e.g. distributed data mining, have been researched, modeled and developed to help detect credit card fraud.
• Data mining is applied to help banks retain credit card customers. By analyzing past data, data mining can help banks predict which customers are likely to switch to another credit card provider, so they can plan and launch special offers to retain those customers.
• Credit card spending by customer group can be identified using data mining.
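Real fraud detection systems are far more complex than anything shown here, but a very simple form of the idea is to flag a transaction that is unusual compared with the customer's own history. The sketch below swaps in a basic z-score test (not the distributed data mining mentioned above); the threshold of 3 standard deviations and the card history are hypothetical.

```python
import statistics


def flag_unusual(amounts, new_amount, threshold=3.0):
    """Flag `new_amount` if it lies more than `threshold` standard deviations
    from the mean of the customer's past amounts (a z-score test)."""
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)  # sample standard deviation; needs >= 2 values
    if stdev == 0:
        return new_amount != mean  # every past amount identical
    z = (new_amount - mean) / stdev
    return abs(z) > threshold


# Hypothetical card history for one customer.
history = [25.0, 40.0, 32.0, 28.0, 35.0, 30.0]
print(flag_unusual(history, 33.0))   # a typical amount
print(flag_unusual(history, 900.0))  # far outside the usual range
```

A flagged transaction would normally be sent for review rather than blocked outright, since unusual does not always mean fraudulent.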
Data Mining Applications in Medicine
• Data mining makes it possible to characterize patient activities in order to anticipate incoming office visits.
• Data mining helps identify patterns of successful medical therapies for different illnesses.
Advantages and Disadvantages of Data Mining
• Advantages of Data Mining
• Disadvantages of Data Mining
Advantages of Data Mining
Marketing / Retail
• Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns such as direct mail, online marketing campaigns, etc.
• Through the results, marketers can take an appropriate approach to selling profitable products to targeted customers.
Finance / Banking
• Data mining gives financial institutions information about loans and credit reporting.
• By building a model from historical customer data, banks and financial institutions can distinguish good loans from bad ones.
• In addition, data mining helps banks detect fraudulent credit card transactions to protect credit card owners.
Advantages of Data Mining
Manufacturing
• In manufacturing, data discovery is used to improve product safety, usability and comfort.
• By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters.
Governments
• Data mining helps government agencies dig into and analyze records of financial transactions to build patterns that can detect money laundering or criminal activities.
Disadvantages of Data Mining
Privacy Issues
• The internet is booming with social networks, e-commerce sites, forums and blogs.
• Because of privacy issues, people are afraid that their personal information will be collected and used in unethical ways that could cause them a lot of trouble.
• Businesses collect information about their customers in many ways to understand their purchasing behavior trends.
Security Issues
• Security is a big issue.
• Businesses hold information about their employees and customers, including social security numbers, birthdays, payroll, etc.
• However, how well this information is protected is still in question.
• There have been many cases in which hackers accessed and stole large amounts of customer data from big corporations.
Disadvantages of Data Mining
Misuse of Information
• Information collected through data mining for ethical purposes can be misused.
• This information may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people.
Governments
• Sometimes government agencies obtain your personal information illegally, using programs like:
• XKeyscore
• PRISM
• They can then use that information against you.
“Data mining is everywhere, but its story starts many years before Edward Snowden.”
Conclusion
The use of data mining in enrollment management is a fairly new development. Currently, data mining is done primarily on simple numeric and categorical data. In the future, data mining will include more complex data types.
Data mining brings many benefits to businesses, society, governments and individuals. However, privacy, security and misuse of information are big problems if they are not addressed and resolved properly.