DM

Myths of DM
Techniques of DM
Myth #1: Data mining provides instant
crystal-ball predictions
Data mining is neither a crystal ball nor a
technology where answers magically appear after
pushing a single button.
It's a multi-step process that includes: defining the
business problem, exploring and conditioning data,
developing the model, and deploying the
knowledge gained.
Typically, companies spend the bulk of their time
preprocessing and conditioning the data to make
sure it is clean, consistent, and combined properly
to deliver business intelligence on which they can
rely. Data mining is all about the data -- successful
data mining requires data that accurately reflects
the business.
Myth #2: Data mining is not yet viable
for business application
Data mining is a viable technology and highly prized
for its business results.
The myth tends to be perpetuated by those who
need to explain why they are not yet using the
process and revolves around two related
statements.
Myth #3: Data mining requires a data
warehouse
• It is true that data mining can benefit from warehoused
data that is well organized, relatively clean, and easy to
access.
• This is particularly true if the warehouse has been
constructed with data mining specifically in mind and
with knowledge of the requirements of the data mining
project.
• However, the warehoused data may be less useful for
data mining than the source or operational data. In the
worst case, warehoused data may be completely
useless (for example, if only summary data are stored).
Myth #4: DM is all about algorithms
• People often mistakenly believe that "All you need for data
mining is good algorithms. The better your algorithms,
the better your data mining; advancing the effectiveness
of data mining means advancing our knowledge of
algorithms."
• This view misunderstands the data mining process.
Data mining is a process consisting of many elements,
such as formulating business goals, mapping business
goals to data mining goals, acquiring, understanding,
and pre-processing the data, evaluating and presenting
the results of analysis, and deploying these results to
achieve business benefits.
• This is not to minimize the importance of new or
improved data mining algorithms.
Myth #5: DM should be done by
a technology expert
• Quite the opposite is true, due to the paramount
importance of business knowledge in data mining.
• When performed without business knowledge, data
mining can produce nonsensical or useless results, so
it is essential that data mining be performed by
someone with extensive knowledge of the business
problem.
• Very seldom does the same person also have extensive
knowledge of the data mining technology. It is the
responsibility of data mining tool providers to ensure
that tools are accessible to business users.
Myth #6: Data mining is for large companies
with lots of customer data
• The plain fact is that if a company, large or
small, has data that accurately reflects the
business or its customers, it can build models
against that data that lend insights into
important business challenges. The amount
of customer data a company possesses has
never been the issue.
Fundamental concepts of DM
Classification
• Classification is the operation most commonly
supported by commercial data mining tools.
• It is the process of sub-dividing a data set with
regard to a number of specific outcomes.
• For example, classifying customers into ‘high’
and ‘low’ categories with regard to credit risk.
• The category or ‘class’ into which each
customer is placed is the ‘outcome’ of the
classification.

Prediction
• Prediction estimates future data states based
on past and current data. Prediction can be
viewed as a type of classification. Ex:
Predicting floods.
• Techniques for classification and prediction: decision trees, neural networks, nearest
neighbour algorithms.
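As a rough illustration of the nearest neighbour technique named above, the sketch below places a new customer in the ‘high’ or ‘low’ credit-risk class of the most similar historical customer. The records, the two features (income and missed payments), and the function name `classify` are assumptions made for this example; a real model would also rescale the features so that neither dominates the distance.

```python
import math

# Hypothetical historical records:
# (annual income in thousands, missed payments) -> known credit-risk class.
training = [
    ((20.0, 4.0), "high"),
    ((25.0, 3.0), "high"),
    ((60.0, 0.0), "low"),
    ((75.0, 1.0), "low"),
]


def classify(record, labelled):
    """Return the class of the labelled record closest to `record` (1-nearest neighbour)."""
    _, nearest_class = min(labelled, key=lambda item: math.dist(item[0], record))
    return nearest_class


print(classify((22.0, 5.0), training))  # high
print(classify((70.0, 0.0), training))  # low
```

The class returned for each customer is the ‘outcome’ of the classification.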
Understanding vs. Prediction
• Sophisticated classification techniques enable us to
discover new patterns in large and complex data
sets.
• Classification is a powerful aid to understanding a
particular problem. In some cases, improved
understanding is sufficient. It may suggest new
initiatives and provide information that improves
future decision making.
• Often the reason for developing an accurate
classification model is to improve our capability for
prediction.
Training
• A classification model is said to be ‘trained’
on historical data, for which the outcome is
known for each record.
• But beware of overfitting: for example, a model might learn that 100 per
cent of customers called Smith who live at 28
Arcadia Street responded to the offer, a rule too
specific to apply to new customers.
• One would then use a separate test dataset
of historical data to validate the model.
• The model could then be applied to a new,
unclassified data set in order to predict the
outcome for each record.
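The value of a separate test dataset can be sketched with two toy models. All records, the `memoriser` model, and the hand-written `age_rule` are hypothetical and exist only to show the effect: a model that memorises its training records looks perfect on them and does worse on data it has not seen.

```python
# Hypothetical records: (surname, address, age band) -> responded to the offer?
train = [
    (("Smith", "28 Arcadia Street", "30-39"), True),
    (("Jones", "5 Mill Lane", "30-39"), True),
    (("Patel", "9 High Street", "60-69"), False),
    (("Brown", "14 Oak Road", "60-69"), False),
]
test = [
    (("Green", "2 Park View", "30-39"), True),
    (("White", "41 Elm Close", "60-69"), False),
]


def accuracy(model, records):
    """Fraction of records for which the model predicts the known outcome."""
    hits = sum(1 for features, outcome in records if model(features) == outcome)
    return hits / len(records)


# Overfitted model: memorises each training record, answers False for anything else.
memory = dict(train)


def memoriser(features):
    return memory.get(features, False)


# Simpler illustrative model: looks only at the age band.
def age_rule(features):
    return features[2] == "30-39"


print(accuracy(memoriser, train), accuracy(memoriser, test))  # 1.0 0.5
print(accuracy(age_rule, train), accuracy(age_rule, test))    # 1.0 1.0
```

Both models score 100 per cent on the training data; only the test set reveals that the memorising model has learned nothing it can apply to new customers.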
Clustering
• It is used to find groupings of similar records
in a data set without any preconditions as to
what that similarity may involve.
• Clustering is used to identify interesting
groups in a customer base that may not have
been recognised before. It is often undertaken as
an exploratory exercise before doing further
data mining using a classification technique.
• Techniques for clustering: cluster analysis,
neural networks.
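One common form of cluster analysis is k-means, which the slides do not name; it is used here only as an example technique. The sketch below groups hypothetical customers by (visits per month, average spend). The data, the starting centroids, and the function name `k_means` are assumptions for illustration.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
initial = [customers[0], customers[3]]  # starting centroids chosen by hand

centroids, clusters = k_means(customers, initial)
print(clusters[0])  # [(1, 10), (2, 12), (1, 11)]
print(clusters[1])  # [(9, 80), (10, 85), (8, 78)]
```

No outcome labels are supplied: the two groups emerge from similarity alone, and could then be examined or used as input to a classification exercise.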
Association analysis
• Association analysis looks for links between
records in a data set.
• Sometimes referred to as ‘market basket
analysis’, its most common aim is to discover
which items are generally purchased at the
same time.
Example of Association Analysis
• Consider the following beer and nappy
example:
o 500,000 transactions
o 20,000 transactions contain nappies (4%)
o 30,000 transactions contain beer (6%)
o 10,000 transactions contain both nappies and
beer (2%)
Sequential analysis
• Sequential analysis looks for temporal links
between purchases, rather than relationships
between items in a single transaction.
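A minimal sketch of the idea, assuming each customer's purchases are available in time order: count how many customers bought one item at some point before another. The purchase histories and the function name `sequential_pairs` are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> items in the order they were bought.
histories = {
    "c1": ["printer", "ink", "paper"],
    "c2": ["printer", "paper", "ink"],
    "c3": ["laptop", "printer", "ink"],
}


def sequential_pairs(histories):
    """Count, per customer, each ordered pair (earlier item, later item)."""
    counts = Counter()
    for items in histories.values():
        seen_pairs = set()
        for i, earlier in enumerate(items):
            for later in items[i + 1:]:
                seen_pairs.add((earlier, later))
        counts.update(seen_pairs)  # each customer counts at most once per pair
    return counts


pairs = sequential_pairs(histories)
print(pairs[("printer", "ink")])  # 3
print(pairs[("ink", "printer")])  # 0
```

Unlike a market basket count, the order matters: every customer bought ink after a printer, and none bought a printer after ink.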
Support (or prevalence)
• Measures how often items occur together, as
a percentage of the total transactions. In this
example, beer and nappies occur together
2% of the time (10,000/500,000).
Confidence (or predictability)
• Measures how much a particular item is dependent
on another.
• Because 20,000 transactions contain nappies and
10,000 of these transactions contain beer, when
people buy nappies, they also buy beer 50% of the
time.
• The confidence for the rule "when people buy
nappies, they also buy beer" is therefore 50%.
• Because 30,000 transactions contain beer and
10,000 of these transactions contain nappies, when
people buy beer, they also buy nappies 33.33% of
the time.
• The confidence for the reverse rule, "when people buy
beer, they also buy nappies", is therefore 33.33%.
Expected Confidence
• In the absence of any knowledge about what
else was bought, we can also make the
following assertions from the available data:
o People buy nappies 4% of the time.
o People buy beer 6% of the time.
• These numbers - 4% and 6% - are called the
expected confidence of buying nappies or
beer, regardless of what else is purchased.
Lift
• Measures the ratio between the confidence of
a rule and the expected confidence that the
second product will be purchased. Lift is
a measure of the strength of an effect.
• In our example, the confidence of the
nappies-beer buying rule is 50%, whilst the
expected confidence is 6% that an arbitrary
customer will buy beer. So, the lift provided
by the nappies-beer rule is 8.33 (= 50%/6%).
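The support, confidence, expected confidence, and lift figures from the beer and nappy example can be reproduced with a few divisions. The function name `rule_metrics` and its parameter names are assumptions for this sketch; the transaction counts are those given in the example.

```python
def rule_metrics(total, antecedent, consequent, both):
    """Metrics for the rule 'antecedent -> consequent' from raw transaction counts."""
    support = both / total                    # how often the items occur together
    confidence = both / antecedent            # how often the consequent accompanies the antecedent
    expected_confidence = consequent / total  # how often the consequent is bought at all
    lift = confidence / expected_confidence
    return {
        "support": support,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": lift,
    }


# Counts from the example: 500,000 transactions, 20,000 with nappies,
# 30,000 with beer, 10,000 with both.
nappies_beer = rule_metrics(500_000, 20_000, 30_000, 10_000)
beer_nappies = rule_metrics(500_000, 30_000, 20_000, 10_000)

print(f"{nappies_beer['support']:.2%}")     # 2.00%
print(f"{nappies_beer['confidence']:.2%}")  # 50.00%
print(f"{beer_nappies['confidence']:.2%}")  # 33.33%
print(f"{nappies_beer['lift']:.2f}")        # 8.33
```

Note that confidence depends on the direction of the rule (50% versus 33.33%), while lift comes out the same in both directions, because it compares how often the two items occur together with how often they would if they were independent.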
Forecasting
• Forecasting (unlike prediction based on classification
models) concerns the prediction of continuous
values, such as a person’s income based on various
personal details, or the level of the stock market.
• Simpler forecasting problems involve a single
continuous value based on a series of unordered
examples. A more complex problem is to predict one or
more values based on a sequential pattern.
• Techniques include statistical time-series analysis as
well as neural networks.
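As a very simple instance of predicting a continuous value from a sequential pattern, the sketch below forecasts the next value of a series as the average of its most recent observations. The moving average is chosen here only for brevity, and the sales figures and function name are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Hypothetical monthly sales figures, oldest first.
monthly_sales = [100, 104, 110, 113, 120, 125]

forecast = moving_average_forecast(monthly_sales)
print(round(forecast, 2))  # 119.33
```

The output is a number rather than a class, which is what separates forecasting from classification. A moving average lags behind a rising series like this one, so a practical forecast would normally also model the trend.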
Techniques used in DM
• Regression:
This is used to map a data item to a real-valued
prediction variable. Ex: A college professor
wishing to calculate his future savings.
• Time series analysis:
In this, the value of an attribute is examined
as it varies over time. Ex: A company trying to
decide which stock to purchase,
whether from X, Y, or Z.
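The regression example can be sketched with an ordinary least-squares line fitted to past savings. The figures below are invented, and the linear form is an assumption; the slides do not say what kind of regression is meant.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical savings (in thousands) at the end of each of the past five years.
years = [1, 2, 3, 4, 5]
savings = [12.0, 14.5, 16.0, 18.5, 20.0]

slope, intercept = fit_line(years, savings)
predicted_year_8 = intercept + slope * 8
print(round(slope, 2), round(intercept, 2), round(predicted_year_8, 2))  # 2.0 10.2 26.2
```

The fitted line maps a data item (the year) to a real-valued prediction (the savings), here about 26.2 thousand for year 8, assuming the past trend continues.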
Techniques used in DM (contd..)
• Summarization:
This maps the data into subsets with associated simple
descriptions. This is also called characterization or
generalization. Ex: One basis for comparing universities
in the US is the average SAT or ACT score.
• Association rules:
This is a model that identifies specific types of data
associations. Ex: A grocery store trying to decide
whether to put bread on sale.
Overview of DM
Data Mining Steps
• Collect the Data
• Clean the Data
• Determine what is desired
• Determine optimal method/tool
• Mine the data
• Analyze and verify the results
• Use the results
Data Mining Steps (contd..)
Data Mining Input
• Data mining can effectively deal with
inconsistencies in your data. Even if your
sources are clean, integrated, and validated,
they may contain data about the real world
that is simply not true. This noise can, for
example, be caused by errors in user input or
just plain mistakes of customers filling in
questionnaires. If it does not occur too often,
data mining tools are able to ignore the noise
and still find the overall patterns that exist in
your data.
Data Mining Output
• The output of data mining can provide you with more
flexibility. For example, if you have a budget to mail
information to 1000 people about a new product,
queries or OLAP analysis directly on your data will
never be able to select exactly that number of people
from your database. By enhancing your data with an
attribute that you can use in your query or OLAP
analysis, data mining enables you to find the 1000
people most likely to respond. This example also
shows that data mining is not replacing OLAP, but
enhancing it.
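The mailing example amounts to ranking customers by a model-produced score and keeping as many as the budget allows. In the sketch below the scores, customer ids, and the function name `top_prospects` are hypothetical; in practice the score would be the attribute added to the data by a trained model.

```python
import heapq


def top_prospects(scores, budget):
    """Return the `budget` customer ids with the highest response scores, best first."""
    return heapq.nlargest(budget, scores, key=scores.get)


# Hypothetical attribute added by data mining: estimated probability of responding.
scores = {"c1": 0.12, "c2": 0.87, "c3": 0.45, "c4": 0.91, "c5": 0.08}

print(top_prospects(scores, 2))  # ['c4', 'c2']
```

With a budget of 1000 mailings the call would simply be `top_prospects(scores, 1000)`; the same score column can equally be used as a sort key or filter in an ordinary query or OLAP tool.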
The Future of Data Mining
• In the short term, the results of data mining will be in
profitable, if mundane, business-related areas. Micromarketing campaigns will explore new niches.
Advertising will target potential customers with new
precision.
• In the medium term, data mining may be as common
and easy to use as e-mail. We may use these tools to
find the best airfare to New York, root out a phone
number of a long-lost classmate, or find the best prices
on lawn mowers.
• The long-term prospects are truly exciting. Imagine
intelligent agents turned loose on medical research data
or on sub-atomic particle data. Computers may reveal
new treatments for diseases or new insights into the
nature of the universe. There are potential dangers,
though, as discussed below.
Privacy Concerns
• What if every telephone call you make, every credit card
purchase you make, every flight you take, every visit to
the doctor you make, every warranty card you send in,
every employment application you fill out, every school
record you have, your credit record, every web page you
visit ... was all collected together? A lot would be known
about you! This is an all-too-real possibility.
• Is there too much information about too many
people in a database for anybody to make sense of it? Not
with data mining tools running on massively parallel
processing computers! Would you feel comfortable about
someone having access to all this data about you? And
remember, all this data does not have to reside in one
physical location; as the net grows, information of this
type becomes more available to more people.
Proposed solutions might be…
• Data are intentionally modified from their original
version, in order to misinform the recipients or for
privacy and security.
• Legislation designed to protect consumers against
data security failures by, among other things,
requiring companies to notify consumers when their
personal information has been compromised.
Expanding universe of data
• Nowadays, the world is regarded as an
expanding universe of data. We have a
seemingly endless amount of data, yet little information.
Some people look at this phenomenon as a
new paradox of the growth of data, that is,
more data means less information. Therefore,
there is an urgent need for the development
of new techniques to find the required
information from huge amounts of data.
Expanding universe of data
The following factors make data mining a
very important technique for extracting implicit,
previously unknown and potentially useful
knowledge from data:
• Data mining algorithms can find "optimal"
clusterings or interesting regularities in a database.
• Data mining algorithms typically zoom in on
interesting sub-parts of databases.
• Networks make it easy to connect databases.
• Machine learning techniques make it easier to
find interesting connections in a database.
• The client/server revolution.
Information as a factor of
production
• Increase in available data
• Exacerbated by the World Wide Web
• Information overload
• Computer assistance to filter, select and
interpret data
• Extend this to allow computers to discover
relevant information
• In the future machine assistance will become
more and more important
Architecture of Data Mining
Components explained
• Database, data warehouse, or other
information repository: This is one or a set
of databases, data warehouses, spreadsheets,
or other kinds of information
repositories. Data cleaning and data
integration techniques may be performed on
the data.
• Database or data warehouse server: The
database or data warehouse server is
responsible for fetching the relevant data,
based on the user's data mining request.
Components explained
• Knowledge base: This is the domain knowledge that
is used to guide the search, or evaluate the
interestingness of resulting patterns.
• Such knowledge can include concept hierarchies,
used to organize attributes or attribute values into
different levels of abstraction.
• Knowledge such as user beliefs, which can be used
to assess a pattern's interestingness based on its
unexpectedness, may also be included.
• Other examples of domain knowledge are additional
interestingness constraints or thresholds, and
metadata (e.g., describing data from multiple
heterogeneous sources).
Components explained
• Data mining engine: This is essential to the data
mining system and ideally consists of a set of
functional modules for tasks such as characterization,
association analysis, classification, evolution and
deviation analysis.
Components explained
• Pattern evaluation module: This component
typically employs interestingness measures and
interacts with the data mining modules so as to focus
the search towards interesting patterns.
• It may access interestingness thresholds stored in
the knowledge base.
• Alternatively, the pattern evaluation module may be
integrated with the mining module, depending on the
implementation of the data mining method used.
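One way to picture the pattern evaluation module is as a filter that keeps only those mined patterns that pass the thresholds held in the knowledge base. The sketch below assumes support and confidence as the interestingness measures; the threshold values, the pattern records, and the function name `interesting` are all hypothetical.

```python
# Hypothetical interestingness thresholds stored in the knowledge base.
knowledge_base = {"min_support": 0.01, "min_confidence": 0.4}

# Hypothetical patterns produced by the data mining engine.
patterns = [
    {"rule": "nappies -> beer", "support": 0.02, "confidence": 0.50},
    {"rule": "beer -> nappies", "support": 0.02, "confidence": 0.33},
    {"rule": "caviar -> champagne", "support": 0.001, "confidence": 0.90},
]


def interesting(patterns, thresholds):
    """Return the rules that meet both the support and the confidence threshold."""
    return [
        p["rule"]
        for p in patterns
        if p["support"] >= thresholds["min_support"]
        and p["confidence"] >= thresholds["min_confidence"]
    ]


print(interesting(patterns, knowledge_base))  # ['nappies -> beer']
```

When the evaluation is integrated with the mining module instead, the same thresholds would be applied during the search so that uninteresting patterns are never generated in the first place.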
Components explained
• Graphical user interface: This module
communicates between users and the data mining
system, allowing the user to interact with the system
by specifying a data mining query or task, providing
information to help focus the search, and performing
exploratory data mining based on the intermediate
data mining results.
• In addition, this component allows the user to browse
database and data warehouse schemas or data
structures, evaluate mined patterns, and visualize the
patterns in different forms.
Classification of DM
• Classification according to the kinds of
databases mined.
o A data mining system can be classified
according to the kinds of databases mined.
Database systems themselves can be
classified according to different criteria (such
as data models, or the types of data or
applications involved), each of which may
require its own data mining technique. Data
mining systems can therefore be classified
accordingly.
Classification of DM
• Classification according to the kinds of databases
mined.
o For instance, if classifying according to data
models, we may have a relational, transactional,
object-oriented, object-relational, or data
warehouse mining system. If classifying according
to the special types of data handled, we may have
a spatial, time-series, text, or multimedia data
mining system, or a World-Wide Web mining
system. Other system types include
heterogeneous data mining systems, and legacy
data mining systems.
Classification of DM
• Classification according to the kinds of knowledge
mined.
o Data mining systems can be categorized according to
the kinds of knowledge they mine, i.e., based on data
mining functionalities, such as characterization,
discrimination, association, classification, clustering,
trend and evolution analysis, deviation analysis,
similarity analysis, etc.
Classification of DM
• Classification according to the kinds of techniques
utilized.
o These techniques can be described according to the
degree of user interaction involved (e.g., autonomous
systems, interactive exploratory systems, query-driven
systems), or the methods of data analysis employed
(e.g., database-oriented or data warehouse-oriented
techniques, machine learning, statistics, visualization,
pattern recognition, neural networks, and so on).
o A sophisticated data mining system will often adopt
multiple data mining techniques or work out an
effective, integrated technique which combines the
merits of a few individual approaches.
Decision Support Systems (DSS)
• A decision support system is a computer-based
system that supports the decision-making
process
o Assist decision makers in semi-structured
tasks
o Support, not replace, human judgment
o Highly interactive
o Improve effectiveness of human decision
makers
DSS characteristics
• Provide support in semi-structured and
unstructured situations, includes human
judgment and computerized information
• Support for various managerial levels
• Support to individuals and groups
• Support to interdependent and/or sequential
decisions
• Support all phases of the decision-making
process
• Support a variety of decision-making
processes and styles
DSS characteristics
• Are adaptive
• Have user friendly interfaces
• Goal: improve effectiveness of decision
making
• The decision maker controls the decision-making process
• End-users can build simple systems
• Utilizes models for analysis
• Provides access to a variety of data sources,
formats, and types
Why DSS?
• Increasing complexity of decisions
o Technology
o Information:

• “Data, data everywhere, and not the time to
think!”
o Number and complexity of options
o Pace of change
Why DSS?
• Increasing availability of computerized support
o Inexpensive high-powered computing
o Better software
o More efficient software development process

• Increasing usability of computers
o COTS (Commercial Off The Shelf) tools
o Customization
Types of Problems
• Structured
o Repetitive
o Standard solution methods exist
o Complete automation may be feasible
• Unstructured
o One-time
o No standard solutions
o Rely on judgment
o Automation is usually infeasible
• Semi-structured
o Some elements and/or phases of decision making
process have repetitive elements
Decision Support Trends
• IT is increasingly pervasive
• Users are increasingly computer savvy
• Computer hardware is increasingly smaller and more
powerful
• Systems are increasingly interconnected
• The Web is increasingly interwoven into all aspects of
our lives
• Demand for usable, flexible, powerful decision support
will continue to grow
• Decision support will be embedded into a wide variety
of consumer and business products
Humans and Computers:
Complementary Strengths
• Human decision makers
o Good at seeing patterns
o Can work with incomplete problem
representations
o Exercise subtle judgment we do not know how
to automate
o Often unaware of how they perform tasks
o Poor at integrating large numbers of cues
o Unreliable and slow at tedious bookkeeping
tasks and complex calculations
Humans and Computers:
Complementary Strengths
• Computers
o Still inferior to humans at pattern recognition,
messy unstructured problems
o Good at integrating large numbers of features
o Good at tedious bookkeeping
o Rapid and accurate at complex calculations
DSS classifications
• Model Driven DSS: A model-driven DSS
emphasizes access to and manipulation of financial,
optimization and/or simulation models. Simple
quantitative models provide the most elementary
level of functionality.
• Data Driven DSS: Data-driven DSS emphasizes
access to and manipulation of a time-series of
internal company data and sometimes external and
real-time data. Simple file systems accessed by
query and retrieval tools provide the most elementary
level of functionality.
DSS classifications
• Communication Driven DSS: Communications-driven
DSS use network and communications
technologies to facilitate decision-relevant
collaboration and communication. In these systems,
communication technologies are the dominant
architectural component.
• Document Driven DSS: Document-driven DSS uses
computer storage and processing technologies to
provide document retrieval and analysis. Large
document databases may include scanned
documents, hypertext documents, images, sounds
and video.
DSS architecture
Data Management subsystem
• Consists of DSS database, database
management system, data directory and
query facility. It does the following:
o Captures/extracts data for inclusion in a DSS
database
o Updates (adds, deletes, edits, changes) data
records and files
o Interrelates data from different sources
o Retrieves data from the database for queries
and reports

Data Management subsystem
o Provides comprehensive data
security (protection from unauthorised access,
recovery capabilities, etc.)
o Handles personal and unofficial data so that
users can experiment with alternative
solutions based on their own judgement
o Performs complex data manipulation tasks
based on queries
o Tracks data use within DSS
o Manages data through a data dictionary

Model Management Subsystem
• Analogous to the database management
subsystem; consists of model base, model base management
system, modeling language, model directory, and model
execution, integration, and command processor
• Strategic Models: non-routine, e.g. mergers, impact
analysis, capital budgeting
• Tactical Models: allocation and control, e.g. labor
requirements, sales promotion planning
• Operational Models: routine, day-to-day, e.g.
production scheduling, inventory control, quality
control
• Analytical Models: SAS, SPSS, OR, data mining
KBS
• Knowledge based Subsystem
o Provides expertise in solving complex
unstructured and semi-structured problems
o Expertise provided by an expert system or
other intelligent system
o Advanced DSS have a knowledge based
(management) component
o Leads to intelligent DSS
o Example: Data mining

User interface
• User Interface subsystem
o Includes all communication between a user
and the MSS
o Graphical user interfaces (GUI)
o Voice recognition and speech synthesis
possible

User
• Different usage patterns for the user, the
manager, or the decision maker
o Managers
o Staff specialists
o Intermediaries: 1. Staff assistant 2. Expert tool
user 3. Business (system) analyst 4. GSS
Facilitator


Data mining techniques and dss

  • 2. Myth #1: Data mining provides instant crystal-ball predictions. Data mining is neither a crystal ball nor a technology where answers magically appear after pushing a single button. It's a multi-step process that includes defining the business problem, exploring and conditioning data, developing the model, and deploying the knowledge gained. Typically, companies spend the bulk of their time preprocessing and conditioning the data to make sure it is clean, consistent, and combined properly to deliver business intelligence on which they can rely. Data mining is all about the data: successful data mining requires data that accurately reflects the business.
  • 3. Myth #2: Data mining is not yet viable for business application. Data mining is a viable technology and highly prized for its business results. The myth tends to be perpetuated by those who need to explain why they are not yet using the process, and it revolves around two related statements.
  • 4. Myth #3: Data mining requires a data warehouse. It is true that data mining can benefit from warehoused data that is well organized, relatively clean, and easy to access. This is particularly true if the warehouse has been constructed with data mining specifically in mind and with knowledge of the requirements of the data mining project. However, the warehoused data may be less useful for data mining than the source or operational data. In the worst case, warehoused data may be completely useless (for example, if only summary data are stored).
  • 5. Myth #4: DM is all about algorithms. People often mistakenly believe that "All you need for data mining is good algorithms. The better your algorithms, the better your data mining; advancing the effectiveness of data mining means advancing our knowledge of algorithms." This misunderstands the data mining process. Data mining is a process consisting of many elements, such as formulating business goals, mapping business goals to data mining goals, acquiring, understanding, and pre-processing the data, evaluating and presenting the results of analysis, and deploying these results to achieve business benefits. This is not to minimize the importance of new or improved data mining algorithms.
  • 6. Myth #5: DM should be done by a technology expert. Quite the opposite is true, due to the paramount importance of business knowledge in data mining. When performed without business knowledge, data mining can produce nonsensical or useless results, so it is essential that data mining be performed by someone with extensive knowledge of the business problem. Very seldom is this the same person who has extensive knowledge of the data mining technology. It is the responsibility of data mining tool providers to ensure that tools are accessible to business users.
  • 7. Myth #6: Data mining is for large companies with lots of customer data. The plain fact is that if a company, large or small, has data that accurately reflects the business or its customers, it can build models against that data that lend insights into important business challenges. The amount of customer data a company possesses has never been the issue.
  • 8. Fundamental concepts of DM: Classification. Classification is the operation most commonly supported by commercial data mining tools. It is the process of sub-dividing a data set with regard to a number of specific outcomes. For example, classifying customers into 'high' and 'low' categories with regard to credit risk. The category or 'class' into which each customer is placed is the 'outcome' of the classification.
  • 9. Prediction: Prediction estimates future data states from past and current data. Prediction can be viewed as a type of classification. Ex: predicting floods. Techniques for classification and prediction: decision trees, neural networks, nearest-neighbour algorithms.
  • 10. Understanding vs. Prediction: Sophisticated classification techniques enable us to discover new patterns in large and complex data sets. Classification is a powerful aid to understanding a particular problem. In some cases, improved understanding is sufficient. It may suggest new initiatives and provide information that improves future decision making. Often the reason for developing an accurate classification model is to improve our capability for prediction.
  • 11. Training: A classification model is said to be 'trained' on historical data, for which the outcome is known for each record. But beware overfitting: for example, a model may learn that 100 per cent of customers called Smith who live at 28 Arcadia Street responded to the offer, a rule that fits the historical records perfectly but is unlikely to generalise. One would then use a separate test dataset of historical data to validate the model. The model could then be applied to a new, unclassified data set in order to predict the outcome for each record.
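The train-then-validate idea can be illustrated with a deliberately overfitted model. The following is only a sketch: the records, the `train_memoriser` helper and the `accuracy` helper are hypothetical and not from the slides. The "model" memorises every exact training record, so it looks perfect on the training data and much worse on a separate test set.

```python
from collections import Counter

def train_memoriser(records):
    """Overfit on purpose: remember the outcome of every exact record seen."""
    table = {features: outcome for features, outcome in records}
    # Fall back to the most common training outcome for unseen records.
    default = Counter(outcome for _, outcome in records).most_common(1)[0][0]
    return lambda features: table.get(features, default)

def accuracy(model, records):
    """Fraction of records whose known outcome the model reproduces."""
    hits = sum(model(features) == outcome for features, outcome in records)
    return hits / len(records)

# Hypothetical historical records: ((surname, address), responded to offer?)
train = [
    (("Smith", "28 Arcadia Street"), True),
    (("Jones", "3 Mill Lane"), False),
    (("Patel", "9 High Road"), False),
    (("Brown", "14 Oak Avenue"), False),
]
# Separate historical records held back for validation.
test = [
    (("Smith", "5 Elm Close"), True),
    (("Green", "28 Arcadia Street"), False),
    (("Khan", "7 Park View"), True),
    (("Lee", "2 Hill Rise"), False),
]

model = train_memoriser(train)
train_acc = accuracy(model, train)  # perfect by construction
test_acc = accuracy(model, test)    # reveals that nothing general was learned
print(f"train accuracy: {train_acc:.0%}, test accuracy: {test_acc:.0%}")
```

The gap between the two accuracies is the warning sign: a rule tied to one surname and one address says nothing about new customers.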
  • 12. Clustering: Clustering is used to find groupings of similar records in a data set without any preconditions as to what that similarity may involve. It is used to identify interesting groups in a customer base that may not have been recognised before, and is often undertaken as an exploratory exercise before doing further data mining using a classification technique. Techniques for clustering: cluster analysis, neural networks.
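As a rough illustration of clustering, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. K-means is one common form of cluster analysis and is chosen here for brevity; the slides do not name a specific algorithm. The `spend` values and the initial centres are made up.

```python
def kmeans_1d(values, centres, iterations=10):
    """Lloyd's algorithm on a list of numbers; `centres` are initial guesses."""
    centres = list(centres)
    groups = [[] for _ in centres]
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for v in values:
            # Assign each value to the index of its nearest centre.
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

# Hypothetical monthly spend per customer; no class labels are supplied.
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centres, groups = kmeans_1d(spend, centres=[0, 100])
print(centres, groups)
```

No outcome column is involved: the two groups (low and high spenders) emerge from similarity alone, which is what distinguishes clustering from classification.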
  • 13. Association analysis: Association analysis looks for links between records in a data set. Sometimes referred to as 'market basket analysis', its most common aim is to discover which items are generally purchased at the same time.
  • 14. Example of Association Analysis: Consider the following beer and nappy example: 500,000 transactions; 20,000 transactions contain nappies (4%); 30,000 transactions contain beer (6%); 10,000 transactions contain both nappies and beer (2%).
  • 15. Sequential analysis: Sequential analysis looks for temporal links between purchases, rather than relationships between items in a single transaction.
  • 16. Support (or prevalence): Measures how often items occur together, as a percentage of the total transactions. In this example, beer and nappies occur together 2% of the time (10,000/500,000).
  • 17. Confidence (or predictability): Measures how much a particular item is dependent on another. Because 20,000 transactions contain nappies and 10,000 of these transactions contain beer, when people buy nappies, they also buy beer 50% of the time. The confidence for the rule "when people buy nappies, they also buy beer" is therefore 50%. Because 30,000 transactions contain beer and 10,000 of these transactions contain nappies, when people buy beer, they also buy nappies 33.33% of the time, so the confidence for the reverse rule is 33.33%.
  • 18. Expected Confidence: In the absence of any knowledge about what else was bought, we can also make the following assertions from the available data: people buy nappies 4% of the time, and people buy beer 6% of the time. These numbers, 4% and 6%, are called the expected confidence of buying nappies or beer, regardless of what else is purchased.
  • 19. Lift: Measures the ratio between the confidence of a rule and the expected confidence that the second product will be purchased. Lift is a measure of the strength of an effect. In our example, the confidence of the nappies-beer buying rule is 50%, whilst the expected confidence is 6% that an arbitrary customer will buy beer. So the lift provided by the nappies-beer rule is 8.33 (= 50%/6%).
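The support, confidence, expected confidence and lift figures from the beer and nappy example can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function and constant names are illustrative, while the counts are the ones given in the slides.

```python
# Transaction counts taken from the beer-and-nappy example.
TOTAL = 500_000
NAPPIES = 20_000
BEER = 30_000
BOTH = 10_000

def support(both, total):
    """Share of all transactions that contain both items."""
    return both / total

def confidence(both, antecedent):
    """Share of transactions with the first item that also contain the second."""
    return both / antecedent

def expected_confidence(consequent, total):
    """Share of all transactions that contain the second item."""
    return consequent / total

def lift(both, antecedent, consequent, total):
    """Confidence of the rule divided by the expected confidence."""
    return confidence(both, antecedent) / expected_confidence(consequent, total)

print(f"support:                     {support(BOTH, TOTAL):.2%}")
print(f"confidence nappies -> beer:  {confidence(BOTH, NAPPIES):.2%}")
print(f"confidence beer -> nappies:  {confidence(BOTH, BEER):.2%}")
print(f"expected confidence of beer: {expected_confidence(BEER, TOTAL):.2%}")
print(f"lift nappies -> beer:        {lift(BOTH, NAPPIES, BEER, TOTAL):.2f}")
```

A lift well above 1 means that buying nappies makes a beer purchase far more likely than it is for an arbitrary customer.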
  • 20. Forecasting: Forecasting (unlike prediction based on classification models) concerns the prediction of continuous values, such as a person's income based on various personal details, or the level of the stock market. Simpler forecasting problems involve a single continuous value based on a series of unordered examples. A more complex problem is to predict one or more values based on a sequential pattern. Techniques include statistical time-series analysis as well as neural networks.
  • 21. Techniques used in DM: Regression: This is used to map a data item to a real-valued prediction variable. Ex: a college professor wishing to calculate his future savings. Time series analysis: Here the value of an attribute is examined as it varies over time. Ex: a company trying to decide whether to purchase stock from X, Y or Z.
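Regression in its simplest form fits a straight line to past observations and extends it. The sketch below uses ordinary least squares on made-up savings figures, loosely following the professor's savings example; the data and the `fit_line` helper are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical savings balance (in thousands) at the end of years 1 to 5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, savings)
forecast_year_8 = slope * 8 + intercept  # extrapolate the fitted line
print(f"slope={slope}, intercept={intercept}, year 8 forecast={forecast_year_8}")
```

Real savings data would not lie exactly on a line; the fitted line then gives the best straight-line approximation in the least-squares sense.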
  • 22. Techniques used in DM (contd..): Summarization: This maps the data into subsets with associated simple descriptions. It is also called characterization or generalization. Ex: one summary figure used to compare universities in the US is the average SAT or ACT score. Association rules: This is a model that identifies specific types of data associations. Ex: a grocery store trying to decide whether to put bread on sale.
  • 24. Data Mining Steps: (1) Collect the data. (2) Clean the data. (3) Determine what is desired. (4) Determine the optimal method/tool. (5) Mine the data. (6) Analyze and verify the results. (7) Use the results.
  • 25. Data Mining Steps (contd..)
  • 26. Data Mining Input: Data mining can effectively deal with inconsistencies in your data. Even if your sources are clean, integrated, and validated, they may contain data about the real world that is simply not true. This noise can, for example, be caused by errors in user input or just plain mistakes of customers filling in questionnaires. If it does not occur too often, data mining tools are able to ignore the noise and still find the overall patterns that exist in your data.
  • 27. Data Mining Output: The output of data mining can provide you with more flexibility. For example, if you have a budget to mail information to 1000 people about a new product, queries or OLAP analysis directly on your data will never be able to select exactly that number of people from your database. By enhancing your data with an attribute that you can use in your query or OLAP analysis, data mining enables you to find the 1000 people most likely to respond. This example also shows that data mining is not replacing OLAP, but enhancing it.
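The mailing example amounts to ranking customers by a mined score and keeping exactly as many as the budget allows. A minimal sketch, assuming a hypothetical `scores` mapping from customer id to a predicted response likelihood produced by some earlier mining step:

```python
import heapq

def select_top_n(scores, n):
    """Return the n customer ids with the highest predicted response score."""
    return heapq.nlargest(n, scores, key=scores.get)

# Hypothetical scores; in practice this attribute would come from a trained model.
scores = {"c1": 0.12, "c2": 0.87, "c3": 0.45, "c4": 0.91, "c5": 0.30}

# With a budget for two mailings, keep the two most likely responders.
mailing_list = select_top_n(scores, 2)
print(mailing_list)
```

Because the score is just another attribute on each record, the same selection could be expressed as an ordered query with a row limit, which is the sense in which mining enhances rather than replaces query and OLAP tools.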
  • 28. The Future of Data Mining: In the short term, the results of data mining will be in profitable, if mundane, business-related areas. Micromarketing campaigns will explore new niches. Advertising will target potential customers with new precision. In the medium term, data mining may be as common and easy to use as e-mail. We may use these tools to find the best airfare to New York, root out a phone number of a long-lost classmate, or find the best prices on lawn mowers. The long-term prospects are truly exciting. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for diseases or new insights into the nature of the universe. There are potential dangers, though, as discussed below.
  • 29. Privacy Concerns: What if every telephone call you make, every credit card purchase you make, every flight you take, every visit to the doctor you make, every warranty card you send in, every employment application you fill out, every school record you have, your credit record, every web page you visit ... was all collected together? A lot would be known about you! This is an all-too-real possibility. Is that too much information about too many people for anybody to make sense of? Not with data mining tools running on massively parallel processing computers! Would you feel comfortable about someone having access to all this data about you? And remember, all this data does not have to reside in one physical location; as the net grows, information of this type becomes more available to more people.
  • 30. Proposed solutions might be: Data are intentionally modified from their original version, in order to misinform the recipients or for privacy and security. Legislation designed to protect consumers against data security failures by, among other things, requiring companies to notify consumers when their personal information has been compromised.
  • 31. Expanding universe of data: Nowadays, the world is regarded as an expanding universe of data. We have a vast amount of data, yet little information. Some people look at this phenomenon as a new paradox of the growth of data, that is, more data means less information. Therefore, there is an urgent need for the development of new techniques to find the required information in huge amounts of data.
  • 32. Expanding universe of data: The following factors make data mining a very important technique for extracting implicit, previously unknown and potentially useful knowledge from data. Data mining algorithms can find "optimal" clusterings or interesting regularities in a database. Data mining algorithms typically zoom in on interesting sub-parts of databases. Networks make it easy to connect databases. Machine learning techniques make it easier to find interesting connections in a database. The client/server revolution.
  • 33. Information as a factor of production: Increase in available data, exacerbated by the World Wide Web. Information overload. Computer assistance to filter, select and interpret data. Extend this to allow computers to discover relevant information. In the future, machine assistance will become more and more important.
  • 35. Components explained: Database, data warehouse, or other information repository: This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data. Database or data warehouse server: The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.
  • 36. Components explained: Knowledge base: This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
  • 37. Components explained: Data mining engine: This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, and evolution and deviation analysis.
  • 38. Components explained: Pattern evaluation module: This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used.
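One simple way to picture a pattern evaluation step is as a filter that keeps only the mined rules meeting interestingness thresholds. The sketch below is an assumption-laden illustration, not a description of any particular system: the `Rule` class, the threshold values and the bread-milk rule are hypothetical, while the two nappies and beer rules reuse the figures from the association example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    support: float      # share of all transactions containing both items
    confidence: float   # share of antecedent transactions with the consequent

def interesting(rules, min_support, min_confidence):
    """Keep only the rules that meet both interestingness thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]

mined = [
    Rule("nappies", "beer", support=0.02, confidence=0.50),
    Rule("beer", "nappies", support=0.02, confidence=1 / 3),
    Rule("bread", "milk", support=0.001, confidence=0.90),  # hypothetical rule
]

# Hypothetical thresholds, standing in for values held in the knowledge base.
kept = interesting(mined, min_support=0.01, min_confidence=0.40)
for rule in kept:
    print(f"{rule.antecedent} -> {rule.consequent}")
```

Support and confidence are only two possible interestingness measures; a knowledge base could equally supply lift thresholds or user beliefs about which patterns are unexpected.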
  • 39. Components explained: Graphical user interface: This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.
  • 40. Classification of DM: Classification according to the kinds of databases mined. A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly.
  • 41. Classification of DM: Classification according to the kinds of databases mined. For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time-series, text, or multimedia data mining system, or a World-Wide Web mining system. Other system types include heterogeneous data mining systems and legacy data mining systems.
  • 42. Classification of DM: Classification according to the kinds of knowledge mined. Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc.
  • 43. Classification of DM: Classification according to the kinds of techniques utilized. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.
  • 44. Decision Support Systems (DSS): A decision support system is a computer-based system that supports the decision-making process. DSS assist decision makers in semi-structured tasks; support, not replace, human judgment; are highly interactive; and improve the effectiveness of human decision makers.
  • 45. DSS characteristics: Provide support in semi-structured and unstructured situations, combining human judgment and computerized information. Support for various managerial levels. Support to individuals and groups. Support to interdependent and/or sequential decisions. Support all phases of the decision-making process. Support a variety of decision-making processes and styles.
  • 46. DSS characteristics: Are adaptive. Have user-friendly interfaces. Goal: improve effectiveness of decision making. The decision maker controls the decision-making process. End-users can build simple systems. Utilize models for analysis. Provide access to a variety of data sources, formats, and types.
  • 47. Why DSS? Increasing complexity of decisions: technology; information ("Data, data everywhere, and not the time to think!"); number and complexity of options; pace of change.
  • 48. Why DSS? Increasing availability of computerized support: inexpensive high-powered computing; better software; more efficient software development process. Increasing usability of computers: COTS (Commercial Off The Shelf) tools; customization.
  • 49. Types of Problems: Structured: repetitive; standard solution methods exist; complete automation may be feasible. Unstructured: one-time; no standard solutions; rely on judgment; automation is usually infeasible. Semi-structured: some elements and/or phases of the decision-making process are repetitive.
  • 50. Decision Support Trends: IT is increasingly pervasive. Users are increasingly computer savvy. Computer hardware is increasingly smaller and more powerful. Systems are increasingly interconnected. The Web is increasingly interwoven into all aspects of our lives. Demand for usable, flexible, powerful decision support will continue to grow. Decision support will be embedded into a wide variety of consumer and business products.
  • 51. Humans and Computers: Complementary Strengths. Human decision makers: good at seeing patterns; can work with incomplete problem representations; exercise subtle judgment we do not know how to automate; often unaware of how they perform tasks; poor at integrating large numbers of cues; unreliable and slow at tedious bookkeeping tasks and complex calculations.
  • 52. Humans and Computers: Complementary Strengths. Computers: still inferior to humans at pattern recognition and messy unstructured problems; good at integrating large numbers of features; good at tedious bookkeeping; rapid and accurate at complex calculations.
  • 53. DSS classifications: Model-Driven DSS: A model-driven DSS emphasizes access to and manipulation of financial, optimization and/or simulation models. Simple quantitative models provide the most elementary level of functionality. Data-Driven DSS: A data-driven DSS emphasizes access to and manipulation of a time series of internal company data and sometimes external and real-time data. Simple file systems accessed by query and retrieval tools provide the most elementary level of functionality.
  • 54. DSS classifications: Communication-Driven DSS: Communications-driven DSS use network and communications technologies to facilitate decision-relevant collaboration and communication. In these systems, communication technologies are the dominant architectural component. Document-Driven DSS: Document-driven DSS use computer storage and processing technologies to provide document retrieval and analysis. Large document databases may include scanned documents, hypertext documents, images, sounds and video.
  • 56. Data Management subsystem: Consists of the DSS database, database management system, data directory and query facility. It does the following: captures/extracts data for inclusion in a DSS database; updates (adds, deletes, edits, changes) data records and files; interrelates data from different sources; retrieves data from the database for queries and reports.
  • 57. Data Management subsystem: Provides comprehensive data security (protection from unauthorised access, recovery capabilities, etc.); handles personal and unofficial data so that users can experiment with alternative solutions based on their own judgement; performs complex data manipulation tasks based on queries; tracks data use within the DSS; manages data through a data dictionary.
  • 58. Model Management subsystem: The analog of the database management subsystem. It consists of a model base, model base management system, modeling language, model directory, and model execution, integration, and command processor. Strategic models: non-routine (mergers, impact analysis, capital budgeting). Tactical models: allocation and control (labor requirements, sales promotion planning). Operational models: routine, day-to-day (production scheduling, inventory control, quality control). Analytical models: SAS, SPSS, OR, data mining.
  • 59. KBS: Knowledge-based subsystem. Provides expertise in solving complex unstructured and semi-structured problems. Expertise is provided by an expert system or other intelligent system. Advanced DSS have a knowledge-based (management) component, which leads to intelligent DSS. Example: data mining.
  • 60. User interface: The user interface subsystem includes all communication between a user and the MSS. Graphical user interfaces (GUI). Voice recognition and speech synthesis are possible.
  • 61. User: Different usage patterns for the user, the manager, or the decision maker. Managers; staff specialists; intermediaries: (1) staff assistant, (2) expert tool user, (3) business (system) analyst, (4) GSS facilitator.