DATA REDUCTION STRATEGIES
DATA CUBE AGGREGATION
ATTRIBUTE SUBSET SELECTION
Why data reduction?
• Huge amounts of data are created every day.
• Big data platforms have been developed to store and process this data.
• Older algorithms perform poorly at this scale.
• Most data mining algorithms are implemented column-wise (attribute by attribute).
• Together, these factors pushed the need for data reduction procedures.
What is data reduction?
Data reduction is a process that reduces the volume of the
original data and represents it in a much smaller volume.
• It maintains the integrity of the data while reducing its volume.
• The time required for data reduction should not outweigh the time
saved by mining the reduced data set.
• Data reduction should not materially affect the results of data mining:
mining the reduced data should give the same, or almost the same, analytical results.
• Data reduction increases the efficiency of data mining.
Data reduction strategies
1. Data cube aggregation
2. Attribute subset selection
3. Dimensionality reduction
4. Numerosity reduction
5. Discretization and concept hierarchy generation
Data Cube Aggregation
This technique aggregates (combines) data into a
simpler, summarized form. The summary, rather
than the detailed data, is then used as the result
for analysis.
Data Cube Aggregation: Example
The gross profit (in dollars) from laptop sales is recorded
state by state, in a separate table for each country.
Aggregating over the states gives one table of gross profit per country.
Country: USA
States        Gross Profit ($)
Arizona       500
Texas         320
Illinois      430

Country: India
States        Gross Profit ($)
Kerala        245
Tamil Nadu    380
Goa           950

Country: Canada
States        Gross Profit ($)
Alberta       420
Manitoba      200
Ontario       300

Aggregated by country
Country       Gross Profit ($)
USA           1250
India         1575
Canada        920
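The roll-up from state-level tables to a country-level table can be sketched in a few lines of Python. This is a minimal illustration, not a data cube implementation: the `sales` list and the `roll_up` function are hypothetical names, and the figures are the state-level profits from the tables above.

```python
from collections import defaultdict

# (country, state, gross profit in $) rows taken from the state-level tables.
sales = [
    ("USA", "Arizona", 500), ("USA", "Texas", 320), ("USA", "Illinois", 430),
    ("India", "Kerala", 245), ("India", "Tamil Nadu", 380), ("India", "Goa", 950),
    ("Canada", "Alberta", 420), ("Canada", "Manitoba", 200), ("Canada", "Ontario", 300),
]

def roll_up(rows):
    """Aggregate state-level profit to one total per country."""
    totals = defaultdict(int)
    for country, _state, profit in rows:
        totals[country] += profit
    return dict(totals)

country_totals = roll_up(sales)
print(country_totals)  # {'USA': 1250, 'India': 1575, 'Canada': 920}
```

Nine state-level rows shrink to three country-level rows, and any analysis that only needs per-country profit can run on the smaller table.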
Attribute Subset Selection
From a large number of attributes, a minimal
attribute set is obtained by eliminating the
irrelevant attributes, that is, those whose removal
does not much affect the mining results. Patterns
mined from the reduced data are easier to understand.
Methods of attribute subset selection:
1. Stepwise forward selection: starts with an empty set and, at each step,
adds the most relevant of the remaining attributes, ignoring the rest.
2. Stepwise backward elimination: starts with the full set and, at each step,
removes the most irrelevant of the remaining attributes, keeping the rest.
3. Combined forward selection and backward elimination: at each step, selects
the best attribute and removes the worst from the remaining attributes.
4. Decision tree induction: builds a flowchart-like structure that chooses the best
attribute to partition the data at each node. Attributes that do not appear in the tree are treated as irrelevant.
Example
Given a data set, we need to count the male, female and
transgender individuals who are eligible to vote.
Initial attribute set = {Name, Age, Gender, Address, Phone}
Forward Selection
• Initial attribute set = {Name, Age, Gender, Address, Phone}
• Initial reduced set => { }
• => {Age}
• => {Age, Gender}
• Reduced attribute set => {Age, Gender}
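The forward selection steps above can be sketched as a greedy search. Everything here is an assumption made for illustration: the toy records from `make_records`, the scoring function `loo_accuracy` (leave-one-out accuracy of a nearest-neighbour vote), and the choice to treat Age as a plain categorical value. The slides do not say which relevance measure is used, so any other score could be substituted.

```python
from collections import Counter

ATTRIBUTES = ["Name", "Age", "Gender", "Address", "Phone"]

def make_records():
    """Hypothetical toy data: 6 minors (age 16) and 8 adults (age 30)."""
    people = [(16, g) for g in ("Male", "Female", "Transgender") for _ in range(2)]
    people += [(30, g) for g in ("Male", "Male", "Female", "Transgender") for _ in range(2)]
    records = []
    for i, (age, gender) in enumerate(people):
        row = {"Name": f"Person{i}", "Age": age, "Gender": gender,
               "Address": f"{i} Main St", "Phone": f"555-01{i:02d}"}
        # Class label: an eligible voter's gender, or "not a voter" under 18.
        label = gender if age >= 18 else "not a voter"
        records.append((row, label))
    return records

def loo_accuracy(records, attrs):
    """Leave-one-out accuracy of a nearest-neighbour vote that sees only attrs."""
    correct = 0
    for i, (row, label) in enumerate(records):
        others = [rec for j, rec in enumerate(records) if j != i]
        # Similarity = number of selected attributes on which two rows agree.
        sims = [sum(row[a] == other[a] for a in attrs) for other, _ in others]
        best = max(sims)
        votes = Counter(lbl for (_, lbl), s in zip(others, sims) if s == best)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(records)

def forward_selection(records, attributes):
    """Start from the empty set; add the best attribute while the score improves."""
    selected = []
    best_score = loo_accuracy(records, selected)
    while len(selected) < len(attributes):
        candidates = [a for a in attributes if a not in selected]
        scored = [(loo_accuracy(records, selected + [a]), a) for a in candidates]
        score, attr = max(scored, key=lambda pair: pair[0])
        if score <= best_score:  # no candidate helps: stop
            break
        selected.append(attr)
        best_score = score
    return selected

reduced = forward_selection(make_records(), ATTRIBUTES)
print(reduced)  # ['Age', 'Gender']
```

On this toy data, Name, Address and Phone are unique per person, so they never help predict the label and are never added.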
Backward Elimination
• Initial attribute set => {Name, Age, Gender, Address, Phone}
• Initial reduced set => {Name, Age, Gender, Address, Phone}
• => {Age, Gender, Address, Phone}
• => {Age, Gender, Phone}
• => {Age, Gender}
• Reduced attribute set => {Age, Gender}
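Backward elimination can be sketched the same way, starting from the full set and dropping one attribute at a time. As before, this is only a sketch under stated assumptions: the toy records, the `loo_accuracy` score (leave-one-out accuracy of a nearest-neighbour vote) and the exact-match treatment of Age are hypothetical choices, not something the slides specify.

```python
from collections import Counter

ATTRIBUTES = ["Name", "Age", "Gender", "Address", "Phone"]

def make_records():
    """Hypothetical toy data: 6 minors (age 16) and 8 adults (age 30)."""
    people = [(16, g) for g in ("Male", "Female", "Transgender") for _ in range(2)]
    people += [(30, g) for g in ("Male", "Male", "Female", "Transgender") for _ in range(2)]
    records = []
    for i, (age, gender) in enumerate(people):
        row = {"Name": f"Person{i}", "Age": age, "Gender": gender,
               "Address": f"{i} Main St", "Phone": f"555-01{i:02d}"}
        label = gender if age >= 18 else "not a voter"
        records.append((row, label))
    return records

def loo_accuracy(records, attrs):
    """Leave-one-out accuracy of a nearest-neighbour vote that sees only attrs."""
    correct = 0
    for i, (row, label) in enumerate(records):
        others = [rec for j, rec in enumerate(records) if j != i]
        sims = [sum(row[a] == other[a] for a in attrs) for other, _ in others]
        best = max(sims)
        votes = Counter(lbl for (_, lbl), s in zip(others, sims) if s == best)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(records)

def backward_elimination(records, attributes):
    """Start from the full set; drop an attribute while the score does not fall."""
    remaining = list(attributes)
    removed = []
    best_score = loo_accuracy(records, remaining)
    while len(remaining) > 1:
        scored = [(loo_accuracy(records, [b for b in remaining if b != a]), a)
                  for a in remaining]
        score, attr = max(scored, key=lambda pair: pair[0])
        if score < best_score:  # every removal hurts: stop
            break
        remaining.remove(attr)
        removed.append(attr)
        best_score = score
    return remaining, removed

remaining, removed = backward_elimination(make_records(), ATTRIBUTES)
print(remaining)  # ['Age', 'Gender']
print(removed)    # ['Name', 'Address', 'Phone']
```

Removing Name, Address or Phone leaves the score unchanged, while removing Age or Gender lowers it, so the search stops at {Age, Gender}.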
Decision Tree Induction
Initial attribute set = {Name, Age, Gender, Address, Phone}
Age
  <18  -> Not a voter
  >=18 -> Gender
            Male | Female | Transgender
Reduced attribute set = {Age, Gender}
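A small tree induction over the same kind of data shows how the reduced set falls out of the tree. The sketch below rests on several assumptions: the toy rows are hypothetical, Age is pre-binned at the threshold of 18 shown in the tree, and the split measure is gain ratio (as in C4.5) rather than plain information gain, because plain gain would favour identifier-like attributes such as Name. The slides do not name a split measure.

```python
import math
from collections import Counter

ATTRIBUTES = ["Name", "Age", "Gender", "Address", "Phone"]

def make_rows():
    """Hypothetical toy data with Age pre-binned at 18."""
    people = [(16, g) for g in ("Male", "Female", "Transgender") for _ in range(2)]
    people += [(30, g) for g in ("Male", "Male", "Female", "Transgender") for _ in range(2)]
    rows, labels = [], []
    for i, (age, gender) in enumerate(people):
        rows.append({"Name": f"Person{i}", "Age": ">=18" if age >= 18 else "<18",
                     "Gender": gender, "Address": f"{i} Main St",
                     "Phone": f"555-01{i:02d}"})
        labels.append(gender if age >= 18 else "not a voter")
    return rows, labels

def entropy(values):
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, divided by the split's own entropy."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0
    return (entropy(labels) - remainder) / split_info

def build_tree(rows, labels, attrs):
    """Return a leaf label, or a dict node that splits on the best attribute."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    rest = [a for a in attrs if a != best]
    node = {"attribute": best, "branches": {}}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree([rows[i] for i in idx],
                                             [labels[i] for i in idx], rest)
    return node

def attributes_used(node):
    """Collect every attribute tested somewhere in the tree."""
    if not isinstance(node, dict):
        return set()
    used = {node["attribute"]}
    for child in node["branches"].values():
        used |= attributes_used(child)
    return used

rows, labels = make_rows()
tree = build_tree(rows, labels, ATTRIBUTES)
print(sorted(attributes_used(tree)))  # ['Age', 'Gender']
```

The induced tree tests Age at the root and Gender under the `>=18` branch; Name, Address and Phone never appear in it, which is why they are dropped from the reduced set.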
Thank You
Ananthakrishnan P.G.
Anjali Soorej
Ann Mary Sajan
