PRESENTED BY
R.RAMADEVI
I M.Sc. (CS & IT)
NADAR SARASWATHI COLLEGE OF ARTS & SCIENCE
THENI.
DATA GENERALIZATION AND SUMMARIZATION-BASED
CHARACTERIZATION
DATA GENERALIZATION AND SUMMARIZATION-BASED
CHARACTERIZATION
• Data and objects in databases often contain detailed information at primitive
concept levels.
FOR EXAMPLE:
The item relation in a sales database may contain attributes describing low-level
item information such as item_id, name, brand, category, supplier, place_made,
and price.
• Summarizing such data at higher concept levels requires an important
functionality in data mining: data generalization.
DATA GENERALIZATION
• Data generalization is a process that abstracts a large set of task-relevant
data in a database from a relatively low conceptual level to higher
conceptual levels.
• Approaches to the generalization of large data sets fall into two
categories:
(1) The data cube (or OLAP) approach
(2) The attribute-oriented induction approach
ATTRIBUTE-ORIENTED INDUCTION
• The attribute-oriented induction (AOI) approach to data generalization and
summarization-based characterization was first proposed in 1989.
• The data cube approach can be considered a data warehouse-based,
precomputation-oriented, materialized-view approach.
• It performs off-line aggregation before an OLAP or data mining query is
submitted for processing.
• The attribute-oriented induction approach, by contrast, is a relational database
query-oriented, generalization-based, on-line data analysis technique.
ATTRIBUTE-ORIENTED INDUCTION
• The boundary between the two approaches is not strict: some aggregations in
the data cube can be computed on-line, while off-line precomputation of the
multidimensional space can speed up attribute-oriented induction as well.
• The general idea of attribute-oriented induction is to first collect the
task-relevant data using a relational database query and then perform
generalization based on an examination of the number of distinct values of each
attribute in the relevant set of data.
EXAMPLE
Specifying a data mining query for
characterization with DMQL:
• Suppose that a user would like to describe the general characteristics of graduate
students in the Big_university database.
• The attributes of interest are name, gender, major, birth_place,
birth_date, phone_no, and gpa.
use Big_university_DB
mine characteristics as "science_students"
in relevance to name, gender, major, birth_place, birth_date, phone_no, gpa
from student
where status in "graduate"
TRANSFORMING A DATA MINING QUERY TO A
RELATIONAL QUERY
• The data mining query is transformed into a relational query, which is executed
against the relational database Big_university_DB and returns the task-relevant data.
• The resulting table is the initial working relation, on which induction will be
performed.
use Big_university_DB
select name, gender, major, birth_place, birth_date, phone_no, gpa
from student
where status in {"M.Sc.", "M.A.", "M.B.A.", "Ph.D."}
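As a minimal sketch of this transformation, the relational query can be run against an in-memory SQLite database. The table contents, the column spelling (for example phone_no), and the helper names build_demo_db and fetch_initial_working_relation are assumptions made for illustration; they do not come from the slides.

```python
import sqlite3

# Concrete values that the high-level concept "graduate" maps to.
GRADUATE_STATUSES = ("M.Sc.", "M.A.", "M.B.A.", "Ph.D.")


def build_demo_db():
    """Create a tiny in-memory student table (invented rows, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (name TEXT, gender TEXT, major TEXT, "
        "birth_place TEXT, birth_date TEXT, phone_no TEXT, gpa REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("Asha", "F", "CS", "Chennai, India", "1998-04-12", "555-0101", 3.8, "M.Sc."),
            ("Ravi", "M", "Physics", "Madurai, India", "1997-11-02", "555-0102", 3.4, "Ph.D."),
            ("Meena", "F", "History", "Theni, India", "2002-01-20", "555-0103", 3.6, "B.A."),
        ],
    )
    return conn


def fetch_initial_working_relation(conn):
    """Run the transformed relational query and return the task-relevant rows."""
    placeholders = ", ".join("?" for _ in GRADUATE_STATUSES)
    sql = (
        "SELECT name, gender, major, birth_place, birth_date, phone_no, gpa "
        "FROM student WHERE status IN (" + placeholders + ")"
    )
    return conn.execute(sql, GRADUATE_STATUSES).fetchall()


if __name__ == "__main__":
    for row in fetch_initial_working_relation(build_demo_db()):
        print(row)
```

Only the two graduate students are returned; the undergraduate row is filtered out before any generalization takes place.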
DATA GENERALIZATION: TWO TYPES
ATTRIBUTE REMOVAL:
• If there is a large set of distinct values for an attribute of the initial working
relation, the attribute should be removed when either
(1) there is no generalization operator on the attribute, or
(2) its higher-level concepts are expressed in terms of other attributes.
ATTRIBUTE GENERALIZATION
• If there is a large set of distinct values for an attribute in the initial working
relation and there exists a set of generalization operators on the attribute, then
a generalization operator should be selected and applied to the attribute.
• This corresponds to the generalization rule known as climbing generalization
trees in learning from examples, or concept tree ascension.
Two techniques control how far generalization proceeds:
First technique: attribute generalization threshold control
Second technique: generalized relation threshold control
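The two rules can be sketched as a small decision function. This is an illustrative reading, not a definitive implementation: "a large set of distinct values" is interpreted here as "more distinct values than a threshold", and the function name and parameters are hypothetical.

```python
def decide_attribute_action(values, has_generalization_operator, threshold,
                            expressed_by_other_attributes=False):
    """Return 'retain', 'remove', or 'generalize' for one attribute.

    values: the attribute's values in the initial working relation.
    threshold: assumed attribute generalization threshold (max distinct values).
    """
    distinct_count = len(set(values))
    if distinct_count <= threshold:
        # Few distinct values: nothing to do.
        return "retain"
    if not has_generalization_operator or expressed_by_other_attributes:
        # Attribute removal rule.
        return "remove"
    # Attribute generalization rule: climb the concept hierarchy.
    return "generalize"


if __name__ == "__main__":
    names = ["Asha", "Ravi", "Meena", "Kumar"]
    genders = ["F", "M", "F", "M"]
    majors = ["CS", "Physics", "History", "Finance"]
    print(decide_attribute_action(names, False, threshold=3))
    print(decide_attribute_action(genders, False, threshold=3))
    print(decide_attribute_action(majors, True, threshold=3))
```

With a threshold of 3, name is removed (many values, no operator), gender is retained, and major is generalized.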
ATTRIBUTE-ORIENTED INDUCTION
For each attribute of the initial working relation, generalization proceeds as follows:
1. name: There is a large number of distinct values for name and no generalization
operation is defined on it, so the attribute is removed.
2. gender: There are only two distinct values, so the attribute is retained.
3. major: Suppose that a concept hierarchy has been defined that allows the attribute
major to be generalized to the values {arts_science, business}.
4. birth_place: The attribute has a large number of distinct values. Suppose that a
concept hierarchy is defined as city < province_or_state < country; birth_place can
then be generalized to a higher level.
ATTRIBUTE-ORIENTED INDUCTION
5. birth_date: Suppose that a hierarchy exists that can generalize birth_date to age
and age to age_range.
6. residence: The number of distinct values for number and street will likely be very
high, so these low-level parts of residence should be removed.
7. phone_no: The attribute contains too many distinct values and should therefore be
removed in generalization.
8. gpa: Suppose that a concept hierarchy exists for gpa that groups grade point
average values into numerical intervals like {3.75-4.0, 3.5-3.75, ...}.
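Three of these hierarchy climbs can be sketched in Python. The sketch rests on several assumptions that are not in the slides: birth_place is stored as comma-separated text ending in the country, age ranges are 5 years wide, and gpa intervals are 0.25 wide with the top interval closed at 4.0. All function names are hypothetical.

```python
from datetime import date


def birth_place_to_country(birth_place):
    """Climb city < province_or_state < country, assuming 'city, ..., country' text."""
    return birth_place.split(",")[-1].strip()


def birth_date_to_age_range(birth_date, today, width=5):
    """Generalize birth_date to age, then age to an age_range label."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width}"


def gpa_to_interval(gpa, width=0.25):
    """Group a gpa value (0.0 to 4.0) into a numerical interval label."""
    last_bucket = int(4.0 / width) - 1          # keeps 4.0 inside the top interval
    bucket = min(int(gpa / width), last_bucket)
    low = bucket * width
    return f"{low:.2f}-{low + width:.2f}"


if __name__ == "__main__":
    print(birth_place_to_country("Theni, Tamil Nadu, India"))
    print(birth_date_to_age_range(date(2001, 7, 1), date(2024, 1, 1)))
    print(gpa_to_interval(3.8))
```

Each function maps many low-level values to one higher-level value, which is what lets identical generalized tuples be merged later.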
EFFICIENT IMPLEMENTATION OF ATTRIBUTE-ORIENTED
INDUCTION
Algorithm: Attribute_oriented_induction. Mining generalized
characteristics in a relational database, given a user's data mining request.
INPUT: (i) DB, a relational database
(ii) DMQuery, a data mining query
(iii) a_list, a list of attributes
(iv) Gen(a), a set of concept hierarchies or generalization operators on
attribute a
(v) a_gen_thresh(a), the attribute generalization threshold for attribute a
OUTPUT & METHOD
Output: P, a prime_generalization_relation
Method: The method is outlined as follows:
1. W ← get_task_relevant_data(DMQuery, DB); the working relation W holds
the task-relevant data.
2. prepare_for_generalization(W)
(a) Scan W and collect the distinct values for each attribute ai.
(b) For each attribute ai, determine whether ai should be removed; if not, compute
its minimum desired level Li.
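P ← GENERALIZATION(W)
• The prime_generalization_relation P is derived by replacing each value v in W
with its corresponding generalized value in the concept hierarchy, while
accumulating count and computing any other aggregate values.
This step can be implemented in two ways:
(a) For each generalized tuple, insert the tuple into a sorted prime relation P by
a binary search; if the tuple is already in P, increase its count instead.
(b) Since in most cases the number of distinct values at the prime relation level is
small, the prime relation can be coded as an m-dimensional array, where m is the
number of attributes in P.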
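The whole procedure can be sketched compactly in Python. This is a simplified illustration under stated assumptions, not the algorithm as published: each attribute is generalized by a single caller-supplied function instead of climbing to a computed minimum desired level, one threshold is shared by all attributes, and a hash-based Counter stands in for the sorted relation or m-dimensional array. The function and parameter names are hypothetical.

```python
from collections import Counter


def attribute_oriented_induction(working_relation, generalizers, threshold):
    """Return (kept_attributes, prime_relation) for a list of row dicts.

    generalizers maps an attribute name to a function that lifts a value
    one step up its concept hierarchy (or is missing if none exists).
    """
    if not working_relation:
        return [], {}

    # Step 2: decide per attribute whether to retain, generalize, or remove.
    plan = []
    for attr in working_relation[0]:
        distinct = {row[attr] for row in working_relation}
        if len(distinct) <= threshold:
            plan.append((attr, None))                  # retain unchanged
        elif generalizers.get(attr) is not None:
            plan.append((attr, generalizers[attr]))    # generalize
        # otherwise the attribute is removed

    # Step 3: replace values by generalized values and merge identical tuples.
    prime = Counter()
    for row in working_relation:
        key = tuple(fn(row[attr]) if fn else row[attr] for attr, fn in plan)
        prime[key] += 1                                # accumulate count
    return [attr for attr, _ in plan], dict(prime)


if __name__ == "__main__":
    rows = [
        {"name": "A", "gender": "F", "major": "CS", "gpa": 3.8},
        {"name": "B", "gender": "F", "major": "Physics", "gpa": 3.9},
        {"name": "C", "gender": "M", "major": "Finance", "gpa": 3.6},
        {"name": "D", "gender": "F", "major": "Math", "gpa": 3.8},
    ]
    lift = {
        "major": lambda m: "business" if m == "Finance" else "science",
        "gpa": lambda g: "3.75-4.0" if g >= 3.75 else "below 3.75",
    }
    print(attribute_oriented_induction(rows, lift, threshold=2))
```

In this toy run name is dropped, gender is kept, and the four input rows collapse into two generalized tuples with counts 3 and 1.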
PRESENTATION OF THE DERIVED
GENERALIZATION
• Attribute-oriented induction generates one or a set of
generalized descriptions, which can be presented as a generalized relation:
Location        Item       Sales   Count
Asia            TV            15     300
Europe          TV            12     250
North America   TV            28     450
Asia            computer     120    1000
A CROSSTAB FOR THE SALES IN 1999
Location \ Item     TV              Computer        Both_items
                    sales  count    sales  count    sales  count
Asia                   15    300      120   1000      135   1300
Europe                 12    250      150   1200      162   1450
All regions            55   1000      470   4000      525   5000
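A crosstab of this kind can be built by summing a generalized relation along both dimensions. The sketch below uses the four (location, item, sales, count) rows listed in the generalized relation earlier in the slides; the function name build_crosstab and the labels all_regions and both_items are assumptions for illustration.

```python
from collections import defaultdict


def build_crosstab(rows):
    """Pivot (location, item, sales, count) rows into a crosstab with totals."""
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for location, item, sales, count in rows:
        # Each row contributes to its own cell and to the row/column totals.
        for loc in (location, "all_regions"):
            for it in (item, "both_items"):
                table[loc][it][0] += sales
                table[loc][it][1] += count
    return {
        loc: {it: tuple(cell) for it, cell in items.items()}
        for loc, items in table.items()
    }


if __name__ == "__main__":
    generalized_relation = [
        ("Asia", "TV", 15, 300),
        ("Europe", "TV", 12, 250),
        ("North America", "TV", 28, 450),
        ("Asia", "computer", 120, 1000),
    ]
    crosstab = build_crosstab(generalized_relation)
    print(crosstab["Asia"]["both_items"])
    print(crosstab["all_regions"]["TV"])
```

Because only four rows are supplied, the computer totals here cover Asia alone; with the full relation the same function would fill every cell.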
• The t-weight is an interestingness measure that describes
the typicality of each disjunct in the rule.
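T-WEIGHT
• The t-weight for qa is the percentage of tuples of the
target class from the initial working relation that are
covered by qa:
$t\_weight = \mathrm{count}(q_a) \,/\, \sum_{i=1}^{n} \mathrm{count}(q_i)$
where q1, ..., qn are the generalized tuples (disjuncts) of the target class.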
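As a small worked example of this measure, the t-weights of the three TV tuples can be computed from the count column of the generalized relation shown earlier (300, 250, and 450). The function name t_weights is hypothetical.

```python
def t_weights(counts):
    """Map each disjunct to count(q_a) divided by the sum of all counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("t-weight is undefined when all counts are zero")
    return {disjunct: count / total for disjunct, count in counts.items()}


if __name__ == "__main__":
    tv_counts = {"Asia": 300, "Europe": 250, "North America": 450}
    for region, weight in t_weights(tv_counts).items():
        print(f"{region}: {weight:.0%}")
```

The weights sum to 1, so each value reads directly as the share of the target class covered by that disjunct: 30% for Asia, 25% for Europe, and 45% for North America.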
BAR CHART REPRESENTATION
[Bar chart of sales by item: TV, computer, TV + computer]
PIE CHART REPRESENTATION
[Pie chart, TV sales: Asia (27.7%), Europe (21.82%), North America (50%)]
[Pie chart, computer sales: Asia (42%), Europe (25%), North America (31%)]
THANK YOU!!!
