Rough Sets in KDD
Tutorial Notes
Andrzej Skowron
Warsaw University
Ning Zhong
Maebashi Institute of Technology
Copyright 2000 by A. Skowron & N. Zhong
About the Speakers
 Andrzej Skowron received his Ph.D. from Warsaw University.
He is a professor in the Faculty of Mathematics, Computer Science and
Mechanics, Warsaw University, Poland. His research interests
include soft computing methods and applications, in particular
reasoning with incomplete information, approximate reasoning,
rough sets, rough mereology, granular computing, synthesis and
analysis of complex objects, intelligent agents, and knowledge
discovery and data mining, with over 200 journal and conference
publications. He is an editor of several international journals and
book series, including Fundamenta Informaticae (editor-in-chief) and
Data Mining and Knowledge Discovery. He is president of the
International Rough Set Society. He has been an invited speaker at many
international conferences, and has served or is currently serving on
the program committees of over 40 international conferences and
workshops, including ISMIS’97-99 (program chair), RSCTC’98-00
(program chair), and RSFDGrC’99 (program chair).
About the Speakers (2)
 Ning Zhong received his Ph.D. from the University of Tokyo.
He is director of the Knowledge Information Systems Laboratory and
an associate professor in the Department of Information Engineering,
Maebashi Institute of Technology, Japan. His research interests
include knowledge discovery and data mining, rough sets and
granular-soft computing, intelligent agents and databases,
knowledge-based systems, and hybrid systems, with over 80 journal
and conference publications. He is an editor of Knowledge and
Information Systems: An International Journal (Springer). He is a
member of the advisory board of the International Rough Set Society,
the ACM SIGKDD International Liaisons Board, and the Steering
Committee of the PAKDD conferences, and the advisory board
coordinator of BISC/SIGGrC. He has served or is currently serving
on the program committees of over 25 international conferences and
workshops, including PAKDD’99 (program chair), IAT’99 (program
chair), and RSFDGrC’99 (program chair).
Contents
 Introduction
 Basic Concepts of Rough Sets
 A Rough Set Based KDD process
 Rough Sets in ILP and GrC
 Concluding Remarks
(Summary, Advanced Topics, References
and Further Readings).
Introduction
 Rough set theory was developed by Zdzislaw
Pawlak in the early 1980’s.
 Representative Publications:
– Z. Pawlak, “Rough Sets”, International Journal
of Computer and Information Sciences, Vol.11,
341-356 (1982).
– Z. Pawlak, Rough Sets - Theoretical Aspects of
Reasoning about Data, Kluwer Academic
Publishers (1991).
Introduction (2)
 The main goal of rough set analysis is the
induction of approximations of concepts.
 Rough set theory constitutes a sound basis for
KDD. It offers mathematical tools to
discover patterns hidden in data.
 It can be used for feature selection, feature
extraction, data reduction, decision rule
generation, and pattern extraction
(templates, association rules) etc.
Introduction (3)
 Recent extensions of rough set theory have
developed new methods for decomposition of
large data sets, data mining in distributed and
multi-agent systems, and granular computing.
This presentation shows how several aspects of
the above problems are solved by the (classic)
rough set approach, discusses some advanced
topics, and gives further research directions.
Basic Concepts of Rough Sets
 Information/Decision Systems (Tables)
 Indiscernibility
 Set Approximation
 Reducts and Core
 Rough Membership
 Dependency of Attributes
Information Systems/Tables
 An IS is a pair (U, A), where
U is a non-empty finite set of objects, and
A is a non-empty finite set of attributes such that
a : U → V_a for every a ∈ A.
 V_a is called the value set of a.

     Age    LEMS
x1   16-30  50
x2   16-30  0
x3   31-45  1-25
x4   31-45  1-25
x5   46-60  26-49
x6   16-30  26-49
x7   46-60  26-49
Decision Systems/Tables
 DS: T = (U, A ∪ {d}), where d ∉ A
is the decision attribute.
 The elements of A are called the condition
attributes.

     Age    LEMS   Walk
x1   16-30  50     yes
x2   16-30  0      no
x3   31-45  1-25   no
x4   31-45  1-25   yes
x5   46-60  26-49  no
x6   16-30  26-49  yes
x7   46-60  26-49  no
Issues in the Decision Table
 The same or indiscernible objects may be
represented several times.
 Some of the attributes may be superfluous.
Indiscernibility
 The equivalence relation: a binary relation
R ⊆ X × X which is reflexive
(i.e. an object is in relation with itself: xRx),
symmetric (if xRy then yRx), and
transitive (if xRy and yRz then xRz).
 The equivalence class of an element x ∈ X
consists of all objects y ∈ X such that xRy.
Indiscernibility (2)
 Let IS = (U, A) be an information system; then
with any B ⊆ A there is associated an equivalence
relation:

IND_IS(B) = {(x, x') ∈ U² | a(x) = a(x') for every a ∈ B}

where IND_IS(B) is called the B-indiscernibility
relation.
 If (x, x') ∈ IND_IS(B), then objects x and x' are
indiscernible from each other by attributes from B.
 The equivalence classes of the B-indiscernibility
relation are denoted [x]_B.
An Example of Indiscernibility
 The non-empty subsets of
the condition attributes
are {Age}, {LEMS}, and
{Age, LEMS}.
 IND({Age}) = {{x1,x2,x6},
{x3,x4}, {x5,x7}}
 IND({LEMS}) = {{x1},
{x2}, {x3,x4}, {x5,x6,x7}}
 IND({Age,LEMS}) =
{{x1}, {x2}, {x3,x4},
{x5,x7}, {x6}}.
Age LEMS Walk
x1 16-30 50 yes
x2 16-30 0 no
x3 31-45 1-25 no
x4 31-45 1-25 yes
x5 46-60 26-49 no
x6 16-30 26-49 yes
x7 46-60 26-49 no
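The partitions above can be checked with a short script. This is a minimal sketch: the dictionary transcribes the Age/LEMS table from the slide, and `ind` is an illustrative helper name, not part of any rough-set library.

```python
# Compute the B-indiscernibility classes for the example table.
from collections import defaultdict

table = {
    "x1": {"Age": "16-30", "LEMS": "50"},
    "x2": {"Age": "16-30", "LEMS": "0"},
    "x3": {"Age": "31-45", "LEMS": "1-25"},
    "x4": {"Age": "31-45", "LEMS": "1-25"},
    "x5": {"Age": "46-60", "LEMS": "26-49"},
    "x6": {"Age": "16-30", "LEMS": "26-49"},
    "x7": {"Age": "46-60", "LEMS": "26-49"},
}

def ind(B):
    """Partition the universe into B-indiscernibility classes."""
    classes = defaultdict(set)
    for x, row in table.items():
        # Objects with the same value vector on B fall into one class.
        classes[tuple(row[a] for a in B)].add(x)
    return sorted(classes.values(), key=lambda c: sorted(c))

print([sorted(c) for c in ind(["Age"])])
# [['x1', 'x2', 'x6'], ['x3', 'x4'], ['x5', 'x7']]
```

Grouping by the value vector on B is exactly the definition of IND(B): two objects land in the same class iff they agree on every attribute of B.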
Observations
 An equivalence relation induces a partitioning
of the universe.
 The partitions can be used to build new subsets
of the universe.
 Subsets that are most often of interest have the
same value of the decision attribute.
It may happen, however, that a concept such as
“Walk” cannot be defined in a crisp manner.
Set Approximation
 Let T = (U, A), B ⊆ A and X ⊆ U.
We can approximate X using only the
information contained in B by constructing
the B-lower and B-upper approximations of
X, denoted \underline{B}X and \overline{B}X respectively, where

\underline{B}X = {x | [x]_B ⊆ X},
\overline{B}X = {x | [x]_B ∩ X ≠ ∅}.
Set Approximation (2)
 The B-boundary region of X,
BN_B(X) = \overline{B}X − \underline{B}X,
consists of those objects that we cannot
decisively classify into X on the basis of B.
 The B-outside region of X,
U − \overline{B}X,
consists of those objects that can with
certainty be classified as not belonging to X.
 A set is said to be rough if the boundary
region is non-empty.
An Example of Set Approximation
 Let W = {x | Walk(x) = yes}.

\underline{A}W = {x1, x6},
\overline{A}W = {x1, x3, x4, x6},
BN_A(W) = {x3, x4},
U − \overline{A}W = {x2, x5, x7}.

 The decision class Walk is rough since the
boundary region is not empty.

     Age    LEMS   Walk
x1   16-30  50     yes
x2   16-30  0      no
x3   31-45  1-25   no
x4   31-45  1-25   yes
x5   46-60  26-49  no
x6   16-30  26-49  yes
x7   46-60  26-49  no
An Example of
Set Approximation (2)

(Figure: the set X drawn over the partition U/R induced by a subset
R of attributes. The lower approximation \underline{R}X is labelled
"yes" (classes {x1}, {x6}), the boundary region "yes/no" (class
{x3, x4}), and the outside region "no" (classes {x2}, {x5, x7}).)
Lower & Upper Approximations
Lower & Upper Approximations
(2)
Lower approximation: \underline{R}X = ∪ {Y ∈ U/R : Y ⊆ X}
Upper approximation: \overline{R}X = ∪ {Y ∈ U/R : Y ∩ X ≠ ∅}
Lower & Upper Approximations
(3)
X1 = Flu(yes) = {u2, u3, u6, u7}
Lower approx.: \underline{R}X1 = {u2, u3}
Upper approx.: \overline{R}X1 = {u2, u3, u5, u6, u7, u8}

X2 = Flu(no) = {u1, u4, u5, u8}
Lower approx.: \underline{R}X2 = {u1, u4}
Upper approx.: \overline{R}X2 = {u1, u4, u5, u6, u7, u8}

U    Headache  Temp.      Flu
U1   Yes       Normal     No
U2   Yes       High       Yes
U3   Yes       Very-high  Yes
U4   No        Normal     No
U5   No        High       No
U6   No        Very-high  Yes
U7   No        High       Yes
U8   No        Very-high  No

The elementary sets of the indiscernibility relation
defined by R = {Headache, Temp.} are {u1},
{u2}, {u3}, {u4}, {u5, u7}, {u6, u8}.
Lower & Upper Approximations
(4)
R = {Headache, Temp.}
U/R = { {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}}
X1 = Flu(yes) = {u2,u3,u6,u7}
X2 = Flu(no) = {u1,u4,u5,u8}
\underline{R}X1 = {u2, u3}
\overline{R}X1 = {u2, u3, u5, u6, u7, u8}
\underline{R}X2 = {u1, u4}
\overline{R}X2 = {u1, u4, u5, u6, u7, u8}

(Figure: X1 and X2 drawn over the elementary sets {u1}, {u2}, {u3},
{u4}, {u5, u7}, {u6, u8}; the classes {u5, u7} and {u6, u8} straddle
the boundary between X1 and X2.)
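The approximations above follow mechanically from the partition. A minimal sketch, assuming the elementary sets of R = {Headache, Temp.} transcribed from the slide and X1 = Flu(yes):

```python
# Lower/upper approximation of X1 over the R-elementary sets.
partition = [{"u1"}, {"u2"}, {"u3"}, {"u4"}, {"u5", "u7"}, {"u6", "u8"}]
X1 = {"u2", "u3", "u6", "u7"}  # Flu = yes

# Lower: union of classes wholly inside X1; upper: union of classes
# that intersect X1.
lower = {x for c in partition if c <= X1 for x in c}
upper = {x for c in partition if c & X1 for x in c}
boundary = upper - lower

print(sorted(lower), sorted(upper))
# ['u2', 'u3'] ['u2', 'u3', 'u5', 'u6', 'u7', 'u8']
```

The boundary {u5, u6, u7, u8} is non-empty, which is exactly why both decision classes are rough here.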
Properties of Approximation
\underline{B}(X) ⊆ X ⊆ \overline{B}(X)
\underline{B}(∅) = \overline{B}(∅) = ∅,  \underline{B}(U) = \overline{B}(U) = U
\overline{B}(X ∪ Y) = \overline{B}(X) ∪ \overline{B}(Y)
\underline{B}(X ∩ Y) = \underline{B}(X) ∩ \underline{B}(Y)
X ⊆ Y implies \underline{B}(X) ⊆ \underline{B}(Y) and \overline{B}(X) ⊆ \overline{B}(Y)
Properties of Approximation (2)
\underline{B}(X ∪ Y) ⊇ \underline{B}(X) ∪ \underline{B}(Y)
\overline{B}(X ∩ Y) ⊆ \overline{B}(X) ∩ \overline{B}(Y)
\underline{B}(−X) = −\overline{B}(X)
\overline{B}(−X) = −\underline{B}(X)
\underline{B}(\underline{B}(X)) = \overline{B}(\underline{B}(X)) = \underline{B}(X)
\overline{B}(\overline{B}(X)) = \underline{B}(\overline{B}(X)) = \overline{B}(X)

where −X denotes U − X.
Four Basic Classes of Rough Sets
 X is roughly B-definable, iff \underline{B}(X) ≠ ∅ and \overline{B}(X) ≠ U.
 X is internally B-undefinable, iff \underline{B}(X) = ∅ and \overline{B}(X) ≠ U.
 X is externally B-undefinable, iff \underline{B}(X) ≠ ∅ and \overline{B}(X) = U.
 X is totally B-undefinable, iff \underline{B}(X) = ∅ and \overline{B}(X) = U.
Accuracy of Approximation
α_B(X) = |\underline{B}(X)| / |\overline{B}(X)|

where |X| denotes the cardinality of X ≠ ∅.
Obviously 0 ≤ α_B(X) ≤ 1.
If α_B(X) = 1, X is crisp with respect to B.
If α_B(X) < 1, X is rough with respect to B.
Issues in the Decision Table
 The same or indiscernible objects may be
represented several times.
 Some of the attributes may be superfluous
(redundant).
That is, their removal cannot worsen the
classification.
Reducts
 Keep only those attributes that preserve the
indiscernibility relation and, consequently,
set approximation.
 There are usually several such subsets of
attributes and those which are minimal are
called reducts.
Dispensable & Indispensable
Attributes
Let c ∈ C.
Attribute c is dispensable in T
if POS_C(D) = POS_{C−{c}}(D); otherwise
attribute c is indispensable in T.

The positive region:
POS_C(D) = ∪_{X ∈ U/D} \underline{C}X
Independent
 T = (U, A, C, D) is independent
if all c ∈ C are indispensable in T.
Reduct & Core
 The set of attributes R ⊆ C is called a reduct
of C if T' = (U, A, R, D) is independent and
POS_R(D) = POS_C(D).
 The set of all the condition attributes
indispensable in T is denoted by CORE(C):

CORE(C) = ∩ RED(C)

where RED(C) is the set of all reducts of C.
An Example of Reducts & Core
U Headache Muscle
pain
Temp. Flu
U1 Yes Yes Normal No
U2 Yes Yes High Yes
U3 Yes Yes Very-high Yes
U4 No Yes Normal No
U5 No No High No
U6 No Yes Very-high Yes
U Muscle
pain
Temp. Flu
U1,U4 Yes Normal No
U2 Yes High Yes
U3,U6 Yes Very-high Yes
U5 No High No
U Headache Temp. Flu
U1 Yes Normal No
U2 Yes High Yes
U3 Yes Very-high Yes
U4 No Normal No
U5 No High No
Reduct1 = {Muscle-pain,Temp.}
Reduct2 = {Headache, Temp.}
CORE = {Headache, Temp} ∩ {Muscle-pain, Temp} = {Temp}
Discernibility Matrix
 Let T = (U, A, C, D) be a decision table, with
U = {u_1, u_2, ..., u_n}.
By a discernibility matrix of T, denoted M(T),
we will mean the n × n matrix defined as:

m_ij = {c ∈ C : c(u_i) ≠ c(u_j)}   if d(u_i) ≠ d(u_j),
m_ij = λ                           if d(u_i) = d(u_j),

for i, j = 1, 2, ..., n.
Here λ denotes that this case does not need to be
considered; entries are filled only for objects u_i and u_j
that the decision attribute classifies into different classes.
Discernibility Function
 For any u_i ∈ U,

f_T(u_i) = ∧ {∨ m_ij : j ∈ {1, 2, ..., n}, j ≠ i}

where (1) ∨ m_ij is the disjunction of all variables a
such that a ∈ m_ij, if m_ij ≠ λ;
(2) ∨ m_ij = false, if m_ij = ∅;
(3) ∨ m_ij = true, if m_ij = λ.
Each logical product in the minimal disjunctive normal
form defines a reduct of instance u_i.
Examples of Discernibility Matrix
No   a    b    c    d
u1   a0   b1   c1   y
u2   a1   b1   c0   n
u3   a0   b2   c1   n
u4   a1   b1   c1   y

C = {a, b, c}
D = {d}

In order to discern the equivalence classes of the
decision attribute d, the conditions described by the
discernibility matrix for this table must be preserved:

     u1    u2    u3
u2   a,c
u3   b     λ
u4   λ     c     a,b

f = (a ∨ c) ∧ b ∧ c ∧ (a ∨ b) = b ∧ c

Reduct = {b, c}
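The matrix-and-reduct computation above can be sketched in a few lines. The data transcribes the four-row table; the brute-force search over attribute subsets is an illustrative sketch, not the Boolean-simplification algorithm itself.

```python
from itertools import combinations

rows = {
    "u1": ({"a": "a0", "b": "b1", "c": "c1"}, "y"),
    "u2": ({"a": "a1", "b": "b1", "c": "c0"}, "n"),
    "u3": ({"a": "a0", "b": "b2", "c": "c1"}, "n"),
    "u4": ({"a": "a1", "b": "b1", "c": "c1"}, "y"),
}
C = ["a", "b", "c"]

# Discernibility matrix entries: attribute sets separating pairs of
# objects with different decisions (the lambda cells are skipped).
entries = []
for i, j in combinations(rows, 2):
    (ri, di), (rj, dj) = rows[i], rows[j]
    if di != dj:
        entries.append({c for c in C if ri[c] != rj[c]})

# A reduct is a minimal attribute subset hitting every entry.
reducts = []
for k in range(1, len(C) + 1):
    for sub in combinations(C, k):
        s = set(sub)
        if all(s & e for e in entries) and not any(r <= s for r in reducts):
            reducts.append(s)
print([sorted(r) for r in reducts])  # [['b', 'c']]
```

Finding a minimal hitting set of the matrix entries is equivalent to turning the CNF f into its minimal DNF, which is why the single reduct here is {b, c}.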
Examples of Discernibility Matrix
(2)
     a  b  c  d  E
u1   1  0  2  1  1
u2   1  0  2  0  1
u3   1  2  0  0  2
u4   1  2  2  1  0
u5   2  1  0  0  2
u6   2  1  1  0  2
u7   2  1  2  1  1

     u1       u2      u3       u4       u5    u6
u2   λ
u3   b,c,d    b,c
u4   b        b,d     c,d
u5   a,b,c,d  a,b,c   λ        a,b,c,d
u6   a,b,c,d  a,b,c   λ        a,b,c,d  λ
u7   λ        λ       a,b,c,d  a,b      c,d   c,d

f = b ∧ (c ∨ d) = (b ∧ c) ∨ (b ∧ d)

Core = {b}
Reduct1 = {b, c}
Reduct2 = {b, d}
Rough Membership
 The rough membership function quantifies
the degree of relative overlap between the
set X and the equivalence class [x]_B to
which x belongs:

μ_X^B : U → [0, 1],   μ_X^B(x) = |[x]_B ∩ X| / |[x]_B|

 The rough membership function can be
interpreted as a frequency-based estimate of
Pr(x ∈ X | u), where u is the equivalence
class of x in IND(B).
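The definition above amounts to a ratio of set sizes per equivalence class. A minimal sketch, reusing the IND({Age, LEMS}) classes and the Walk = yes set from the earlier slides; `mu` is an illustrative helper name:

```python
# Rough membership over a fixed partition.
partition = [{"x1"}, {"x2"}, {"x3", "x4"}, {"x5", "x7"}, {"x6"}]
W = {"x1", "x4", "x6"}  # Walk = yes

def mu(x, X):
    """|[x]_B ∩ X| / |[x]_B| for the fixed partition above."""
    cls = next(c for c in partition if x in c)
    return len(cls & X) / len(cls)

print(mu("x3", W))  # 0.5: x3's class {x3, x4} overlaps W only in x4
```

Note that μ = 1 on the lower approximation (e.g. x1), μ = 0 outside the upper approximation (e.g. x2), and 0 < μ < 1 exactly on the boundary.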
Rough Membership (2)
 The formulae for the lower and upper
approximations can be generalized to an
arbitrary level of precision π ∈ (0.5, 1] by
means of the rough membership function:

\underline{B}_π X = {x | μ_X^B(x) ≥ π},
\overline{B}_π X = {x | μ_X^B(x) > 1 − π}.

 Note: the lower and upper approximations as
originally formulated are obtained as a special
case with π = 1.
Dependency of Attributes
 Discovering dependencies between attributes
is an important issue in KDD.
 A set of attributes D depends totally on a set
of attributes C, denoted C ⇒ D, if all values
of attributes from D are uniquely determined
by values of attributes from C.
Dependency of Attributes (2)
 Let D and C be subsets of A. We will say that
D depends on C in a degree k (0 ≤ k ≤ 1),
denoted C ⇒_k D, if

k = γ(C, D) = |POS_C(D)| / |U|

where POS_C(D) = ∪_{X ∈ U/D} \underline{C}(X) is called the
positive region of the partition U/D with respect to C.
Dependency of Attributes (3)
 Obviously

γ(C, D) = Σ_{X ∈ U/D} |\underline{C}(X)| / |U|.

 If k = 1, we say that D depends totally on C.
 If k < 1, we say that D depends partially
(in a degree k) on C.
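The dependency degree is just the fraction of objects whose condition class is decision-consistent. A minimal sketch for the Walk table with C = {Age, LEMS}, D = {Walk}, using the partition from the earlier slides:

```python
# k = |POS_C(D)| / |U| for the Walk example.
partition = [{"x1"}, {"x2"}, {"x3", "x4"}, {"x5", "x7"}, {"x6"}]
decision = {"x1": "yes", "x2": "no", "x3": "no", "x4": "yes",
            "x5": "no", "x6": "yes", "x7": "no"}

# POS_C(D): union of C-classes lying wholly inside one decision class.
pos = {x for c in partition
       if len({decision[y] for y in c}) == 1 for x in c}
k = len(pos) / len(decision)
print(round(k, 3))  # 0.714, i.e. 5/7: only {x3, x4} is contradictory
```

Here Walk depends on {Age, LEMS} only partially (k = 5/7 < 1), which matches the rough decision class seen in the approximation example.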
A Rough Set Based KDD Process
 Discretization based on RS and
Boolean Reasoning (RSBR).
 Attribute selection based on RS
with Heuristics (RSH).
 Rule discovery by GDT-RS.
What Are Real World Issues ?
 Very large data sets
 Uncertainty (noisy data)
 Incompleteness (missing, incomplete data)
 Data change
 Use of background knowledge
(Figure: a table matching the real-world issues above — very large
data sets, noisy data, incomplete instances, data change, use of
background knowledge — against representative methods: ID3 (C4.5),
Prism, Version Space, BP, and Dblearn, marking each combination as
okay or merely possible.)
Soft Techniques for KDD

(Figure: a triangle spanned by Probability, Logic, and Set theory,
locating stochastic processes, belief networks, connectionist
networks, GDT, deduction/induction/abduction, rough sets, and
fuzzy sets within it.)
Soft Techniques for KDD (2)

(Figure: a hybrid model combining GDT, RS&ILP, GrC, and TM around
deduction, induction, and abduction.)

A Hybrid Model
GDT : Generalization Distribution Table
RS : Rough Sets
TM : Transition Matrix
ILP : Inductive Logic Programming
GrC : Granular Computing
A Rough Set Based KDD Process
 Discretization based on RS and
Boolean Reasoning (RSBR).
 Attribute selection based on RS
with Heuristics (RSH).
 Rule discovery by GDT-RS.
Discretization based on RSBR
 In the discretization of a decision table
T = (U, A ∪ {d}), where V_a = [v_a, w_a) is an
interval of real values, we search for
a partition P_a of V_a for any a ∈ A.
 Any partition of V_a is defined by a sequence
of the so-called cuts v_1 < v_2 < ... < v_k from V_a.
 Any family of partitions {P_a}_{a ∈ A} can be
identified with a set of cuts.
Discretization Based on RSBR
(2)
In the discretization process, we search for a set
of cuts satisfying some natural conditions.
A    a     b    d         A    a  b  d
u1   0.8   2    1         u1   0  2  1
u2   1     0.5  0         u2   1  0  0
u3   1.3   3    0         u3   1  2  0
u4   1.4   1    1    P    u4   1  1  1
u5   1.4   2    0   -->   u5   1  2  0
u6   1.6   3    1         u6   2  2  1
u7   1.3   1    1         u7   1  1  1

P = {(a, 0.9), (a, 1.5), (b, 0.75), (b, 1.5)}
A Geometrical Representation of
Data and Cuts
(Figure: the seven objects x1, ..., x7 plotted in the (a, b) plane,
with a ∈ {0.8, 1, 1.3, 1.4, 1.6} and b ∈ {0.5, 1, 2, 3}.)
A Geometrical Representation of
Data and Cuts (2)
(Figure: the same plot with the cut lines of P drawn in.)
Discretization Based on RSBR
(3)
 The sets of possible values of a and b are
defined by
V_a = [0, 2);  V_b = [0, 4).
 The sets of values of a and b on objects
from U are given by
a(U) = {0.8, 1, 1.3, 1.4, 1.6};
b(U) = {0.5, 1, 2, 3}.
Discretization Based on RSBR
(4)
 The discretization process returns a partition
of the value sets of conditional attributes
into intervals.
A Discretization Process
 Step 1: define a set of Boolean variables
BV(U) = {p_1^a, p_2^a, p_3^a, p_4^a, p_1^b, p_2^b, p_3^b}, where
p_1^a corresponds to the interval [0.8, 1) of a,
p_2^a corresponds to the interval [1, 1.3) of a,
p_3^a corresponds to the interval [1.3, 1.4) of a,
p_4^a corresponds to the interval [1.4, 1.6) of a,
p_1^b corresponds to the interval [0.5, 1) of b,
p_2^b corresponds to the interval [1, 2) of b,
p_3^b corresponds to the interval [2, 3) of b.
The Set of Cuts on Attribute a

(Figure: the value axis of a with the points 0.8, 1.0, 1.3, 1.4, 1.6
and the cuts c_1, ..., c_4 placed inside the intervals corresponding
to p_1^a, ..., p_4^a.)
A Discretization Process (2)
 Step 2: create a new decision table by using
the set of Boolean variables defined in Step 1.
Let T = (U, A ∪ {d}) be a decision table and p_k^a
a propositional variable corresponding to the
interval [v_k^a, v_{k+1}^a) for any k ∈ {1, ..., n_a − 1}
and a ∈ A.
A Sample T Defined in Step 2
U*        p_1^a  p_2^a  p_3^a  p_4^a  p_1^b  p_2^b  p_3^b
(x1,x2)   1      0      0      0      1      1      0
(x1,x3)   1      1      0      0      0      0      1
(x1,x5)   1      1      1      0      0      0      0
(x4,x2)   0      1      1      0      1      0      0
(x4,x3)   0      0      1      0      0      1      1
(x4,x5)   0      0      0      0      0      1      0
(x6,x2)   0      1      1      1      1      1      1
(x6,x3)   0      0      1      1      0      0      0
(x6,x5)   0      0      0      1      0      0      1
(x7,x2)   0      1      0      0      1      0      0
(x7,x3)   0      0      0      0      0      1      1
(x7,x5)   0      0      1      0      0      1      0
The Discernibility Formula
 The discernibility formula

(x1, x2) = p_1^a ∨ p_1^b ∨ p_2^b

means that in order to discern objects x1 and
x2, at least one of the following cuts must
be set:
a cut between a(0.8) and a(1),
a cut between b(0.5) and b(1), or
a cut between b(1) and b(2).
The Discernibility Formulae for
All Different Pairs
(x1, x2) = p_1^a ∨ p_1^b ∨ p_2^b
(x1, x3) = p_1^a ∨ p_2^a ∨ p_3^b
(x1, x5) = p_1^a ∨ p_2^a ∨ p_3^a
(x4, x2) = p_2^a ∨ p_3^a ∨ p_1^b
(x4, x3) = p_3^a ∨ p_2^b ∨ p_3^b
(x4, x5) = p_2^b
The Discernibility Formulae for
All Different Pairs (2)
(x6, x2) = p_2^a ∨ p_3^a ∨ p_4^a ∨ p_1^b ∨ p_2^b ∨ p_3^b
(x6, x3) = p_3^a ∨ p_4^a
(x6, x5) = p_4^a ∨ p_3^b
(x7, x2) = p_2^a ∨ p_1^b
(x7, x3) = p_2^b ∨ p_3^b
(x7, x5) = p_3^a ∨ p_2^b
A Discretization Process (3)
 Step 3: find the minimal subset of P that
discerns all objects in different decision
classes.
The discernibility Boolean propositional
formula is defined as follows:

Φ_U = ∧ {∨ (x_i, x_j) : d(x_i) ≠ d(x_j)}.
The Discernibility Formula
in CNF Form
Φ_U = (p_1^a ∨ p_1^b ∨ p_2^b) ∧ (p_1^a ∨ p_2^a ∨ p_3^b)
    ∧ (p_1^a ∨ p_2^a ∨ p_3^a) ∧ (p_2^a ∨ p_3^a ∨ p_1^b)
    ∧ (p_3^a ∨ p_2^b ∨ p_3^b) ∧ p_2^b
    ∧ (p_2^a ∨ p_3^a ∨ p_4^a ∨ p_1^b ∨ p_2^b ∨ p_3^b)
    ∧ (p_3^a ∨ p_4^a) ∧ (p_4^a ∨ p_3^b)
    ∧ (p_2^a ∨ p_1^b) ∧ (p_2^b ∨ p_3^b) ∧ (p_3^a ∨ p_2^b).
The Discernibility Formula
in DNF Form
 We obtain four prime implicants:

Φ_U = (p_2^a ∧ p_4^a ∧ p_2^b)
    ∨ (p_2^a ∧ p_3^a ∧ p_2^b ∧ p_3^b)
    ∨ (p_3^a ∧ p_1^b ∧ p_2^b ∧ p_3^b)
    ∨ (p_1^a ∧ p_4^a ∧ p_1^b ∧ p_2^b).

{p_2^a, p_4^a, p_2^b} is the optimal result, because
it is the minimal subset of P.
The Minimal Set of Cuts
for the Sample DB

(Figure: the (a, b) plane again, now showing only the three minimal
cuts corresponding to p_2^a, p_4^a and p_2^b.)
A Result
A    a     b    d         A    a  b  d
u1   0.8   2    1         u1   0  1  1
u2   1     0.5  0         u2   0  0  0
u3   1.3   3    0         u3   1  1  0
u4   1.4   1    1    P    u4   1  0  1
u5   1.4   2    0   -->   u5   1  1  0
u6   1.6   3    1         u6   2  1  1
u7   1.3   1    1         u7   1  0  1

P = {(a, 1.2), (a, 1.5), (b, 1.5)}
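For a table this small the minimal cut set can also be found by brute force, which makes a compact illustration of what the Boolean reasoning computes. A sketch, assuming the sample table above; candidate cuts are placed at midpoints between consecutive attribute values, so the concrete cut positions differ slightly from the slide's 1.2/1.5/1.5 but fall in the same intervals.

```python
from itertools import combinations

# Sample table: object -> ((a, b), d).
data = {
    "u1": ((0.8, 2.0), 1), "u2": ((1.0, 0.5), 0), "u3": ((1.3, 3.0), 0),
    "u4": ((1.4, 1.0), 1), "u5": ((1.4, 2.0), 0), "u6": ((1.6, 3.0), 1),
    "u7": ((1.3, 1.0), 1),
}
# Candidate cuts: midpoints between consecutive values, per attribute.
cuts = []
for i in range(2):
    vals = sorted({v[i] for v, _ in data.values()})
    cuts += [(i, (x + y) / 2) for x, y in zip(vals, vals[1:])]

def discerns(subset, u, v):
    # A cut (i, c) separates u and v iff c lies between their i-values.
    return any(min(u[i], v[i]) < c <= max(u[i], v[i]) for i, c in subset)

pairs = [(u, v) for u in data.values() for v in data.values()
         if u[1] != v[1]]
for k in range(1, len(cuts) + 1):
    found = next((set(s) for s in combinations(cuts, k)
                  if all(discerns(s, a, b) for (a, _), (b, _) in pairs)),
                 None)
    if found:
        break
print(sorted(found))  # [(0, 1.15), (0, 1.5), (1, 1.5)]
```

Three cuts suffice, matching the prime-implicant result {p_2^a, p_4^a, p_2^b}: one a-cut in [1, 1.3), one a-cut in [1.4, 1.6), and one b-cut in [1, 2).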
A Rough Set Based KDD Process
 Discretization based on RS and
Boolean Reasoning (RSBR).
 Attribute selection based on RS
with Heuristics (RSH).
 Rule discovery by GDT-RS.
Attribute Selection
U Headache Muscle-pain Temp. Flu
U1 Yes Yes Normal No
U2 Yes Yes High Yes
U3 Yes Yes Very-high Yes
U4 No Yes Normal No
U5 No No High No
U6 No Yes Very-high Yes
U Muscle-pain Temp. Flu
U1 Yes Normal No
U2 Yes High Yes
U3 Yes Very-high Yes
U4 Yes Normal No
U5 No High No
U6 Yes Very-high Yes
U Headache Temp. Flu
U1 Yes Normal No
U2 Yes High Yes
U3 Yes Very-high Yes
U4 No Normal No
U5 No High No
U6 No Very-high Yes
Observations
 A database always contains a lot of attributes
that are redundant and not necessary for rule
discovery.
 If these redundant attributes are not removed,
not only does the time complexity of rule discovery
increase, but the quality of the discovered
rules may also be significantly degraded.
The Goal of Attribute Selection
Finding an optimal subset of attributes in a
database according to some criterion, so that
a classifier with the highest possible
accuracy can be induced by a learning
algorithm using only the information
available in that subset of attributes.
The Filter Approach
 Preprocessing
 The main strategies of attribute selection:
– The minimal subset of attributes
– Selection of the attributes with a higher rank
 Advantage
– Fast
 Disadvantage
– Ignoring the performance effects of the induction
algorithm
The Wrapper Approach
 Using the induction algorithm as a part of the search
evaluation function
 2^N − 1 possible attribute subsets (N is the number of attributes)
 The main search methods:
– Exhaustive/Complete search
– Heuristic search
– Non-deterministic search
 Advantage
– Taking into account the performance of the induction algorithm
 Disadvantage
– The time complexity is high
Basic Ideas:
Attribute Selection using RSH
 Take the attributes in CORE as the initial
subset.
 Select one attribute each time using the rule
evaluation criterion in our rule discovery
system, GDT-RS.
 Stop when the subset of selected attributes
is a reduct.
Why Heuristics ?
 The number of possible reducts can be up to
2^N − 1, where N is the number of attributes.
Selecting the optimal reduct from all
possible reducts is NP-hard, and heuristics
must be used.
The Rule Selection Criteria
in GDT-RS
 Selecting the rules that cover as many
instances as possible.
 Selecting the rules that contain as few
attributes as possible, if they cover the same
number of instances.
 Selecting the rules with larger strengths, if
they have the same number of condition
attributes and cover the same number of
instances.
Attribute Evaluation Criteria
 Selecting the attributes that cause the number
of consistent instances to increase faster
– To obtain the subset of attributes as small as
possible
 Selecting an attribute that has a smaller number
of different values
– To guarantee that the number of instances covered
by rules is as large as possible.
A Heuristic Algorithm
for Attribute Selection
 Let R be the set of selected attributes, P be the
set of unselected condition attributes, U be the
set of all instances, X be the set of contradictory
instances, and EXPECT be the threshold of
accuracy.
 In the initial state, R = CORE(C),
P = C − CORE(C), X = U − POS_R(D), and k = 0.
A Heuristic Algorithm
for Attribute Selection (2)
 Step 1. If k ≥ EXPECT, finish; otherwise
calculate the dependency degree

k = |POS_R(D)| / |U|.

 Step 2. For each p in P, calculate

v_p = |POS_{R ∪ {p}}(D)|,
m_p = max_size(POS_{R ∪ {p}}(D) / (R ∪ {p})),

where max_size denotes the cardinality of the maximal subset
(equivalence class).
A Heuristic Algorithm
for Attribute Selection (3)
 Step 3. Choose the best attribute p, i.e. the one
with the largest v_p · m_p, and let
R = R ∪ {p}, P = P − {p}.
 Step 4. Remove all consistent instances u in
POS_R(D) from X.
 Step 5. Go back to Step 1.
Main Features of RSH
 It can select a better subset of attributes
quickly and effectively from a large DB.
 The selected attributes do not noticeably
degrade the performance of induction.
An Example of
Attribute Selection
U a b c d e
u1 1 0 2 1 1
u2 1 0 2 0 1
u3 1 2 0 0 2
u4 1 2 2 1 0
u5 2 1 0 0 2
u6 2 1 1 0 2
u7 2 1 2 1 1
Condition Attributes:
a: Va = {1, 2}
b: Vb = {0, 1, 2}
c: Vc = {0, 1, 2}
d: Vd = {0, 1}
Decision Attribute:
e: Ve = {0, 1, 2}
U b c d e
u1 0 2 1 1
u2 0 2 0 1
u3 2 0 0 2
u4 2 2 1 0
u5 1 0 0 2
u6 1 1 0 2
u7 1 2 1 1
Searching for CORE
Removing attribute a
Removing attribute a does
not cause inconsistency.
Hence, a is not used as
CORE.
Searching for CORE (2)
Removing attribute b
U a c d e
u1 1 2 1 1
u2 1 2 0 1
u3 1 0 0 2
u4 1 2 1 0
u5 2 0 0 2
u6 2 1 0 2
u7 2 2 1 1
u1: a1 c2 d1 → e1
u4: a1 c2 d1 → e0

Removing attribute b causes inconsistency.
Hence, b is used as CORE.
Searching for CORE (3)
Removing attribute c
U a b d e
u1 1 0 1 1
u2 1 0 0 1
u3 1 2 0 2
u4 1 2 1 0
u5 2 1 0 2
u6 2 1 0 2
u7 2 1 1 1
Removing attribute c
does not cause inconsistency.
Hence, c is not used
as CORE.
Searching for CORE (4)
Removing attribute d
U a b c e
u1 1 0 2 1
u2 1 0 2 1
u3 1 2 0 2
u4 1 2 2 0
u5 2 1 0 2
u6 2 1 1 2
u7 2 1 2 1
Removing attribute d
does not cause inconsistency.
Hence, d is not used
as CORE.
Searching for CORE (5)
CORE(C)={b}
Initial subset R = {b}
Attribute b is the unique indispensable
attribute.
R={b}
U a b c d e
u1 1 0 2 1 1
u2 1 0 2 0 1
u3 1 2 0 0 2
u4 1 2 2 1 0
u5 2 1 0 0 2
u6 2 1 1 0 2
u7 2 1 2 1 1
b0 → e1 (consistent)
U’ b e
u1 0 1
u2 0 1
u3 2 2
u4 2 0
u5 1 2
u6 1 2
u7 1 1
The instances containing b0 will not be considered.
Attribute Evaluation Criteria
 Selecting the attributes that cause the number
of consistent instances to increase faster
– To obtain the subset of attributes as small as
possible
 Selecting the attribute that has a smaller number
of different values
– To guarantee that the number of instances covered
by a rule is as large as possible.
Selecting Attribute from {a,c,d}
U’   a  b  e
u3   1  2  2
u4   1  2  0
u5   2  1  2
u6   2  1  2
u7   2  1  1

1. Selecting {a}: R = {a, b}

u3: a1 b2 → e2
u4: a1 b2 → e0
u5, u6: a2 b1 → e2
u7: a2 b1 → e1

POS_{a,b}(X/{e}) = ∅

X/{e}: {u3, u5, u6}, {u4}, {u7}
X/{a, b}: {u3, u4}, {u5, u6, u7}
Selecting Attribute from {a,c,d} (2)
2. Selecting {c}: R = {b, c}

U’   b  c  e
u3   2  0  2
u4   2  2  0
u5   1  0  2
u6   1  1  2
u7   1  2  1

u3: b2 c0 → e2
u4: b2 c2 → e0
u5: b1 c0 → e2
u6: b1 c1 → e2
u7: b1 c2 → e1

POS_{b,c}(X/{e}) = {u3, u4, u5, u6, u7}

X/{e}: {u3, u5, u6}, {u4}, {u7}
Selecting Attribute from {a,c,d} (3)
3. Selecting {d}: R = {b, d}

U’   b  d  e
u3   2  0  2
u4   2  1  0
u5   1  0  2
u6   1  0  2
u7   1  1  1

u3: b2 d0 → e2
u4: b2 d1 → e0
u5, u6: b1 d0 → e2
u7: b1 d1 → e1

POS_{b,d}(X/{e}) = {u3, u4, u5, u6, u7}

X/{e}: {u3, u5, u6}, {u4}, {u7}
Selecting Attribute from {a,c,d} (4)
3. Selecting {d}: R = {b, d}

POS_{b,d}({u3, u5, u6}) / {b, d} = {{u3}, {u5, u6}}

max_size(POS_{b,d}({u3, u5, u6}) / {b, d}) = 2

X/{e}: {u3, u5, u6}, {u4}, {u7}
X/{b, d}: {u3}, {u5, u6}, {u4}, {u7}

Result: the selected subset of attributes is {b, d}.
Experimental Results
Data sets               Attribute  Instance  Attr. N.  Selected
                        Number     Number    in Core   Attr. N.
Monk1                   6          124       3         3
Monk3                   6          122       4         4
Mushroom                22         8124      0         4
Breast cancer           10         699       1         4
Earthquake              16         155       0         3
Meningitis              30         140       1         4
Bacterial examination   57         20920     2         9
Slope-collapse          23         3436      6         8
Gastric cancer          38         7520      2         19
A Rough Set Based KDD Process
 Discretization based on RS and
Boolean Reasoning (RSBR).
 Attribute selection based on RS
with Heuristics (RSH).
 Rule discovery by GDT-RS.
Main Features of GDT-RS
 Unseen instances are considered in the
discovery process, and the uncertainty of a
rule, including its ability to predict possible
instances, can be explicitly represented in the
strength of the rule.
 Biases can be flexibly selected for search
control, and background knowledge can be
used as a bias to control the creation of a GDT
and the discovery process.
A Sample DB
U    a    b    c    d
u1   a0   b0   c1   y
u2   a0   b1   c1   y
u3   a0   b0   c1   y
u4   a1   b1   c0   n
u5   a0   b0   c1   n
u6   a0   b2   c1   y
u7   a1   b1   c1   y

Condition attributes: a, b, c
a = {a0, a1}, b = {b0, b1, b2}, c = {c0, c1}
Decision attribute: d, d = {y, n}
A Sample Database (2)
 T = (U, A, C, D)
 Attributes A = C ∪ D = {a, b, c, d}
 Condition attributes C = {a, b, c}
a: Va = {a0, a1}
b: Vb = {b0, b1, b2}
c: Vc = {c0, c1}
 Decision attribute D = {d}
d: Vd = {y, n}
A Sample GDT
(Table: rows are the possible generalizations G(x), such as *b0c0,
*b0c1, ..., a0*c0, ..., a1b1*, a1b2*, **c0, ..., a0**, a1**; columns
are the possible instances F(x) from a0b0c0 through a1b2c1. A cell
holds p(PI | PG): 1/2 in each matching cell of a row like *b0c0
(two covered instances), 1/3 for rows like a0*c0, 1/6 for rows like
**c0 and a0**; empty cells are 0.)
Explanation for GDT
 F(x): the possible instances (PI)
 G(x): the possible generalizations (PG)
 G(x) → F(x): the probability relationships
between PI & PG.
Probabilistic Relationship
Between PIs and PGs

p(PI_j | PG_i) = 1 / N_{PG_i}   if PG_i is a generalization of PI_j,
p(PI_j | PG_i) = 0              otherwise,

where N_{PG_i} is the number of PIs satisfying the i-th PG:

N_{PG_i} = Π_{l ∈ {l | PG_i[l] = *}} n_l,

with n_l the number of values of attribute l.

Example: a0*c0 generalizes a0b0c0, a0b1c0 and a0b2c0, so
N_{a0*c0} = n_b = 3 and p = 1/3 for each of them.
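The cell probability is easy to compute directly. A minimal sketch for the attribute sets of the sample DB; `covers` and `p` are illustrative helper names:

```python
# p(PI | PG) for the sample attribute sets a, b, c.
values = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"], "c": ["c0", "c1"]}

def covers(pg, pi):
    """True if the generalization pg matches the instance pi."""
    return all(g == "*" or g == v for g, v in zip(pg, pi))

def p(pi, pg):
    """1/N_PG if pg generalizes pi, else 0; N_PG is the product of
    the value-set sizes of the wildcarded attributes."""
    if not covers(pg, pi):
        return 0.0
    n = 1
    for attr, g in zip(values, pg):
        if g == "*":
            n *= len(values[attr])
    return 1 / n

print(p(("a0", "b0", "c0"), ("a0", "*", "c0")))  # 0.3333... = 1/3
```

For a0*c0 only b is wildcarded, so N_PG = 3 and each of its three possible instances gets probability 1/3, as in the slide's example.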
Unseen Instances
U Headache Muscle-pain Temp. Flu
U1 Yes Yes Normal No
U2 Yes Yes High Yes
U3 Yes Yes Very-high Yes
U4 No Yes Normal No
U5 No No High No
U6 No Yes Very-high Yes
Unseen instances (Headache, Muscle-pain, Temp.):
(yes, no, normal), (yes, no, high), (yes, no, very-high),
(no, yes, high), (no, no, normal), (no, no, very-high)

(The observed table is the closed world; the unseen instances lie
in the open world.)
Rule Representation
X → Y with S
 X denotes the conjunction of the conditions
that a concept must satisfy
 Y denotes a concept that the rule describes
 S is a “measure of strength” with which the
rule holds
Rule Strength (1)
 The strength of the generalization X
(BK is not used):

s(X) = s(PG_k) = Σ_l p(PI_l | PG_k) = N_{ins-rel}(PG_k) · (1 / N_{PG_k}),

where N_{ins-rel}(PG_k) is the number of the observed
instances satisfying the k-th generalization.
 The strength of the rule X → Y:

S(X → Y) = s(X) · (1 − r(X → Y)).
Rule Strength (2)
 The strength of the generalization X
(BK is used):

s(X) = s(PG_k) = Σ_l p(PI_l | PG_k) · BKF(PI_l | PG_k),

where BKF(PI_l | PG_k) is the background-knowledge factor
that reweights the uniform distribution p(PI_l | PG_k) = 1 / N_{PG_k}.
Rule Strength (3)
 The rate of noises:

r(X → Y) = (N_{ins-rel}(X) − N_{ins-class}(X, Y)) / N_{ins-rel}(X),

where N_{ins-class}(X, Y) is the number of instances
belonging to the class Y within the instances
satisfying the generalization X.
Rule Discovery by GDT-RS
Condition Attrs.: a, b, c
a: Va = {a0, a1}
b: Vb = {b0, b1, b2}
c: Vc = {c0, c1}
Class: d:
d: Vd = {y,n}
U a b c d
u1 a0 b0 c1 y
u2 a0 b1 c1 y
u3 a0 b0 c1 y
u4 a1 b1 c0 n
u5 a0 b0 c1 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
Regarding the Instances
(Noise Rate = 0)
U            a    b    c    d
u1, u3, u5
(= u1')      a0   b0   c1   y, y, n
u2           a0   b1   c1   y
u4           a1   b1   c0   n
u6           a0   b2   c1   n
u7           a1   b1   c1   y

r_{y}(u1') = 1 − 2/3 = 0.33
r_{n}(u1') = 1 − 1/3 = 0.67

Let T_noise = 0. Since r_{y}(u1') > T_noise and
r_{n}(u1') > T_noise, the decision of u1' is treated as
unknown: d(u1') = ⊥.

U    a    b    c    d
u1'  a0   b0   c1   ⊥
u2   a0   b1   c1   y
u4   a1   b1   c0   n
u6   a0   b2   c1   n
u7   a1   b1   c1   y
Generating Discernibility Vector
for u2
U    a    b    c    d
u1'  a0   b0   c1   ⊥
u2   a0   b1   c1   y
u4   a1   b1   c0   n
u6   a0   b2   c1   n
u7   a1   b1   c1   y

m_{2,1'} = {b},  m_{2,2} = λ,  m_{2,4} = {a, c},
m_{2,6} = {b},  m_{2,7} = λ

     u1'   u2   u4    u6   u7
u2   b     λ    a,c   b    λ
Obtaining Reducts for u2
     u1'   u2   u4    u6   u7
u2   b     λ    a,c   b    λ

f_T(u2) = b ∧ (a ∨ c) ∧ b = (a ∧ b) ∨ (b ∧ c)
Generating Rules from u2
f_T(u2) = (a ∧ b) ∨ (b ∧ c)  →  {a0, b1}, {b1, c1}

{a0b1} covers a0b1c0 and a0b1c1 (u2):   s({a0b1}) = 0.5
{b1c1} covers a0b1c1 (u2) and a1b1c1 (u7):   s({b1c1}) = 1

r({a0b1} → y) = 0
r({b1c1} → y) = 0
Generating Rules from u2 (2)
{a0b1} → y with S = (1 · 1/2)(1 − 0) = 0.5
{b1c1} → y with S = (2 · 1/2)(1 − 0) = 1
Generating Discernibility Vector
for u4
U    a    b    c    d
u1'  a0   b0   c1   ⊥
u2   a0   b1   c1   y
u4   a1   b1   c0   n
u6   a0   b2   c1   n
u7   a1   b1   c1   y

m_{4,1'} = {a, b, c},  m_{4,2} = {a, c},  m_{4,4} = λ,
m_{4,6} = λ,  m_{4,7} = {c}

     u1'     u2    u4   u6   u7
u4   a,b,c   a,c   λ    λ    c
Obtaining Reducts for u4
     u1'     u2    u4   u6   u7
u4   a,b,c   a,c   λ    λ    c

f_T(u4) = (a ∨ b ∨ c) ∧ (a ∨ c) ∧ c = c
Generating Rules from u4
f_T(u4) = c  →  {c0}

{c0} covers the six possible instances with c = c0
(a0b0c0, ..., a1b1c0 (u4), a1b2c0), of which only
a1b1c0 (u4) is observed:

s({c0}) = 1/6
r({c0} → n) = 0
Generating Rules from u4 (2)
{c0} → n with S = (1/6)(1 − 0) = 0.167
Generating Rules from All
Instances
U    a    b    c    d
u1'  a0   b0   c1   ⊥
u2   a0   b1   c1   y
u4   a1   b1   c0   n
u6   a0   b2   c1   n
u7   a1   b1   c1   y

u2: {a0b1} → y, S = 0.5
    {b1c1} → y, S = 1
u4: {c0} → n, S = 0.167
u6: {b2} → n, S = 0.25
u7: {a1c1} → y, S = 0.5
    {b1c1} → y, S = 1
Rule Selection
 Selecting the rules that cover as many
instances as possible.
 Selecting the rules at levels of
generalization as high as possible.
 Selecting the rules with larger strengths
at the same level of generalization.
Generalization belonging to
Class y
         a0b1c1(y)  a1b1c1(y)
           (u2)       (u7)
*b1c1    1/2        1/2
a1*c1               1/3
a0b1*    1/2

{b1c1} → y with S = 1      u2, u7
{a1c1} → y with S = 1/2    u7
{a0b1} → y with S = 1/2    u2
Generalization belonging to
Class n
         a0b2c1(n)  a1b1c0(n)
           (u6)       (u4)
**c0                1/6
*b2*     1/4

{c0} → n with S = 1/6    u4
{b2} → n with S = 1/4    u6
Results from the Sample DB
(Noise Rate = 0)
 Certain Rules:              Instances Covered
{c0} → n with S = 1/6        u4
{b2} → n with S = 1/4        u6
{b1c1} → y with S = 1        u2, u7
 Possible Rules:
b0 → y with S = (1/4)(1/2)
a0 & b0 → y with S = (1/2)(2/3)
a0 & c1 → y with S = (1/3)(2/3)
b0 & c1 → y with S = (1/2)(2/3)
Instances Covered: u1, u3, u5
Results from the Sample DB (2)
(Noise Rate > 0)
Regarding Instances
(Noise Rate > 0)
U               a  b  c  d
u1’(u1,u3,u5)   a0 b0 c1 y,y,n
u2              a0 b1 c1 y
u4              a1 b1 c0 n
u6              a0 b2 c1 n
u7              a1 b1 c1 y

r(u1’ → {y}) = 1 − 2/3 = 0.33
r(u1’ → {n}) = 1 − 1/3 = 0.67

Let Tnoise = 0.5.
Since r(u1’ → {y}) < Tnoise, d(u1’) = y:

U    a  b  c  d
u1’  a0 b0 c1 y
u2   a0 b1 c1 y
u4   a1 b1 c0 n
u6   a0 b2 c1 n
u7   a1 b1 c1 y
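The noise-rate step above can be sketched directly: a compound instance keeps the class whose noise rate stays below the threshold Tnoise. A minimal sketch (helper name assumed, not from the tutorial):

```python
def noise_rates(decisions):
    """decisions: decision values of the raw instances merged into one
    compound instance, e.g. u1' = {u1, u3, u5}."""
    total = len(decisions)
    return {d: 1 - decisions.count(d) / total for d in set(decisions)}

rates = noise_rates(["y", "y", "n"])   # compound instance u1'
T_noise = 0.5
label = min(rates, key=rates.get)      # class with the lowest noise rate
print(rates)                           # r(y) = 1/3, r(n) = 2/3
if rates[label] <= T_noise:
    print(f"d(u1') = {label}")         # y, since 1/3 < 0.5
```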
Rules Obtained from All
Instances
U    a  b  c  d
u1’  a0 b0 c1 y
u2   a0 b1 c1 y
u4   a1 b1 c0 n
u6   a0 b2 c1 n
u7   a1 b1 c1 y
u2: {a0b1} → y, S = 0.5
    {b1c1} → y, S = 1
u4: {c0} → n, S = 0.167
u6: {b2} → n, S = 0.25
u7: {a1c1} → y, S = 0.5
    {b1c1} → y, S = 1
u1’: {b0} → y, S = 1/4 × 2/3 = 0.167
Example of Using BK
        a0b0c0  a0b0c1  a0b1c0  a0b1c1  a0b2c0  a0b2c1  …  a1b2c1
a0b0*   1/2     1/2
a0b1*                   1/2     1/2
a0*c1           1/3             1/3             1/3
a0**    1/6     1/6     1/6     1/6     1/6     1/6

BK: a0 => c1, 100%

        a0b0c0  a0b0c1  a0b1c0  a0b1c1  a0b2c0  a0b2c1  …  a1b2c1
a0b0*   0       1
a0b1*                   0       1
a0*c1           1/3             1/3             1/3
a0**    0       1/3     0       1/3     0       1/3
Changing Strength of
Generalization by BK
U    a  b  c  d
u1’  a0 b0 c1
u2   a0 b1 c1 y
u4   a1 b1 c0 n
u6   a0 b2 c1 n
u7   a1 b1 c1 y

fT(u2) = (a ∧ b) ∨ (b ∧ c)
{a0,b1}  {b1,c1}

Without BK:                     With BK (a0 => c1, 100%):
{a0b1}                          {a0b1}
  a0b1c0      (1/2)               a0b1c0      (0%)
  a0b1c1(u2)  (1/2)               a0b1c1(u2)  (100%)
s({a0b1}) = 0.5                 s({a0b1}) = 1
r({a0b1} → y) = 0               r({a0b1} → y) = 0
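The effect of background knowledge can be sketched as redistributing the generalization probabilities: instantiations ruled out by the BK rule get probability zero and the remaining mass is renormalized, so s({a0b1}) rises from 0.5 to 1. A minimal sketch (function name and encoding assumed):

```python
def apply_bk(dist, impossible):
    """Zero out instantiations ruled out by BK and renormalize the rest."""
    kept = {k: v for k, v in dist.items() if k not in impossible}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

# uniform distribution over the instantiations of {a0b1}, without BK
dist = {"a0b1c0": 0.5, "a0b1c1": 0.5}

# BK "a0 => c1, 100%" rules out the c0 instantiation
adjusted = apply_bk(dist, impossible={"a0b1c0"})
print(adjusted)   # all mass moves to a0b1c1
```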
Algorithm 1
Optimal Set of Rules
 Step 1. Consider the instances with the same
condition attribute values as one instance,
called a compound instance.
 Step 2. Calculate the rate of noises r for each
compound instance.
 Step 3. Select one instance u from U and
create a discernibility vector for u.
 Step 4. Calculate all reducts for the instance u
by using the discernibility function.
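Steps 1 to 4 can be sketched end to end on the sample database. This is a self-contained illustration under my own encoding (attributes as tuple indices 0, 1, 2 for a, b, c), not the tutorial's implementation; the compound instance u1’ is resolved to class y as in the noise-rate example above.

```python
from itertools import combinations

rows = [  # ((a, b, c), d) for u1..u7; u1, u3, u5 share the same conditions
    ((0, 0, 1), "y"), ((0, 1, 1), "y"), ((0, 0, 1), "y"),
    ((1, 1, 0), "n"), ((0, 0, 1), "n"), ((0, 2, 1), "n"), ((1, 1, 1), "y"),
]

# Step 1: merge rows with equal conditions into compound instances
compound = {}
for cond, d in rows:
    compound.setdefault(cond, []).append(d)

# Step 2: noise rate of each compound instance, keeping the majority class
labels = {}
for cond, ds in compound.items():
    best = max(set(ds), key=ds.count)
    labels[cond] = (best, 1 - ds.count(best) / len(ds))

# Step 3: discernibility vector for u = u2 = (0, 1, 1)
u = (0, 1, 1)
attrs = range(3)
vec = [{i for i in attrs if u[i] != v[i]}
       for v, (d, _) in labels.items() if v != u and d != labels[u][0]]

# Step 4: reducts of u = minimal attribute sets hitting every vector entry
reducts = []
for r in (1, 2, 3):
    for s in combinations(attrs, r):
        if all(set(s) & m for m in vec) and not any(h <= set(s) for h in reducts):
            reducts.append(set(s))
print(reducts)   # index sets: {0, 1} is {a, b}, {1, 2} is {b, c}
```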

E5-roughsets unit-V.pdf

  • 1. Rough Sets in KDD Tutorial Notes Andrzej Skowron Warsaw University Ning Zhong Maebashi Institute of Technolgy Copyright 2000 by A. Skowron & N. Zhong
  • 2. About the Speakers  Andrzej Skowron received his Ph.D. from Warsaw University. He is a professor in Faculty of Mathematics, Computer Science and Mechanics, Warsaw University, Poland. His research interests include soft computing methods and applications, in particular, reasoning with incomplete information, approximate reasoning, rough sets, rough mereology, granular computing, synthesis and analysis of complex objects, intelligent agents, knowledge discovery and data mining, etc, with over 200 journal and conference publications. He is an editor of several international journals and book series including Fundamenta Informaticae (editor in chief), Data Mining and Knowledge Discovery. He is president of International Rough Set Society. He was an invited speaker at many international conferences, and has served or is currently serving on the program committees of over 40 international conferences and workshops, including ISMIS’97-99 (program chair), RSCTC’98-00 (program chair), RSFDGrC’99 (program chair).
  • 3. About the Speakers (2)  Ning Zhong received his Ph.D. from the University of Tokyo. He is director of Knowledge Information Systems Laboratory, and an associate professor in Department of Information Engineering, Maebashi Institute of Technology, Japan. His research interests include knowledge discovery and data mining, rough sets and granular-soft computing, intelligent agents and databases, knowledge-based systems and hybrid systems, with over 80 journal and conference publications. He is an editor of Knowledge and Information Systems: an international journal (Springer). He is a member of the advisory board of International Rough Set Society, ACM SIGKDD International Liaisons Board, the Steering Committee of PAKDD conferences, the advisory board and coordinator of BISC/SIGGrC. He has served or is currently serving on the program committees of over 25 international conferences and workshops, including PAKDD’99 (program chair), IAT’99 (program chair), and RSFDGrC’99 (program chair).
  • 4. Contents  Introduction  Basic Concepts of Rough Sets  A Rough Set Based KDD process  Rough Sets in ILP and GrC  Concluding Remarks (Summary, Advanced Topics, References and Further Readings).
  • 5. Introduction  Rough set theory was developed by Zdzislaw Pawlak in the early 1980’s.  Representative Publications: – Z. Pawlak, “Rough Sets”, International Journal of Computer and Information Sciences, Vol.11, 341-356 (1982). – Z. Pawlak, Rough Sets - Theoretical Aspect of Reasoning about Data, Kluwer Academic Pubilishers (1991).
  • 6. Introduction (2)  The main goal of the rough set analysis is induction of approximations of concepts.  Rough sets constitutes a sound basis for KDD. It offers mathematical tools to discover patterns hidden in data.  It can be used for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules) etc.
  • 7. Introduction (3)  Recent extensions of rough set theory have developed new methods for decomposition of large data sets, data mining in distributed and multi-agent systems, and granular computing. This presentation shows how several aspects of the above problems are solved by the (classic) rough set approach, discusses some advanced topics, and gives further research directions.
  • 8. Basic Concepts of Rough Sets  Information/Decision Systems (Tables)  Indiscernibility  Set Approximation  Reducts and Core  Rough Membership  Dependency of Attributes
  • 9. Information Systems/Tables  IS is a pair (U, A)  U is a non-empty finite set of objects.  A is a non-empty finite set of attributes such that for every  is called the value set of a. a V U a  : . A a a V Age LEMS x1 16-30 50 x2 16-30 0 x3 31-45 1-25 x4 31-45 1-25 x5 46-60 26-49 x6 16-30 26-49 x7 46-60 26-49
  • 10. Decision Systems/Tables  DS:  is the decision attribute.  The elements of A are called the condition attributes. Age LEMS Walk x1 16-30 50 yes x2 16-30 0 no x3 31-45 1-25 no x4 31-45 1-25 yes x5 46-60 26-49 no x6 16-30 26-49 yes x7 46-60 26-49 no }) { , ( d A U T   A d 
  • 11. Issues in the Decision Table  The same or indiscernible objects may be represented several times.  Some of the attributes may be superfluous.
  • 12. Indiscernibility  The equivalence relation A binary relation which is reflexive (i.e. an object is in relation with itself xRx) , symmetric (if xRy then yRx) and transitive (if xRy and yRz then xRz).  The equivalence class of an element consists of all objects such that xRy. X X R   X x X y
  • 13. Indiscernibility (2)  Let IS = (U, A) be an information system, then with any there is associated an equivalence relation: where is called the B-indiscernibility relation.  If then objects x and x’are indiscernible from each other by attributes from B.  The equivalence classes of the B-indiscernibility relation are denoted A B  )} ' ( ) ( , | ) ' , {( ) ( 2 x a x a B a U x x B INDIS      ) (B INDIS ), ( ) ' , ( B IND x x IS  . ] [ B x
  • 14. An Example of Indiscernibility  The non-empty subsets of the condition attributes are {Age}, {LEMS}, and {Age, LEMS}.  IND({Age}) = {{x1,x2,x6}, {x3,x4}, {x5,x7}}  IND({LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x6,x7}}  IND({Age,LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x7}, {x6}}. Age LEMS Walk x1 16-30 50 yes x2 16-30 0 no x3 31-45 1-25 no x4 31-45 1-25 yes x5 46-60 26-49 no x6 16-30 26-49 yes x7 46-60 26-49 no
  • 15. Observations  An equivalence relation induces a partitioning of the universe.  The partitions can be used to build new subsets of the universe.  Subsets that are most often of interest have the same value of the decision attribute. It may happen, however, that a concept such as “Walk” cannot be defined in a crisp manner.
  • 16. Set Approximation  Let T = (U, A) and let and We can approximate X using only the information contained in B by constructing the B-lower and B-upper approximations of X, denoted and respectively, where A B  . U X  X B X B }, ] [ | { X x x X B B   }. 0 ] [ | {    X x x X B B
  • 17. Set Approximation (2)  B-boundary region of X, consists of those objects that we cannot decisively classify into X in B.  B-outside region of X, consists of those objects that can be with certainty classified as not belonging to X.  A set is said to be rough if the boundary region is non-empty. , ) ( X B X B X BNB   , X B U 
  • 18. An Example of Set Approximation  Let W = {x | Walk(x) = yes}.  The decision class, Walk, is rough since the boundary region is not empty. Age LEMS Walk x1 16-30 50 yes x2 16-30 0 no x3 31-45 1-25 no x4 31-45 1-25 yes x5 46-60 26-49 no x6 16-30 26-49 yes x7 46-60 26-49 no }. 7 , 5 , 2 { }, 4 , 3 { ) ( }, 6 , 4 , 3 , 1 { }, 6 , 1 { x x x W A U x x W BN x x x x W A x x W A A     
  • 19. An Example of Set Approximation (2) yes yes/no no {{x1,{x6}} {{x3,x4}} {{x2}, {x5,x7}}
  • 20. U setX U/R R : subset of attributes X R X R Lower & Upper Approximations
  • 21. Lower & Upper Approximations (2) } : / { X Y R U Y X R     } 0 : / {     X Y R U Y X R  Lower Approximation: Upper Approximation:
  • 22. Lower & Upper Approximations (3) X1 = Flu(yes) = {u2, u3, u6, u7} Lower approx., RX1 {u2, u3} Upper approx., {u2, u3, u6, u7, u8, u5} X2 = Flu(no) = {u1, u4, u5, u8} Lower approx., RX2 {u1, u4} Upper approx., {u1, u4, u5, u8, u7, u6} X1 R X2 R U Headache Temp. Flu U1 Yes Normal No U2 Yes High Yes U3 Yes Very-high Yes U4 No Normal No U5 N N No o o H H Hi i ig g gh h h N N No o o U6 No Very-high Yes U7 N N No o o H H Hi i ig g gh h h Y Y Ye e es s s U8 No Very-high No Elementary sets of indiscernibility relations defined by R = {Headache, Temp.} are {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}.
  • 23. Lower & Upper Approximations (4) R = {Headache, Temp.} U/R = { {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}} X1 = Flu(yes) = {u2,u3,u6,u7} X2 = Flu(no) = {u1,u4,u5,u8} RX1 = {u2, u3} = {u2, u3, u6, u7, u8, u5} RX2 = {u1, u4} = {u1, u4, u5, u8, u7, u6} X1 R X2 R u1 u4 u3 X1 X2 u5 u7 u2 u6 u8
  • 25. Properties of Approximation (2) ) ( )) ( ( )) ( ( ) ( )) ( ( )) ( ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( X B X B B X B B X B X B B X B B X B X B X B X B Y B X B Y X B Y B X B Y X B                 where -X denotes U - X.
  • 26. Four Basic Classes of Rough Sets  X is roughly B-definable, iff and  X is internally B-undefinable, iff and  X is externally B-undefinable, iff and  X is totally B-undefinable, iff and 0 ) (  X B , ) ( U X B  0 ) (  X B , ) ( U X B  0 ) (  X B , ) ( U X B  0 ) (  X B . ) ( U X B 
  • 27. Accuracy of Approximation where |X| denotes the cardinality of Obviously If X is crisp with respect to B. If X is rough with respect to B. | ) ( | | ) ( | ) ( X B X B X B   . 0  X . 1 0   B  , 1 ) (  X B  , 1 ) (  X B 
  • 28. Issues in the Decision Table  The same or indiscernible objects may be represented several times.  Some of the attributes may be superfluous (redundant). That is, their removal cannot worsen the classification.
  • 29. Reducts  Keep only those attributes that preserve the indiscernibility relation and, consequently, set approximation.  There are usually several such subsets of attributes and those which are minimal are called reducts.
  • 30. Dispensable & Indispensable Attributes Let Attribute c is dispensable in T, if , otherwise attribute c is indispensable in T. . C c ) ( ) ( }) { ( D POS D POS c C C   X C D POS D U X C / ) (   The positive region:
  • 31. Independent  T = (U, A, C, D) is independent if all are indispensable in T. C c
  • 32. Reduct & Core  The set of attributes is called a reduct of C, if T’= (U, A, R, D) is independent and  The set of all the condition attributes indispensable in T is denoted by CORE(C). where RED(C) is the set of all reducts of C. C R  ). ( ) ( D POS D POS C R  ) ( ) ( C RED C CORE  
  • 33. An Example of Reducts & Core U Headache Muscle pain Temp. Flu U1 Yes Yes Normal No U2 Yes Yes High Yes U3 Yes Yes Very-high Yes U4 No Yes Normal No U5 No No High No U6 No Yes Very-high Yes U Muscle pain Temp. Flu U1,U4 Yes Normal No U2 Yes High Yes U3,U6 Yes Very-high Yes U5 No High No U Headache Temp. Flu U1 Yes Norlmal No U2 Yes High Yes U3 Yes Very-high Yes U4 No Normal No U5 No High No Reduct1 = {Muscle-pain,Temp.} Reduct2 = {Headache, Temp.} CORE = {Headache,Temp} {MusclePain, Temp} = {Temp} 
  • 34. Discernibility Matrix  Let T = (U, A, C, D) be a decision table, with By a discernibility matrix of T, denoted M(T), we will mean matrix defined as: for i, j = 1,2,…,n. Here denotes that this case does not need to be considered. They classify objects and into different classes. }. ,..., , { 2 1 n u u u U  n n  )] ( ) ( [ )} ( ) ( : { )] ( ) ( [ j i j i j i u d u d D d if u c u c C c u d u d D d if ij m           i u j u , } { A a a V  
  • 35. Discernibility Function  For any , U ui  }} ,..., 2 , 1 { , : { ) ( n j i j m u f ij j i T      where (1) is the disjunction of all variables a such that if (2) if (3) if ij m  , ij m c .   ij m ), ( false mij    .   ij m ), (true t mij   .   ij m Each logical product in the minimal disjunctive normal form defines a reduct of instance . i u
  • 36. Examples of Discernibility Matrix No a b c d u1 a0 b1 c1 y u2 a1 b1 c0 n u3 a0 b2 c1 n u4 a1 b1 c1 y C = {a, b, c} D = {d} In order to discern equivalence classes of the decision attribute d, to preserve conditions described by the discernibility matrix for this table u1 u2 u3 u2 u3 u4 a,c b c a,b Reduct = {b, c} ) ( ) ( b a c b c a       
  • 37. Examples of Discernibility Matrix (2) a b c d E u1 1 0 2 1 1 u2 1 0 2 0 1 u3 1 2 0 0 2 u4 1 2 2 1 0 u5 2 1 0 0 2 u6 2 1 1 0 2 u7 2 1 2 1 1 u1 u2 u3 u4 u5 u6 u2 u3 u4 u5 u6 u7 b,c,d b,c b b,d c,d a,b,c,d a,b,c a,b,c,d a,b,c,d a,b,c a,b,c,d a,b,c,d a,b c,d c,d Core = {b} Reduct1 = {b,c} Reduct2 = {b,d}      
  • 38. Rough Membership  The rough membership function quantifies the degree of relative overlap between the set X and the equivalence class to which x belongs.  The rough membership function can be interpreted as a frequency-based estimate of where u is the equivalence relation of IND(B). B x] [ ] 1 , 0 [ :  U B X  | ] [ | | ] [ | B B B X x X x    ), | ( u X x P 
  • 39. Rough Membership (2)  The formulae for the lower and upper approximations can be generalized to some arbitrary level of precision by means of the rough membership function  Note: the lower and upper approximations as originally formulated are obtained as a special case with ] 1 , 5 . 0 (   }. 1 ) ( | { } ) ( | {            x x X B x x X B B X B X . 1  
  • 40. Dependency of Attributes  Discovering dependencies between attributes is an important issue in KDD.  A set of attribute D depends totally on a set of attributes C, denoted if all values of attributes from D are uniquely determined by values of attributes from C. , D C 
  • 41. Dependency of Attributes (2)  Let D and C be subsets of A. We will say that D depends on C in a degree k denoted if where called a positive region of the partition U/D with respect to C. ), 1 0 (   k , D C k  | | | ) ( | ) , ( U D POS D C k C    ), ( ) ( / X C D POS D U X C   
  • 42. Dependency of Attributes (3)  Obviously  If k = 1 we say that D depends totally on C.  If k < 1 we say that D depends partially (in a degree k) on C. . | | | ) ( | ) , ( /    D U X U X C D C 
  • 43. A Rough Set Based KDD Process  Discretization based on RS and Boolean Reasoning (RSBR).  Attribute selection based RS with Heuristics (RSH).  Rule discovery by GDT-RS.
  • 44. What Are Real World Issues ?  Very large data sets  Uncertainty (noisy data)  Incompleteness (missing, incomplete data)  Data change  Use of background knowledge
  • 45. very large data set noisy data incomplete instances data change use of background knowledge Real world issues Methods ID3 Prism Version BP Dblearn (C4.5) Space Okay possible
  • 47. Stoch. Proc. Belief Nets Conn. Nets GDT Deduction Induction Abduction RoughSets Fuzzy Sets Soft Techniques for KDD (2)
  • 49. GDT : Generalization Distribution Table RS : Rough Sets TM: Transition Matrix ILP : Inductive Logic Programming GrC : Granular Computing
  • 50. A Rough Set Based KDD Process  Discretization based on RS and Boolean Reasoning (RSBR).  Attribute selection based RS with Heuristics (RSH).  Rule discovery by GDT-RS.
  • 51. Discretization based on RSBR  In the discretization of a decision table = where is an interval of real-valued values, we search for a partition of for any  Any partition of is defined by a sequence of the so-called cuts from  Any family of partitions can be identified with a set of cuts. }), { , ( d A U  ) , [ a a a w v V  a P a V . A a a V 1 v k v v v    ... 2 1 . a V A a a P  } {
  • 52. Discretization Based on RSBR (2) In the discretization process, we search for a set of cuts satisfying some natural conditions. A a b d u1 0.8 2 1 u2 1 0.5 0 u3 1.3 3 0 u4 1.4 1 1 u5 1.4 2 0 u6 1.6 3 1 u7 1.3 1 1 A a b d u1 0 2 1 u2 1 0 0 u3 1 2 0 u4 1 1 1 u5 1 2 0 u6 2 2 1 u7 1 1 1 P P P = {(a, 0.9), (a, 1.5), (b, 0.75), (b, 1.5)}
  • 53. A Geometrical Representation of Data and Cuts 0 0.8 1 1.3 1.4 1.6 a b 3 2 1 0.5 x1 x2 x3 x4 x7 x5 x6
  • 54. A Geometrical Representation of Data and Cuts (2) 0 0.8 1 1.3 1.4 1.6 a b 3 2 1 0.5 x1 x2 x3 x4 x5 x6 x7
  • 55. Discretization Based on RSBR (3)  The sets of possible values of a and b are defined by  The sets of values of a and b on objects from U are given by a(U) = {0.8, 1, 1.3, 1.4, 1.6}; b(U) = {0.5, 1, 2, 3}. ); 2 , 0 [  a V ). 4 , 0 [  b V
  • 56. Discretization Based on RSBR (4)  The discretization process returns a partition of the value sets of conditional attributes into intervals.
  • 57. A Discretization Process  Step 1: define a set of Boolean variables, where corresponds to the interval [0.8, 1) of a corresponds to the interval [1, 1.3) of a corresponds to the interval [1.3, 1.4) of a corresponds to the interval [1.4, 1.6) of a corresponds to the interval [0.5, 1) of b corresponds to the interval [1, 2) of b corresponds to the interval [2, 3) of b } , , , , , , { ) ( 3 2 1 4 3 2 1 b b b a a a a p p p p p p p U BV  b b b a a a a p p p p p p p 3 2 1 4 3 2 1
  • 58. The Set of Cuts on Attribute a 0.8 1.0 1.3 1.4 1.6 a a p1 a p2 a p3 a p4 1 c 2 c 3 c 4 c
  • 59. A Discretization Process (2)  Step 2: create a new decision table by using the set of Boolean variables defined in Step 1. Let be a decision table, be a propositional variable corresponding to the interval for any and }) { , ( d A U T   a k p ) , [ 1 a k a k v v  } 1 ,..., 1 {   a n k . A a
  • 60. A Sample T Defined in Step 2 U* a p1 a p3 a p2 a p4 b p1 b p2 b p3 (x1,x2) (x1,x3) (x1,x5) (x4,x2) (x4,x3) (x4,x5) (x6,x2) (x6,x3) (x6,x5) (x7,x2) (x7,x3) (x7,x5) 1 0 0 0 1 1 0 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0
  • 61. The Discernibility Formula  The discernibility formula means that in order to discern object x1 and x2, at least one of the following cuts must be set, a cut between a(0.8) and a(1) a cut between b(0.5) and b(1) a cut between b(1) and b(2). b b a p p p x x 2 1 1 2 1 ) , (    
  • 62. The Discernibility Formulae for All Different Pairs b b a p p p x x 2 1 1 2 1 ) , (     b a a p p p x x 3 1 1 3 1 ) , (     a a a p p p x x 3 2 1 5 1 ) , (     b a a p p p x x 1 3 2 2 4 ) , (     b b a p p p x x 3 2 2 3 4 ) , (     b p x x 2 5 4 ) , (  
  • 63. The Discernibility Formulae for All Different Pairs (2) b b b a a a p p p p p p x x 3 2 1 4 3 2 2 6 ) , (        a a p p x x 4 3 3 6 ) , (    b a p p x x 3 4 5 6 ) , (    b a p p x x 1 2 2 7 ) , (    b b p p x x 3 2 3 7 ) , (    b a p p x x 2 3 5 7 ) , (   
  • 64. A Discretization Process (3)  Step 3: find the minimal subset of p that discerns all objects in different decision classes. The discernibility boolean propositional formula is defined as follows, )}. ( ) ( : ) . ( { j i U x d x d j i     
  • 65. The Discernibility Formula in CNF Form ) ( ) ( 3 2 1 2 1 1 b a a b b a U p p p p p p        ) ( ) ( 3 2 2 1 3 2 b b a b a a p p p p p p       ) ( 3 2 1 4 3 2 b b b a a a p p p p p p       ) ( ) ( ) ( 1 2 3 4 4 3 b a b a a a p p p p p p       . ) ( ) ( 2 2 3 3 2 b b a b b p p p p p     
  • 66. The Discernibility Formula in DNF Form  We obtain four prime implicants, is the optimal result, because it is the minimal subset of P. } , , { 2 4 2 b a a p p p ) ( ) ( 3 2 3 2 2 4 2 b b a a b a a U p p p p p p p         ). ( ) ( 2 1 4 1 3 2 1 3 b b a a b b b a p p p p p p p p        
  • 67. The Minimal Set Cuts for the Sample DB 0 0.8 1 1.3 1.4 1.6 a b 3 2 1 0.5 x1 x2 x3 x4 x5 x6 x7
  • 68. A Result A a b d u1 0.8 2 1 u2 1 0.5 0 u3 1.3 3 0 u4 1.4 1 1 u5 1.4 2 0 u6 1.6 3 1 u7 1.3 1 1 A a b d u1 0 1 1 u2 0 0 0 u3 1 1 0 u4 1 0 1 u5 1 1 0 u6 2 1 1 u7 1 0 1 P P P = {(a, 1.2), (a, 1.5), (b, 1.5)}
  • 69. A Rough Set Based KDD Process  Discretization based on RS and Boolean Reasoning (RSBR).  Attribute selection based RS with Heuristics (RSH).  Rule discovery by GDT-RS.
  • 70. Attribute Selection U Headache Muscle-pain Temp. Flu U1 Yes Yes Normal No U2 Yes Yes High Yes U3 Yes Yes Very-high Yes U4 No Yes Normal No U5 No No High No U6 No Yes Very-high Yes U Muscle-pain Temp. Flu U1 Yes Normal No U2 Yes High Yes U3 Yes Very-high Yes U4 Yes Normal No U5 No High No U6 Yes Very-high Yes U Headache Temp. Flu U1 Yes Normal No U2 Yes High Yes U3 Yes Very-high Yes U4 No Normal No U5 No High No U6 No Very-high Yes
  • 71. Observations  A database always contains a lot of attributes that are redundant and not necessary for rule discovery.  If these redundant attributes are not removed, not only the time complexity of rule discovery increases, but also the quality of the discovered rules may be significantly depleted.
  • 72. The Goal of Attribute Selection Finding an optimal subset of attributes in a database according to some criterion, so that a classifier with the highest possible accuracy can be induced by learning algorithm using information about data available only from the subset of attributes.
  • 73. The Filter Approach  Preprocessing  The main strategies of attribute selection: – The minimal subset of attributes – Selection of the attributes with a higher rank  Advantage – Fast  Disadvantage – Ignoring the performance effects of the induction algorithm
  • 74. The Wrapper Approach  Using the induction algorithm as a part of the search evaluation function  Possible attribute subsets (N-number of attributes)  The main search methods: – Exhaustive/Complete search – Heuristic search – Non-deterministic search  Advantage – Taking into account the performance of the induction algorithm  Disadvantage – The time complexity is high 1 2  N
  • 75. Basic Ideas: Attribute Selection using RSH  Take the attributes in CORE as the initial subset.  Select one attribute each time using the rule evaluation criterion in our rule discovery system, GDT-RS.  Stop when the subset of selected attributes is a reduct.
  • 76. Why Heuristics ?  The number of possible reducts can be where N is the number of attributes. Selecting the optimal reduct from all of possible reducts is NP-hard and heuristics must be used. 1 2  N
  • 77. The Rule Selection Criteria in GDT-RS  Selecting the rules that cover as many instances as possible.  Selecting the rules that contain as little attributes as possible, if they cover the same number of instances.  Selecting the rules with larger strengths, if they have same number of condition attributes and cover the same number of instances.
  • 78. Attribute Evaluation Criteria  Selecting the attributes that cause the number of consistent instances to increase faster – To obtain the subset of attributes as small as possible  Selecting an attribute that has smaller number of different values – To guarantee that the number of instances covered by rules is as large as possible.
  • 79. A Heuristic Algorithm for Attribute Selection  Let R be a set of the selected attributes, P be the set of unselected condition attributes, U be the set of all instances, X be the set of contradictory instances, and EXPECT be the threshold of accuracy.  In the initial state, R = CORE(C), k = 0. ) (D POS U X R   ), (C CORE C P  
  • 80. A Heuristic Algorithm for Attribute Selection (2)  Step 1. If k >= EXPECT, finish, otherwise calculate the dependency degree, k,  Step 2. For each p in P, calculate )) } { /( ) ( ( max_ | ) ( | }) { ( }) { ( D p R D POS size m D POS v p R p p R p       . | | | ) ( | U D POS k R  where max_size denotes the cardinality of the maximal subset.
  • 81. A Heuristic Algorithm for Attribute Selection (3)  Step 3. Choose the best attribute p, i.e., the one with the largest v_p and m_p, and let R = R ∪ {p}, P = P − {p}.  Step 4. Remove all consistent instances u in POS_R(D) from X.  Step 5. Go back to Step 1.
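The five steps above can be sketched in a few lines of Python. This is an illustration only, not the GDT-RS implementation: the decision table is the worked example from the following slides, the helper names (`pos`, `select`) are ours, and the tie-break by the pair (v_p, m_p) is our reading of Steps 2–3.

```python
# Greedy sketch of the RSH heuristic (illustrative; helper names are ours).
# Decision table from the worked example: conditions a,b,c,d, decision e.
DATA = {
    "u1": {"a": 1, "b": 0, "c": 2, "d": 1, "e": 1},
    "u2": {"a": 1, "b": 0, "c": 2, "d": 0, "e": 1},
    "u3": {"a": 1, "b": 2, "c": 0, "d": 0, "e": 2},
    "u4": {"a": 1, "b": 2, "c": 2, "d": 1, "e": 0},
    "u5": {"a": 2, "b": 1, "c": 0, "d": 0, "e": 2},
    "u6": {"a": 2, "b": 1, "c": 1, "d": 0, "e": 2},
    "u7": {"a": 2, "b": 1, "c": 2, "d": 1, "e": 1},
}

def pos(attrs):
    """POS_R(D): instances whose equivalence class w.r.t. attrs is decision-consistent."""
    classes = {}
    for u, row in DATA.items():
        classes.setdefault(tuple(row[a] for a in attrs), []).append(u)
    result = set()
    for members in classes.values():
        if len({DATA[v]["e"] for v in members}) == 1:
            result |= set(members)
    return result

def select(cond, core, expect=1.0):
    R, P = list(core), [a for a in cond if a not in core]
    X = set(DATA) - pos(R)                         # contradictory instances
    while P and len(pos(R)) / len(DATA) < expect:  # Step 1: k < EXPECT
        def score(p):
            new = pos(R + [p]) & X                 # newly consistent instances
            sizes = {}
            for u in new:                          # Step 2: v_p and m_p
                key = tuple(DATA[u][a] for a in R + [p])
                sizes[key] = sizes.get(key, 0) + 1
            return (len(new), max(sizes.values(), default=0))
        best = max(P, key=score)                   # Step 3: largest (v_p, m_p)
        R.append(best); P.remove(best)
        X -= pos(R)                                # Step 4: drop consistent u
    return R

print(select(["a", "c", "d"], core=["b"]))  # ['b', 'd'] on this table
```

On the example table this reproduces the result derived by hand on the later slides: starting from CORE = {b}, attribute d wins the tie with c because its positive region contains a larger equivalence class.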
  • 82. Main Features of RSH  It can select a good subset of attributes quickly and effectively from a large DB.  The selected attributes cause little loss of induction performance.
  • 83. An Example of Attribute Selection
  U   a  b  c  d  e
  u1  1  0  2  1  1
  u2  1  0  2  0  1
  u3  1  2  0  0  2
  u4  1  2  2  1  0
  u5  2  1  0  0  2
  u6  2  1  1  0  2
  u7  2  1  2  1  1
  Condition attributes: a: Va = {1, 2}; b: Vb = {0, 1, 2}; c: Vc = {0, 1, 2}; d: Vd = {0, 1}. Decision attribute: e: Ve = {0, 1, 2}
  • 84. Searching for CORE: Removing attribute a
  U   b  c  d  e
  u1  0  2  1  1
  u2  0  2  0  1
  u3  2  0  0  2
  u4  2  2  1  0
  u5  1  0  0  2
  u6  1  1  0  2
  u7  1  2  1  1
  Removing attribute a does not cause inconsistency. Hence, a is not in the CORE.
  • 85. Searching for CORE (2): Removing attribute b
  U   a  c  d  e
  u1  1  2  1  1
  u2  1  2  0  1
  u3  1  0  0  2
  u4  1  2  1  0
  u5  2  0  0  2
  u6  2  1  0  2
  u7  2  2  1  1
  u1: a1 c2 d1 → e1 and u4: a1 c2 d1 → e0 conflict. Removing attribute b causes inconsistency. Hence, b belongs to the CORE.
  • 86. Searching for CORE (3): Removing attribute c
  U   a  b  d  e
  u1  1  0  1  1
  u2  1  0  0  1
  u3  1  2  0  2
  u4  1  2  1  0
  u5  2  1  0  2
  u6  2  1  0  2
  u7  2  1  1  1
  Removing attribute c does not cause inconsistency. Hence, c is not in the CORE.
  • 87. Searching for CORE (4): Removing attribute d
  U   a  b  c  e
  u1  1  0  2  1
  u2  1  0  2  1
  u3  1  2  0  2
  u4  1  2  2  0
  u5  2  1  0  2
  u6  2  1  1  2
  u7  2  1  2  1
  Removing attribute d does not cause inconsistency. Hence, d is not in the CORE.
  • 88. Searching for CORE (5) CORE(C)={b} Initial subset R = {b} Attribute b is the unique indispensable attribute.
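The CORE search on the preceding slides (delete each attribute in turn and test whether the table becomes inconsistent) can be sketched directly; the table encoding and function names below are ours.

```python
# CORE by the deletion test (a sketch; the encoding is ours).
# Decision table from the example: columns a,b,c,d and decision e.
DATA = {
    "u1": (1, 0, 2, 1, 1), "u2": (1, 0, 2, 0, 1), "u3": (1, 2, 0, 0, 2),
    "u4": (1, 2, 2, 1, 0), "u5": (2, 1, 0, 0, 2), "u6": (2, 1, 1, 0, 2),
    "u7": (2, 1, 2, 1, 1),
}
COND = [0, 1, 2, 3]     # indices of a, b, c, d; index 4 holds the decision e
NAMES = "abcd"

def consistent(attrs):
    """True iff no two instances agree on attrs but disagree on the decision."""
    seen = {}
    for row in DATA.values():
        key = tuple(row[i] for i in attrs)
        if seen.setdefault(key, row[4]) != row[4]:
            return False
    return True

# An attribute is in the CORE iff removing it makes the table inconsistent.
core = [NAMES[i] for i in COND if not consistent([j for j in COND if j != i])]
print(core)  # ['b']
```

Only removing b introduces a conflict (u1 and u4 collide), matching CORE(C) = {b} above.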
  • 89. R = {b}
  Projecting U onto {b, e} gives U':
  U'  b  e
  u1  0  1
  u2  0  1
  u3  2  2
  u4  2  0
  u5  1  2
  u6  1  2
  u7  1  1
  Rule: b0 → e1. The instances containing b0 will not be considered further.
  • 90. Attribute Evaluation Criteria  Selecting the attributes that cause the number of consistent instances to increase fastest – to obtain a subset of attributes that is as small as possible  Selecting the attribute that has a smaller number of different values – to guarantee that the number of instances covered by a rule is as large as possible.
  • 91. Selecting an Attribute from {a,c,d}
  U'  a  b  e
  u3  1  2  2
  u4  1  2  0
  u5  2  1  2
  u6  2  1  2
  u7  2  1  1
  1. Selecting {a}: R = {a, b}. a1 b2 → e2 (u3) and a1 b2 → e0 (u4) conflict; a2 b1 → e2 (u5, u6) and a2 b1 → e1 (u7) conflict. Hence POS_{a,b}(X/{e}) = ∅.
  (U/{e}: {u3, u5, u6}, {u4}, {u7}; U/{a,b}: {u3, u4}, {u5, u6, u7})
  • 92. Selecting an Attribute from {a,c,d} (2)
  2. Selecting {c}: R = {b, c}.
  U'  b  c  e
  u3  2  0  2
  u4  2  2  0
  u5  1  0  2
  u6  1  1  2
  u7  1  2  1
  b2 c0 → e2, b2 c2 → e0, b1 c0 → e2, b1 c1 → e2, b1 c2 → e1: all consistent.
  (U/{e}: {u3, u5, u6}, {u4}, {u7})
  POS_{b,c}(X/{e}) = {u3, u4, u5, u6, u7}
  • 93. Selecting an Attribute from {a,c,d} (3)
  3. Selecting {d}: R = {b, d}.
  U'  b  d  e
  u3  2  0  2
  u4  2  1  0
  u5  1  0  2
  u6  1  0  2
  u7  1  1  1
  b2 d0 → e2, b2 d1 → e0, b1 d0 → e2, b1 d1 → e1: all consistent.
  (U/{e}: {u3, u5, u6}, {u4}, {u7})
  POS_{b,d}(X/{e}) = {u3, u4, u5, u6, u7}
  • 94. Selecting an Attribute from {a,c,d} (4)
  3. Selecting {d} (continued): both {c} and {d} give a positive region of size 5, so the tie is broken by max_size:
  POS_{b,d}({u3, u5, u6}) / {b, d} = {{u3}, {u5, u6}}, hence max_size_{b,d}(POS_{b,d}({u3, u5, u6}) / {b, d}) = 2.
  (U/{e}: {u3, u5, u6}, {u4}, {u7}; U/{b,d}: {u3}, {u4}, {u5, u6}, {u7})
  Result: subset of attributes = {b, d}
  • 95. Experimental Results
  Data set               | Attribute Number | Instance Number | Attri. N. in CORE | Selected Attri. N.
  Monk1                  | 6                | 124             | 3                 | 3
  Monk3                  | 6                | 122             | 4                 | 4
  Mushroom               | 22               | 8124            | 0                 | 4
  Breast cancer          | 10               | 699             | 1                 | 4
  Earthquake             | 16               | 155             | 0                 | 3
  Meningitis             | 30               | 140             | 1                 | 4
  Bacterial examination  | 57               | 20920           | 2                 | 9
  Slope-collapse         | 23               | 3436            | 6                 | 8
  Gastric cancer         | 38               | 7520            | 2                 | 19
  • 96. A Rough Set Based KDD Process  Discretization based on RS and Boolean Reasoning (RSBR).  Attribute selection based on RS with Heuristics (RSH).  Rule discovery by GDT-RS.
  • 97. Main Features of GDT-RS  Unseen instances are considered in the discovery process, and the uncertainty of a rule, including its ability to predict possible instances, can be explicitly represented in the strength of the rule.  Biases can be flexibly selected for search control, and background knowledge can be used as a bias to control the creation of a GDT and the discovery process.
  • 98. A Sample DB
  U   a   b   c   d
  u1  a0  b0  c1  y
  u2  a0  b1  c1  y
  u3  a0  b0  c1  y
  u4  a1  b1  c0  n
  u5  a0  b0  c1  n
  u6  a0  b2  c1  y
  u7  a1  b1  c1  y
  Condition attributes: a, b, c; a = {a0, a1}, b = {b0, b1, b2}, c = {c0, c1}. Decision attribute: d = {y, n}
  • 99. A Sample Database (2)  T = (U, A, C, D), V = ∪_{a ∈ A} Va  Attributes A = C ∪ D = {a, b, c, d}  Condition attributes C = {a, b, c}: a: Va = {a0, a1}, b: Vb = {b0, b1, b2}, c: Vc = {c0, c1}  Decision attribute D = {d}: d: Vd = {y, n}
  • 100. A Sample GDT  Rows G(x) are the possible generalizations: *b0c0, *b0c1, *b1c0, *b1c1, *b2c0, *b2c1, a0*c0, …, a1b1*, a1b2*, **c0, …, a0**, a1**.  Columns F(x) are the possible instances: a0b0c0, a0b0c1, …, a1b0c0, …, a1b2c1.  Each entry is p(PI | PG): e.g., *b0c0 gives 1/2 to each of a0b0c0 and a1b0c0; a0*c0 gives 1/3 to each of its 3 instances; a0** gives 1/6 to each of its 6 instances.
  • 101. Explanation of the GDT  F(x): the possible instances (PI)  G(x): the possible generalizations (PG)  p(F(x) | G(x)): the probabilistic relationship between PIs and PGs.
  • 102. Probabilistic Relationship Between PIs and PGs  N_{PG_i} = ∏_{k ∈ {l | PG_i[l] = *}} n_k, where n_k is the number of values of attribute k; N_{PG_i} is the number of PIs satisfying the ith PG.  p(PI_j | PG_i) = 1 / N_{PG_i} if PG_i is a generalization of PI_j, and 0 otherwise.  Example: a0*c0 covers a0b0c0, a0b1c0, a0b2c0; N_{a0*c0} = n_b = 3, so p = 1/3 for each.
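The two formulas above can be sketched in Python for the sample attribute sets (the function names are ours):

```python
from itertools import product

# Attribute value sets from the sample DB.
VALUES = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"], "c": ["c0", "c1"]}

def n_pg(pg):
    """N_PG: product of the domain sizes n_l over the wildcard positions."""
    n = 1
    for attr, g in zip("abc", pg):
        if g == "*":
            n *= len(VALUES[attr])
    return n

def p(pi, pg):
    """p(PI | PG) = 1/N_PG if PG is a generalization of PI, else 0."""
    if all(g == "*" or g == v for g, v in zip(pg, pi)):
        return 1 / n_pg(pg)
    return 0.0

pg = ("a0", "*", "c0")   # the generalization a0*c0 from the slide
for pi in product(*[VALUES[a] if g == "*" else [g] for a, g in zip("abc", pg)]):
    print(pi, p(pi, pg))  # three covered PIs, each with p = 1/3
```

Running this reproduces the example: a0*c0 has N_PG = 3 and distributes 1/3 to each of a0b0c0, a0b1c0, a0b2c0.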
  • 103. Unseen Instances
  U   Headache  Muscle-pain  Temp.      Flu
  U1  Yes       Yes          Normal     No
  U2  Yes       Yes          High       Yes
  U3  Yes       Yes          Very-high  Yes
  U4  No        Yes          Normal     No
  U5  No        No           High       No
  U6  No        Yes          Very-high  Yes
  Closed world: only the observed instances above. Open world: the unseen instances are also possible: (yes, no, normal), (yes, no, high), (yes, no, very-high), (no, yes, high), (no, no, normal), (no, no, very-high).
  • 104. Rule Representation  X → Y with S  X denotes the conjunction of conditions that a concept must satisfy  Y denotes the concept that the rule describes  S is a measure of the strength with which the rule holds
  • 105. Rule Strength (1)  The strength of the generalization X (no BK is used): s(X) = s(PG_k) = Σ_l p(PI_l | PG_k) = N_{ins-rel}(PG_k) / N_{PG_k}, where N_{ins-rel}(PG_k) is the number of observed instances satisfying the kth generalization.  S(X → Y) = s(X) · (1 − r(X → Y))
  • 106. Rule Strength (2)  The strength of the generalization X (BK is used): s_bk(X) = s_bk(PG_k) = Σ_l p(PI_l | PG_k) · BKF(PI_l | PG_k), where BKF(PI_l | PG_k) is the background-knowledge factor over the PIs covered by PG_k.
  • 107. Rule Strength (3)  The rate of noise: r(X → Y) = (N_{ins-rel}(X) − N_{ins-class}(X, Y)) / N_{ins-rel}(X), where N_{ins-class}(X, Y) is the number of instances belonging to class Y among the instances satisfying the generalization X.
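Putting the formulas of slides 105 and 107 together, S(X → Y) can be computed as below. This is a sketch without BK; the observed instances are those of the resolved sample DB used in the following walkthrough, and the helper names are ours.

```python
N_VALUES = {"a": 2, "b": 3, "c": 2}    # |Va|, |Vb|, |Vc|

# Observed instances and their classes after resolving u1' (noise rate = 0).
OBSERVED = {
    ("a0", "b1", "c1"): "y",   # u2
    ("a1", "b1", "c0"): "n",   # u4
    ("a0", "b2", "c1"): "n",   # u6
    ("a1", "b1", "c1"): "y",   # u7
}

def covers(pg, pi):
    return all(g == "*" or g == v for g, v in zip(pg, pi))

def s(pg):
    """s(X) = N_ins-rel(PG) / N_PG."""
    n_pg = 1
    for attr, g in zip("abc", pg):
        if g == "*":
            n_pg *= N_VALUES[attr]
    return sum(covers(pg, pi) for pi in OBSERVED) / n_pg

def r(pg, cls):
    """r(X -> Y): fraction of covered observed instances outside class Y."""
    rel = [d for pi, d in OBSERVED.items() if covers(pg, pi)]
    return (len(rel) - rel.count(cls)) / len(rel) if rel else 0.0

def S(pg, cls):
    """S(X -> Y) = s(X) * (1 - r(X -> Y))."""
    return s(pg) * (1 - r(pg, cls))

print(S(("*", "b1", "c1"), "y"))   # 1.0    ({b1c1} -> y)
print(S(("a0", "b1", "*"), "y"))   # 0.5    ({a0b1} -> y)
print(S(("*", "*", "c0"), "n"))    # 0.1666...  ({c0} -> n)
```

The three printed values match the strengths derived by hand for {b1c1} → y, {a0b1} → y, and {c0} → n on the following slides.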
  • 108. Rule Discovery by GDT-RS
  U   a   b   c   d
  u1  a0  b0  c1  y
  u2  a0  b1  c1  y
  u3  a0  b0  c1  y
  u4  a1  b1  c0  n
  u5  a0  b0  c1  n
  u6  a0  b2  c1  n
  u7  a1  b1  c1  y
  Condition attributes: a, b, c; a: Va = {a0, a1}, b: Vb = {b0, b1, b2}, c: Vc = {c0, c1}. Class d: Vd = {y, n}
  • 109. Regarding the Instances (Noise Rate = 0)
  u1, u3, u5 share the condition values a0 b0 c1 with decisions y, y, n; they are merged into the compound instance u1'.
  r_{y}(u1') = 1 − 2/3 = 0.33, r_{n}(u1') = 1 − 1/3 = 0.67.
  Let T_noise = 0. Since r_{y}(u1') > T_noise and r_{n}(u1') > T_noise, no decision is assigned: d(u1') = ⊥.
  Resulting table:
  U    a   b   c   d
  u1'  a0  b0  c1  ⊥
  u2   a0  b1  c1  y
  u4   a1  b1  c0  n
  u6   a0  b2  c1  n
  u7   a1  b1  c1  y
  • 110. Generating the Discernibility Vector for u2
  U    a   b   c   d
  u1'  a0  b0  c1  ⊥
  u2   a0  b1  c1  y
  u4   a1  b1  c0  n
  u6   a0  b2  c1  n
  u7   a1  b1  c1  y
  m_{2,1'} = {b}, m_{2,2} = λ, m_{2,4} = {a, c}, m_{2,6} = {b}, m_{2,7} = λ.
  Vector for u2: (u1': b; u2: λ; u4: a, c; u6: b; u7: λ)
  • 111. Obtaining Reducts for u2
  Discernibility vector for u2: (u1': b; u2: λ; u4: a, c; u6: b; u7: λ).
  f_T(u2) = b ∧ (a ∨ c) ∧ b = b ∧ (a ∨ c) = (a ∧ b) ∨ (b ∧ c)
  • 112. Generating Rules from u2
  f_T(u2) = (a ∧ b) ∨ (b ∧ c) gives two reducts for u2: {a0, b1} and {b1, c1}.
  {a0b1}: covers a0b1c0 and a0b1c1 (u2); s({a0b1}) = 0.5, r({a0b1} → y) = 0.
  {b1c1}: covers a0b1c1 (u2) and a1b1c1 (u7); s({b1c1}) = 1, r({b1c1} → y) = 0.
  • 113. Generating Rules from u2 (2)
  {a0b1} → y with S = (1 × 1/2) × (1 − 0) = 0.5
  {b1c1} → y with S = (2 × 1/2) × (1 − 0) = 1
  • 114. Generating the Discernibility Vector for u4
  m_{4,1'} = {a, b, c}, m_{4,2} = {a, c}, m_{4,4} = λ, m_{4,6} = λ, m_{4,7} = {c}.
  Vector for u4: (u1': a, b, c; u2: a, c; u4: λ; u6: λ; u7: c)
  • 115. Obtaining Reducts for u4
  Discernibility vector for u4: (u1': a, b, c; u2: a, c; u4: λ; u6: λ; u7: c).
  f_T(u4) = (a ∨ b ∨ c) ∧ (a ∨ c) ∧ c = c
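The discernibility-vector steps for u2 and u4 above can be mechanized. The sketch below builds each vector and then enumerates minimal hitting sets of its entries, which is equivalent to simplifying the discernibility function f_T; the table encoding and function names are ours, and the brute-force enumeration is only suitable for small examples.

```python
from itertools import combinations

# Resolved sample DB: u1' has no decision, so it must be discerned
# from every decided instance (encoding is ours).
TABLE = {
    "u1'": (("a0", "b0", "c1"), None),
    "u2":  (("a0", "b1", "c1"), "y"),
    "u4":  (("a1", "b1", "c0"), "n"),
    "u6":  (("a0", "b2", "c1"), "n"),
    "u7":  (("a1", "b1", "c1"), "y"),
}

def discernibility_vector(u):
    """m_{u,v}: attributes on which u and v differ, for v of a different class."""
    row, d = TABLE[u]
    return {v: {a for a, (x, y) in zip("abc", zip(row, row2)) if x != y}
            for v, (row2, d2) in TABLE.items() if v != u and d2 != d}

def reducts(u):
    """Minimal attribute sets hitting every non-empty entry of the vector."""
    entries = [e for e in discernibility_vector(u).values() if e]
    attrs = sorted(set().union(*entries))
    found = []
    for k in range(1, len(attrs) + 1):
        for cand in combinations(attrs, k):
            hits_all = all(set(cand) & e for e in entries)
            minimal = not any(set(f) <= set(cand) for f in found)
            if hits_all and minimal:
                found.append(cand)
    return found

print(reducts("u2"), reducts("u4"))
# [('a', 'b'), ('b', 'c')] [('c',)]
```

This reproduces the hand derivations: f_T(u2) simplifies to (a ∧ b) ∨ (b ∧ c) and f_T(u4) to c.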
  • 116. Generating Rules from u4
  f_T(u4) = c gives the reduct {c0} for u4.
  {c0}: covers the six c0 instances a0b0c0, a0b1c0, a0b2c0, a1b0c0, a1b1c0 (u4), a1b2c0; s({c0}) = 1/6, r({c0} → n) = 0.
  • 117. Generating Rules from u4 (2)
  {c0} → n with S = (1 × 1/6) × (1 − 0) = 0.167
  • 118. Generating Rules from All Instances
  u2: {a0b1} → y, S = 0.5; {b1c1} → y, S = 1
  u4: {c0} → n, S = 0.167
  u6: {b2} → n, S = 0.25
  u7: {a1c1} → y, S = 0.5; {b1c1} → y, S = 1
  • 119. Rule Selection  Selecting the rules that cover as many instances as possible.  Selecting the rules at levels of generalization as high as possible.  Selecting the rules with larger strengths at the same level of generalization.
  • 120. Generalizations Belonging to Class y
  a0b1c1 (y, u2) and a1b1c1 (y, u7) generalize to:
  {b1c1} → y with S = 1 (covers u2, u7; p = 1/2 each)
  {a1c1} → y with S = 1/2 (covers u7; p = 1/3)
  {a0b1} → y with S = 1/2 (covers u2; p = 1/2)
  • 121. Generalizations Belonging to Class n
  a0b2c1 (n, u6) and a1b1c0 (n, u4) generalize to:
  {c0} → n with S = 1/6 (covers u4; p = 1/6)
  {b2} → n with S = 1/4 (covers u6; p = 1/4)
  • 122. Results from the Sample DB (Noise Rate = 0)  Certain rules (instances covered): {c0} → n with S = 1/6 (u4); {b2} → n with S = 1/4 (u6); {b1c1} → y with S = 1 (u2, u7)
  • 123. Results from the Sample DB (2) (Noise Rate > 0)  Possible rules: b0 → y with S = (1/4)(1/2); a0 & b0 → y with S = (1/2)(2/3); a0 & c1 → y with S = (1/3)(2/3); b0 & c1 → y with S = (1/2)(2/3). Instances covered: u1, u3, u5
  • 124. Regarding Instances (Noise Rate > 0)
  u1, u3, u5 (a0 b0 c1; decisions y, y, n) form the compound instance u1'.
  r_{y}(u1') = 1 − 2/3 = 0.33, r_{n}(u1') = 1 − 1/3 = 0.67.
  Let T_noise = 0.5. Since r_{y}(u1') < T_noise, d(u1') = y.
  Resulting table:
  U    a   b   c   d
  u1'  a0  b0  c1  y
  u2   a0  b1  c1  y
  u4   a1  b1  c0  n
  u6   a0  b2  c1  n
  u7   a1  b1  c1  y
  • 125. Rules Obtained from All Instances
  u1': {b0} → y, S = 1/4 × 2/3 = 0.167
  u2: {a0b1} → y, S = 0.5; {b1c1} → y, S = 1
  u4: {c0} → n, S = 0.167
  u6: {b2} → n, S = 0.25
  u7: {a1c1} → y, S = 0.5; {b1c1} → y, S = 1
  • 126. Example of Using BK
  Without BK:
  PG \ PI  a0b0c0  a0b0c1  a0b1c0  a0b1c1  a0b2c0  a0b2c1
  a0b0*    1/2     1/2
  a0b1*                    1/2     1/2
  a0*c1            1/3             1/3             1/3
  a0**     1/6     1/6     1/6     1/6     1/6     1/6
  With BK "a0 => c1, 100%":
  a0b0*    0       1
  a0b1*                    0       1
  a0*c1            1/3             1/3             1/3
  a0**     0       1/3     0       1/3     0       1/3
  • 127. Changing the Strength of a Generalization by BK
  Without BK: {a0b1} covers a0b1c0 and a0b1c1 (u2) with p = 1/2 each; s({a0b1}) = 0.5, r({a0b1} → y) = 0.
  With BK "a0 => c1, 100%": p(a0b1c0 | a0b1*) = 0, p(a0b1c1 | a0b1*) = 1; s({a0b1}) = 1, r({a0b1} → y) = 0.
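One way to realize this adjustment is to zero out the PIs that the background knowledge forbids and renormalize the distribution over the PIs a PG covers. The sketch below does exactly that; encoding the BK "a0 => c1, 100%" as a filter function is our assumption, not the notation of the original system.

```python
from itertools import product

VALUES = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"], "c": ["c0", "c1"]}

def instances(pg):
    """All possible instances covered by a generalization."""
    return list(product(*[VALUES[a] if g == "*" else [g]
                          for a, g in zip("abc", pg)]))

def bk_allows(pi):
    """BK 'a0 => c1, 100%': any instance with a0 must have c1."""
    return not (pi[0] == "a0" and pi[2] != "c1")

def p_bk(pg):
    """p(PI | PG) after zeroing the forbidden PIs and renormalizing."""
    pis = instances(pg)
    w = [1.0 if bk_allows(pi) else 0.0 for pi in pis]
    total = sum(w)
    return {pi: wi / total for pi, wi in zip(pis, w)}

print(p_bk(("a0", "b1", "*")))
# all probability mass moves to a0b1c1, so s({a0b1}) rises from 0.5 to 1
```

For a0b1* this shifts the whole distribution onto a0b1c1, reproducing the change of strength shown on the slide.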
  • 128. Algorithm 1: Optimal Set of Rules  Step 1. Consider the instances with the same condition attribute values as one instance, called a compound instance.  Step 2. Calculate the rate of noise r for each compound instance.  Step 3. Select one instance u from U and create a discernibility vector for u.  Step 4. Calculate all reducts for the instance u by using the discernibility function.