Frequent Item Mining
What is data mining?
• Pattern Mining
• What patterns?
• Why are they useful?
Definition: Frequent Itemset
• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
• Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
– An itemset whose support is greater than or
equal to a minsup threshold
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
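The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `support_count`, `support` and `is_frequent` are chosen here for the example and do not come from the slides.

```python
# Transactions from the table above, keyed by TID.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

def support_count(itemset, db):
    """sigma(itemset): number of transactions containing every item of the itemset."""
    return sum(1 for items in db.values() if set(itemset) <= items)

def support(itemset, db):
    """s(itemset): fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)

def is_frequent(itemset, db, minsup):
    """An itemset is frequent if its support is at least minsup."""
    return support(itemset, db) >= minsup

example = {"Milk", "Bread", "Diaper"}
print(support_count(example, transactions))  # 2
print(support(example, transactions))        # 0.4
```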
Frequent Itemsets Mining
TID Transactions
100 { A, B, E }
200 { B, D }
300 { A, B, E }
400 { A, C }
500 { B, C }
600 { A, C }
700 { A, B }
800 { A, B, C, E }
900 { A, B, C }
1000 { A, C, E }
• Minimum support level: 50%
– Frequent itemsets: {A}, {B}, {C}, {A,B}, {A,C}
• How does this link to the Data Cube?
Three Different Views of FIM
• Transactional Database
– How do we store a transactional database?
• Horizontal, Vertical, Transaction-Item Pair
• Binary Matrix
• Bipartite Graph
• How is FIM formulated in these different settings?
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Frequent Itemset Generation
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
• Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the
database
– Match each transaction against every candidate
– Complexity ~ O(NMw) => Expensive since M = 2^d
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
(Figure: N transactions are matched against a list of M candidates; w is the maximum transaction width.)
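The brute-force approach can be sketched as follows. This is an illustrative sketch only: the minimum support count of 3 is an assumption made for the example, and the empty itemset is skipped, so 2^d − 1 candidates are counted.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def brute_force(db, min_count):
    """Enumerate every non-empty itemset in the lattice and count its support."""
    items = sorted(set().union(*db))
    frequent = {}
    n_candidates = 0
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            n_candidates += 1
            # Match every transaction against the candidate.
            count = sum(1 for t in db if set(cand) <= t)
            if count >= min_count:
                frequent[cand] = count
    return frequent, n_candidates

frequent, n_candidates = brute_force(transactions, min_count=3)
print(n_candidates)   # 63 candidates for d = 6 items
print(len(frequent))  # 8 frequent itemsets
```

Every candidate is checked against every transaction, which is where the O(NMw) cost comes from.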
Reducing Number of Candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must also
be frequent
• Apriori principle holds due to the following property
of the support measure:
– Support of an itemset never exceeds the support of its
subsets
– This is known as the anti-monotone property of support
$\forall X, Y : (X \subseteq Y) \Rightarrow s(X) \geq s(Y)$
Illustrating Apriori Principle
(Figure: the itemset lattice over items A to E, shown twice. Once an itemset is found to be infrequent, all of its supersets are pruned.)
Illustrating Apriori Principle
Minimum Support = 3
Items (1-itemsets)
Item Count
Bread 4
Coke 2
Milk 4
Beer 3
Diaper 4
Eggs 1
Pairs (2-itemsets)
(No need to generate candidates involving Coke or Eggs)
Itemset Count
{Bread,Milk} 3
{Bread,Beer} 2
{Bread,Diaper} 3
{Milk,Beer} 2
{Milk,Diaper} 3
{Beer,Diaper} 3
Triplets (3-itemsets)
Itemset Count
{Bread,Milk,Diaper} 2
If every subset is considered,
6C1 + 6C2 + 6C3 = 41
With support-based pruning,
6 + 6 + 1 = 13
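The candidate counts above can be reproduced with a small level-wise search. This is a sketch under stated assumptions, not the authors' implementation: the function name `apriori` and its return shape are chosen for this example, and support counting is done by a plain scan per candidate.

```python
from itertools import combinations

db = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def apriori(db, min_count):
    """Return (frequent itemsets with counts, number of candidates per level)."""
    items = sorted(set().union(*db))
    candidates = [(i,) for i in items]
    frequent, per_level = {}, []
    k = 1
    while candidates:
        per_level.append(len(candidates))
        # Count the support of each candidate with one pass over the database.
        counts = {c: sum(1 for t in db if set(c) <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Build (k+1)-candidates: join frequent k-itemsets sharing a (k-1)-prefix,
        # then keep a candidate only if all of its k-subsets are frequent.
        candidates = []
        for p, q in combinations(sorted(level), 2):
            if p[:-1] == q[:-1]:
                c = p + (q[-1],)
                if all(s in level for s in combinations(c, k)):
                    candidates.append(c)
        k += 1
    return frequent, per_level

frequent, per_level = apriori(db, min_count=3)
print(per_level)       # [6, 6, 1]
print(sum(per_level))  # 13
```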
Apriori
R. Agrawal and R. Srikant. Fast algorithms for
mining association rules. VLDB, 487-499, 1994
How to Generate Candidates?
• Suppose the items in L_{k-1} are listed in an order
• Step 1: self-joining L_{k-1}
insert into C_k
select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
from L_{k-1} p, L_{k-1} q
where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
• Step 2: pruning
forall itemsets c in C_k do
forall (k-1)-subsets s of c do
if (s is not in L_{k-1}) then delete c from C_k
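The join and prune steps above translate fairly directly into Python. The sketch below assumes itemsets are stored as sorted tuples; the function name `apriori_gen` and the input `L3` are hypothetical, made up for this illustration.

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Build C_k from L_{k-1}; itemsets are sorted tuples of equal length k-1."""
    L_prev = sorted(L_prev)
    known = set(L_prev)
    size = len(L_prev[0]) if L_prev else 0
    C_k = []
    for p, q in combinations(L_prev, 2):
        # Step 1 (self-join): same first k-2 items, last item of p before last item of q.
        if p[:-1] == q[:-1] and p[-1] < q[-1]:
            c = p + (q[-1],)
            # Step 2 (prune): drop c if any (k-1)-subset is missing from L_{k-1}.
            if all(s in known for s in combinations(c, size)):
                C_k.append(c)
    return C_k

# Hypothetical L_3, used only to exercise the function.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")]
print(apriori_gen(L3))  # [('a', 'b', 'c', 'd')]
```

In this example the join also produces (a, c, d, e), which the prune step removes because (a, d, e) is not in L_3.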
Challenges of Frequent Itemset Mining
• Challenges
– Multiple scans of transaction database
– Huge number of candidates
– Tedious workload of support counting for candidates
• Improving Apriori: general ideas
– Reduce passes of transaction database scans
– Shrink number of candidates
– Facilitate support counting of candidates
Alternative Methods for Frequent Itemset
Generation
• Representation of Database
– horizontal vs vertical data layout
Horizontal Data Layout
TID Items
1 A,B,E
2 B,C,D
3 C,E
4 A,C,D
5 A,B,C,D
6 A,E
7 A,B
8 A,B,C
9 A,C,D
10 B
Vertical Data Layout
Item TID-list
A 1, 4, 5, 6, 7, 8, 9
B 1, 2, 5, 7, 8, 10
C 2, 3, 4, 5, 8, 9
D 2, 4, 5, 9
E 1, 3, 6
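Converting the horizontal layout into the vertical one takes a single pass over the transactions. The following is a minimal sketch; the name `to_vertical` is chosen for this example.

```python
from collections import defaultdict

# Horizontal layout: TID -> items, as in the table above.
horizontal = {
    1: ["A", "B", "E"], 2: ["B", "C", "D"], 3: ["C", "E"], 4: ["A", "C", "D"],
    5: ["A", "B", "C", "D"], 6: ["A", "E"], 7: ["A", "B"], 8: ["A", "B", "C"],
    9: ["A", "C", "D"], 10: ["B"],
}

def to_vertical(db):
    """Vertical layout: item -> sorted list of TIDs that contain the item."""
    vertical = defaultdict(list)
    for tid in sorted(db):
        for item in db[tid]:
            vertical[item].append(tid)
    return dict(vertical)

vertical = to_vertical(horizontal)
for item in sorted(vertical):
    print(item, vertical[item])
```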
ECLAT
• For each item, store a list of transaction ids
(tids)
Horizontal Data Layout
TID Items
1 A,B,E
2 B,C,D
3 C,E
4 A,C,D
5 A,B,C,D
6 A,E
7 A,B
8 A,B,C
9 A,C,D
10 B
Vertical Data Layout (TID-lists)
Item TID-list
A 1, 4, 5, 6, 7, 8, 9
B 1, 2, 5, 7, 8, 10
C 2, 3, 4, 5, 8, 9
D 2, 4, 5, 9
E 1, 3, 6
ECLAT
• Determine support of any k-itemset by intersecting tid-lists of
two of its (k-1) subsets.
• 3 traversal approaches:
– top-down, bottom-up and hybrid
• Advantage: very fast support counting
• Disadvantage: intermediate tid-lists may become too large for
memory
TID-list(A) = {1, 4, 5, 6, 7, 8, 9}
TID-list(B) = {1, 2, 5, 7, 8, 10}
TID-list(AB) = TID-list(A) ∩ TID-list(B) = {1, 5, 7, 8}
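The tid-list intersection can be turned into a small depth-first miner. This is a simplified sketch in the spirit of ECLAT, not the published algorithm in full: the function name `eclat`, the fixed alphabetical item order and the minimum support count of 3 are assumptions made for the example.

```python
# Horizontal transactions (TIDs 1..10) from the ECLAT example.
horizontal = ["ABE", "BCD", "CE", "ACD", "ABCD", "AE", "AB", "ABC", "ACD", "B"]

# Vertical layout: item -> set of TIDs.
vertical = {}
for tid, items in enumerate(horizontal, start=1):
    for item in items:
        vertical.setdefault(item, set()).add(tid)

def eclat(prefix, items, min_count, out):
    """Depth-first search; `items` is a list of (item, tidset) pairs for this prefix."""
    for i, (item, tids) in enumerate(items):
        if len(tids) < min_count:
            continue
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support = size of the tid-list
        # Intersect with the tid-lists of the remaining items to extend the itemset.
        extensions = [(other, tids & other_tids) for other, other_tids in items[i + 1:]]
        eclat(itemset, extensions, min_count, out)
    return out

print(sorted(vertical["A"] & vertical["B"]))  # [1, 5, 7, 8]
result = eclat((), sorted(vertical.items()), min_count=3, out={})
print(result[("A", "B")])  # 4
```

Keeping tid-lists as Python sets makes the intersection a single `&`, but it also shows the stated disadvantage: every level of the recursion holds its own intermediate tid-lists in memory.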
FP-growth Algorithm
• Use a compressed representation of the
database using an FP-tree
• Once an FP-tree has been constructed, it uses
a recursive divide-and-conquer approach to
mine the frequent itemsets
FP-tree construction
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
After reading TID=1:
null → A:1 → B:1
After reading TID=2:
null → A:1 → B:1
null → B:1 → C:1 → D:1
FP-Tree Construction
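Transaction Database
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
FP-tree (each node is item:count; indentation shows parent-child links)
null
  A:7
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:3
    C:3
      D:1
      E:1
Header table
Item Pointer
A
B
C
D
E
Each header-table entry points to the chain of tree nodes that carry that item.
Pointers are used to assist frequent itemset generation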
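The construction can be sketched as repeated insertion of each transaction along a path from the root. This sketch follows the slide in keeping the items in the fixed order A, B, C, D, E; sorting items by descending frequency and dropping infrequent items, as FP-growth implementations usually do, is left out. The class `Node`, the function `build_fp_tree`, and the use of a plain list per item in place of linked node pointers are assumptions made for the example.

```python
class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(db, order):
    root = Node(None)
    # Header table: item -> list of nodes carrying that item
    # (a list stands in for the linked pointer chain).
    header = {item: [] for item in order}
    for items in db:
        node = root
        for item in sorted(items, key=order.index):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes only increment counts
            node = child
    return root, header

db = [{"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
      {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
      {"A", "B", "D"}, {"B", "C", "E"}]
root, header = build_fp_tree(db, order=["A", "B", "C", "D", "E"])
print(root.children["A"].count, root.children["B"].count)  # 7 3
```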
FP-growth
(Figure: the prefix paths of the FP-tree that end in D.)
Conditional Pattern base for D:
P = {(A:1,B:1,C:1),
(A:1,B:1),
(A:1,C:1),
(A:1),
(B:1,C:1)}
Recursively apply FP-growth on P
Frequent Itemsets found (with sup > 1):
AD, BD, CD, ABD, ACD, BCD
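The conditional pattern base for D can be reproduced without the tree by taking, for each transaction that contains D, the items that precede D in the item order. The sketch below makes two simplifying assumptions: it reads prefixes straight from the transactions rather than following header-table pointers, and it replaces the recursive FP-growth call with a plain enumeration of subsets of each prefix. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

db = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "BC", "ABC", "ABD", "BCE"]
order = "ABCDE"

def conditional_pattern_base(db, item, order):
    """Prefix paths (with counts) of all transactions that contain `item`."""
    base = Counter()
    for t in db:
        if item in t:
            prefix = tuple(i for i in order[:order.index(item)] if i in t)
            if prefix:
                base[prefix] += 1
    return base

def frequent_with_suffix(base, item, min_count):
    """Count every subset of every prefix, then append `item` as the suffix."""
    counts = Counter()
    for prefix, n in base.items():
        for k in range(1, len(prefix) + 1):
            for combo in combinations(prefix, k):
                counts[combo + (item,)] += n
    return {s: n for s, n in counts.items() if n >= min_count}

base = conditional_pattern_base(db, "D", order)
found = frequent_with_suffix(base, "D", min_count=2)  # sup > 1
print(sorted("".join(s) for s in found))
```

On this database the result contains AD, BD, CD, ABD, ACD and BCD; ABD is supported by TIDs 6 and 9.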
Compact Representation of Frequent Itemsets
• Some itemsets are redundant because they have identical support
as their supersets
• Number of frequent itemsets
• Need a compact representation
TID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
Number of frequent itemsets $= 3 \times \sum_{k=1}^{10} \binom{10}{k}$
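The size of that count is easy to evaluate. The snippet below is a small arithmetic check, assuming a minimum support low enough that each block of ten items (A1 to A10, B1 to B10, C1 to C10) is frequent, which holds for any threshold up to 5 of the 15 transactions.

```python
from math import comb

# Every non-empty subset of one 10-item block is frequent; there are three blocks.
n_frequent = 3 * sum(comb(10, k) for k in range(1, 11))
print(n_frequent)  # 3069, i.e. 3 * (2**10 - 1)
```

All subsets of one block share the same five transactions, so they all have the same support as the full block, which is why a compact representation is attractive here.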
Maximal Frequent Itemset
(Figure: the itemset lattice over items A to E, with a border separating the frequent itemsets from the infrequent itemsets; the maximal itemsets are the frequent itemsets that lie directly on the border.)
An itemset is maximal frequent if it is frequent and none of its immediate supersets is frequent
Closed Itemset
• An itemset is closed if none of its immediate supersets has the
same support as the itemset
TID Items
1 {A,B}
2 {B,C,D}
3 {A,B,C,D}
4 {A,B,D}
5 {A,B,C,D}
Itemset Support
{A} 4
{B} 5
{C} 3
{D} 4
{A,B} 4
{A,C} 2
{A,D} 3
{B,C} 3
{B,D} 4
{C,D} 3
Itemset Support
{A,B,C} 2
{A,B,D} 3
{A,C,D} 2
{B,C,D} 3
{A,B,C,D} 2
Maximal vs Closed Itemsets
TID Items
1 ABC
2 ABCD
3 BCE
4 ACDE
5 DE
Transaction Ids supporting each itemset (itemsets not listed are not supported by any transactions)
Itemset TIDs
A 1,2,4
B 1,2,3
C 1,2,3,4
D 2,4,5
E 3,4,5
AB 1,2
AC 1,2,4
AD 2,4
AE 4
BC 1,2,3
BD 2
BE 3
CD 2,4
CE 3,4
DE 4,5
ABC 1,2
ABD 2
ACD 2,4
ACE 4
ADE 4
BCD 2
BCE 3
CDE 4
ABCD 2
ACDE 4
Maximal vs Closed Frequent Itemsets
(Figure: the lattice of the previous slide with its transaction IDs; frequent itemsets are marked as "closed and maximal" or "closed but not maximal".)
Minimum support = 2
# Closed = 9
# Maximal = 4
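The two counts can be checked by applying the definitions of closed and maximal itemsets directly to the five transactions. This is a brute-force sketch for a five-item example, not an efficient miner; all function and variable names are chosen for the illustration.

```python
from itertools import combinations

db = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
min_count = 2
items = sorted(set().union(*db))

def count(itemset):
    """Support count of an itemset."""
    return sum(1 for t in db if itemset <= t)

# All frequent itemsets, by exhaustive enumeration of the lattice.
freq = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = frozenset(cand)
        if count(s) >= min_count:
            freq[s] = count(s)

def immediate_supersets(itemset):
    return [itemset | {i} for i in items if i not in itemset]

# Closed: no immediate superset has the same support.
closed = [s for s in freq if all(count(sup) < freq[s] for sup in immediate_supersets(s))]
# Maximal: no immediate superset is frequent.
maximal = [s for s in freq if all(sup not in freq for sup in immediate_supersets(s))]

print(len(closed), len(maximal))  # 9 4
print(sorted("".join(sorted(s)) for s in maximal))
```

Every maximal frequent itemset found this way is also closed, which matches the containment between the two families.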
Maximal vs Closed Itemsets
Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets
Beyond Itemsets
• Sequence Mining
– Finding frequent subsequences from a collection of sequences
• Graph Mining
– Finding frequent (connected) subgraphs from a collection of
graphs
• Tree Mining
– Finding frequent (embedded) subtrees from a set of
trees/graphs
• Geometric Structure Mining
– Finding frequent substructures from 3-D or 2-D geometric
graphs
• Among others…
Frequent Pattern Mining
(Figure: example graphs with node labels A to F.)
Why Is Frequent Pattern Mining So Important?
• Application Domains
– Business, biology, chemistry, WWW, computer/networking security, …
• Summarizing the underlying datasets, providing key insights
• Basic tools for other data mining tasks
– Association rule mining
– Classification
– Clustering
– Change Detection
– etc…
Network motifs: recurring patterns that
occur significantly more than in
randomized nets
• Do motifs have specific roles in the network?
• Many possible distinct subgraphs
The 13 three-node connected
subgraphs
199 4-node directed connected subgraphs
The count grows fast for larger subgraphs: 9,364 5-node subgraphs,
1,530,843 6-node…
Finding network motifs –
an overview
• Generation of a suitable random ensemble (reference
networks)
• Network motifs detection process:
– Count how many times each subgraph appears
– Compute statistical significance for each subgraph: the probability of appearing in random networks as often as in the real network (P-value or Z-score)
Real = 5, Rand = 0.5 ± 0.6 (over the ensemble of networks)
Z-score (# standard deviations) = 7.5
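The Z-score above is the number of standard deviations by which the real count exceeds the mean count in the random ensemble. A minimal sketch, with a function name chosen for this example:

```python
def z_score(real_count, random_mean, random_std):
    """Standard deviations between the real count and the random-ensemble mean."""
    return (real_count - random_mean) / random_std

# Numbers from the slide: Real = 5, Rand = 0.5 +/- 0.6.
z = z_score(5, 0.5, 0.6)
print(round(z, 2))  # 7.5
```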
References
• R. Agrawal, T. Imielinski, and A. Swami. Mining
association rules between sets of items in
large databases. SIGMOD, 207-216, 1993.
• R. Agrawal and R. Srikant. Fast algorithms for
mining association rules. VLDB, 487-499,
1994.
• R. J. Bayardo. Efficiently mining long patterns
from databases. SIGMOD, 85-93, 1998.
References:
• Christian Borgelt, Efficient Implementations of
Apriori and Eclat, FIMI’03
• Ferenc Bodon, A fast APRIORI implementation,
FIMI’03
• Ferenc Bodon, A Survey on Frequent Itemset
Mining, Technical Report, Budapest University
of Technology and Economics, 2006
Important websites:
• FIMI workshop
– Not only Apriori and FIM
• FP-tree, ECLAT, Closed, Maximal
– http://fimi.cs.helsinki.fi/
• Christian Borgelt’s website
– http://www.borgelt.net/software.html
• Ferenc Bodon’s website
– http://www.cs.bme.hu/~bodon/en/apriori/
