Jung Hoon Kim
N5, Room 2239
E-mail: junghoon.kim@kaist.ac.kr

2014.01.07

KAIST Knowledge Service Engineering
Data Mining Lab.

Introduction
• Frequent pattern and association rule mining is one of the few exceptions to emerge from machine learning
• Apriori algorithm
• AprioriTid algorithm
• AprioriAll algorithm
• FP-Tree algorithm

Notation


Principle
• Downward closure property:
• If an itemset is frequent, then all of its subsets must also be frequent.
• If an itemset is not frequent, none of its supersets can be frequent.
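
The downward closure property can be checked on a small example. The sketch below is only illustrative: the transactions and the helper names `support_count` and `has_infrequent_subset` are assumptions made for this example and do not come from the slides.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of the k-itemset `candidate` is not frequent.

    By downward closure, such a candidate cannot be frequent itself,
    so it can be discarded without counting it in the database.
    """
    k = len(candidate)
    return any(frozenset(s) not in frequent_smaller
               for s in combinations(candidate, k - 1))

# Hypothetical toy transactions (not from the slides).
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

pair = {"bread", "milk"}
# The support of an itemset never exceeds the support of any of its subsets.
assert support_count(pair, transactions) <= support_count({"bread"}, transactions)
assert support_count(pair, transactions) <= support_count({"milk"}, transactions)
```

The second helper is the form in which Apriori typically uses the property: a candidate is pruned as soon as one of its subsets is known to be infrequent.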

Apriori algorithm
 Pseudo code
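
The pseudo code shown on this slide did not survive extraction. As a stand-in, here is a minimal level-wise sketch of the Apriori idea (join, prune by downward closure, count in one database scan per level). It is an illustration under assumptions, not the slide's original pseudo code: the function name `apriori`, the use of an absolute `minsup` count, and the toy database are all made up for this example, and candidates are kept in a plain Python set rather than a hash tree.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support count} for all itemsets with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count step: one database scan per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= minsup}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy database (not from the slides).
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent_itemsets = apriori(db, minsup=2)
```

With this toy database and `minsup=2`, the only frequent 3-itemset is {2, 3, 5}; candidates such as {1, 2, 3} are pruned before counting because {1, 2} is not frequent.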

Example

Discussion
• Too many database scans make the computation expensive.
• minsup and minconf need to be specified in advance.
• A hash tree is used to store the candidate itemsets.
• Some implementations adopt a trie structure to store the sets instead.
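
To make the two thresholds concrete: minsup bounds the support of an itemset, and minconf bounds the confidence of a rule X → Y, conventionally defined as support(X ∪ Y) / support(X). The sketch below assumes relative support (a fraction of all transactions); the function names and the toy database are hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X u Y) / support(X).

    Assumes the antecedent occurs in at least one transaction;
    otherwise the division below raises ZeroDivisionError.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Hypothetical toy database (not from the slides).
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

minsup, minconf = 0.5, 0.8   # assumed thresholds for this example
rule_ok = (support({2, 5}, db) >= minsup
           and confidence({2}, {5}, db) >= minconf)
```

Here the rule {2} → {5} passes both thresholds, while {3} → {2} would pass minsup but fail this minconf, since only two of the three transactions containing 3 also contain 2.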

AprioriTid


FP-Growth
• Avoid scanning the database multiple times
• the cost of database scans is too high
• Avoid generating lots of candidates
• in the Apriori algorithm, the bottleneck is candidate generation
• How can these problems be solved?

FP-Growth
• The algorithm is quite simple
1. Scan the database once and find the frequent 1-itemsets (single-item patterns)
2. Sort the frequent items in descending order of frequency to obtain the F-list (F-list = f-c-a-b-m-p)
3. Scan the DB again and construct the FP-tree
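
The three steps above can be sketched as follows. The transaction table of the slides did not survive extraction, so the five transactions below are an assumption: they are chosen so that, with a minimum support count of 3, the frequent items are exactly f, c, a, b, m, p as in the F-list on the slide. The names `FPNode`, `build_flist` and `build_fptree` are hypothetical. Ties in frequency are broken alphabetically here, which orders c before f, so the tree is built with the slide's F-list passed in explicitly.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, a count, and links to parent/children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_flist(transactions, minsup):
    """Steps 1-2: count items in one scan, keep frequent ones, sort by frequency."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= minsup]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))  # ties: alphabetical (assumption)
    return [item for item, _ in frequent]

def build_fptree(transactions, flist):
    """Step 3: second scan; insert each transaction in F-list order."""
    rank = {item: i for i, item in enumerate(flist)}
    root = FPNode(None)
    header = {item: [] for item in flist}  # item -> all tree nodes carrying it
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes just increment counts
            node = child
    return root, header

# Assumed example transactions (TIDs 100..500), one string of items each.
transactions = [list(t) for t in
                ("facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn")]

computed = build_flist(transactions, minsup=3)
root, header = build_fptree(transactions, flist=list("fcabmp"))
```

Under these assumptions the root has two children, f (count 4) and c (count 1), and the item m appears on two branches of the tree.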
FP-Growth Algorithm

FP-Tree
 Scanning the transaction with TID=100

FP-Tree
 Scanning the transaction with TID=200

FP-Tree
 Final FP-Tree

Mining an FP-Tree
I. forming conditional pattern bases
II. constructing conditional FP-trees
III. recursively mining conditional FP-trees

Conditional pattern base
• For a frequent item, the set of prefix paths in the FP-tree that co-occur with it as a suffix pattern
• for example
• m : <f, c, a> : support 2
• m : <f, c, a, b> : support 1
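
The example for m can be reproduced with a short sketch. As a shortcut, it does not walk an FP-tree: it orders each transaction by the F-list and counts the prefixes that precede the item, which yields the same prefix paths and counts that the tree nodes for that item would carry. The five transactions are assumed (they are consistent with the F-list f-c-a-b-m-p at a minimum support count of 3, but the slides' own table is not available), and `conditional_pattern_base` is a hypothetical helper name.

```python
from collections import Counter

def conditional_pattern_base(item, transactions, flist):
    """Map each prefix path that precedes `item` to the number of times it occurs."""
    rank = {x: i for i, x in enumerate(flist)}
    base = Counter()
    for t in transactions:
        # Keep only frequent items, ordered as in the F-list.
        ordered = sorted((x for x in set(t) if x in rank), key=rank.__getitem__)
        if item in ordered:
            prefix = tuple(ordered[:ordered.index(item)])
            if prefix:  # an empty prefix contributes nothing to the base
                base[prefix] += 1
    return dict(base)

# Assumed example transactions, one string of items each.
transactions = [list(t) for t in
                ("facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn")]

m_base = conditional_pattern_base("m", transactions, flist=list("fcabmp"))
```

For the assumed data, `m_base` contains the two prefix paths listed above: (f, c, a) with count 2 and (f, c, a, b) with count 1.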

Conditional pattern tree
 {m}’s conditional pattern tree

Pseudo Code
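
The pseudo code on this slide was lost in extraction. The following is a simplified pattern-growth sketch rather than the original: it recurses on conditional pattern bases represented as weighted lists of prefix paths, instead of materializing FP-tree nodes, so it shows the recursion of FP-growth but not the compact tree structure. The function name `fp_growth`, the absolute `minsup` count, and the five example transactions are assumptions made for this illustration.

```python
from collections import Counter

def fp_growth(weighted_db, minsup, suffix=()):
    """Return {pattern: support count} for all frequent patterns ending in `suffix`.

    weighted_db is a list of (items, count) pairs; the initial database
    uses count 1 per transaction, conditional pattern bases carry path counts.
    """
    # Count items, weighted by how many transactions each entry stands for.
    counts = Counter()
    for items, n in weighted_db:
        for item in set(items):
            counts[item] += n

    # Local F-list: frequent items in descending frequency (ties alphabetical).
    flist = sorted((i for i, c in counts.items() if c >= minsup),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}
    ordered_db = [(sorted((x for x in set(items) if x in rank),
                          key=rank.__getitem__), n)
                  for items, n in weighted_db]

    results = {}
    for item in reversed(flist):  # start from the least frequent item
        pattern = frozenset(suffix) | {item}
        results[pattern] = counts[item]
        # Conditional pattern base of `item`: the prefixes that precede it.
        cond = []
        for ordered, n in ordered_db:
            if item in ordered:
                prefix = tuple(ordered[:ordered.index(item)])
                if prefix:
                    cond.append((prefix, n))
        # Recursively mine the conditional database with the grown suffix.
        results.update(fp_growth(cond, minsup, tuple(pattern)))
    return results

# Assumed example transactions, one string of items each.
transactions = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
patterns = fp_growth([(tuple(t), 1) for t in transactions], minsup=3)
```

No candidate itemsets are generated and tested against the full database here; each recursive call only looks at the prefixes that co-occur with the current suffix, which is the point of the pattern-growth approach.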

Conclusion
• In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout.

Thank you

