Azqa Saleem Khan (SP22-RCS-003): FP-Growth

The FP-Growth algorithm is an improvement on the Apriori algorithm for frequent pattern mining. It avoids candidate generation and instead constructs a frequent-pattern tree (FP-tree) to store transaction data, compressing the database. Frequent patterns are generated by traversing the FP-tree without candidate generation. The algorithm scans the database to determine frequent items, constructs the FP-tree, and then mines the tree to find frequent patterns.


FP-Growth Algorithm

(Frequent Pattern Growth Algorithm)

Azqa Saleem Khan


(SP22-RCS-003)
Department of Computer Science
Advanced Algorithm Analysis
Dr. Nadeem Javaid

(Date: 30/05/2022)

COMSATS University Islamabad, Islamabad Pakistan


Preliminaries (1/3)
 Artificial Intelligence: Artificial intelligence (AI) refers to the simulation of human intelligence in
machines that are programmed to think like humans and mimic their actions.

 Machine Learning: Machine learning is a branch of artificial intelligence (AI) and computer science
which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its
accuracy.[1]

 Unsupervised Learning: Unsupervised learning is a machine learning technique in which users do
not need to supervise the model. Instead, the model works on its own to discover patterns and
information that were previously undetected. It mainly deals with unlabeled data. [2]

 Unsupervised Learning Algorithms: Unsupervised machine learning algorithms are used when the
training data is neither classified nor labeled. They study how systems can infer a function to
describe a hidden structure from unlabeled data. At no point does the system know the correct output with
certainty; instead, it draws inferences from the dataset about what the output should be. [2]

1
Preliminaries (2/3)
Frequent itemset: A frequent itemset is a set of items whose support is greater than or equal to a
threshold value, the user-specified minimum support. By the downward-closure property, if {A, B} is a
frequent itemset, then A and B must individually be frequent itemsets as well.
 Suppose there are two transactions: A = {1,2,3,4,5} and B = {2,3,7}. In these two transactions, the items
2 and 3 form a frequent itemset.[3]
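The support counting behind this example can be sketched in a few lines of Python (an illustrative sketch, not part of the original slides):

```python
from collections import Counter

# Mini-database from the slide: two transactions A and B.
transactions = [
    {1, 2, 3, 4, 5},  # A
    {2, 3, 7},        # B
]

# Support count of an item = number of transactions containing it.
support = Counter(item for t in transactions for item in t)

# With min_support = 2, only the items appearing in both transactions survive.
min_support = 2
frequent = {item for item, count in support.items() if count >= min_support}
print(frequent)  # {2, 3}
```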

Association Rule: [3]


 Association rule learning is an unsupervised learning technique that checks for the dependency of one data
item on another data item and maps them accordingly so that the result can be used profitably. It tries to
find interesting relations or associations among the variables of a dataset, based on rules that discover
these relations between variables in the database.
 Association rule learning is one of the very important concepts of machine learning, and it is employed
in market basket analysis, web usage mining, continuous production, etc.
 An association rule is an implication expression of the form X → Y, where X and Y are any two itemsets.

2
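As a concrete illustration of these definitions, the standard support and confidence of a rule X → Y can be computed as follows (the helper name `rule_metrics` and the toy transactions are our own, not from the slides):

```python
def rule_metrics(transactions, X, Y):
    """Return (support, confidence) of the rule X -> Y over a list of set-valued transactions."""
    n = len(transactions)
    both = sum(1 for t in transactions if X <= t and Y <= t)  # transactions with X ∪ Y
    only_x = sum(1 for t in transactions if X <= t)           # transactions with X
    support = both / n
    confidence = both / only_x if only_x else 0.0
    return support, confidence

txns = [{'bread', 'milk'}, {'bread', 'butter'},
        {'bread', 'milk', 'butter'}, {'milk'}]
sup, conf = rule_metrics(txns, {'bread'}, {'milk'})
print(sup, conf)  # 0.5 and 2/3: bread and milk co-occur in 2 of 4 baskets
```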
Preliminaries (3/3)
Apriori Algorithm: The Apriori algorithm finds the most frequent itemsets in a transaction
database and identifies association rules between the items. It uses "join" and "prune" steps to reduce
the search space, iterating level by level to discover the most frequent itemsets.
 The two primary drawbacks of the Apriori algorithm are:
1. At each step, candidate sets have to be built.
2. To build the candidate sets, the algorithm has to repeatedly scan the database.
 These two properties inevitably make the algorithm slow. To avoid these redundant steps, a new
association-rule mining algorithm was developed: the Frequent Pattern Growth (FP-Growth) algorithm. [3]

3
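To make these drawbacks concrete, here is a minimal sketch of Apriori-style level-wise candidate generation (the toy data and helper are our own): each level joins the previous frequent itemsets into candidates, and then the whole database must be scanned again to count them.

```python
from itertools import combinations

transactions = [{'a', 'b', 'c'}, {'a', 'c'}, {'a', 'd'}, {'b', 'c'}]
min_support = 2

def count(candidate):
    """One pass over the database per call — the cost Apriori keeps paying."""
    return sum(1 for t in transactions if candidate <= t)

# Level 1: one full scan to count single items.
items = sorted({i for t in transactions for i in t})
frequent1 = [frozenset({i}) for i in items if count(frozenset({i})) >= min_support]

# Level 2: join frequent 1-itemsets into candidate pairs, then scan the
# database AGAIN to count each candidate.
candidates = [a | b for a, b in combinations(frequent1, 2)]
frequent2 = [c for c in candidates if count(c) >= min_support]
print(sorted(sorted(c) for c in frequent2))  # [['a', 'c'], ['b', 'c']]
```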
FP-Growth- Introduction
 The FP-Growth algorithm was introduced by Han, Pei and Yin in 2000 to eliminate the candidate generation of
the Apriori algorithm.
 This algorithm is an improvement over the Apriori method.
 Frequent patterns are generated without the need for candidate generation. FP-Growth represents the
database in the form of a tree called a frequent-pattern tree, or FP-tree.
 This tree structure maintains the associations between the itemsets. The database is fragmented using one
frequent item at a time; each fragmented part is called a "pattern fragment".
 Apriori is a join-based algorithm, while FP-Growth is a tree-based algorithm, for frequent itemset mining
(frequent pattern mining) in market basket analysis.
 An FP-tree is a compact data structure that represents the data set in tree form. Each transaction is read
and then mapped onto a path in the FP-tree, until all transactions have been read. Transactions that share
common prefixes keep the tree compact because their paths overlap.

4
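A minimal sketch of the FP-tree structure just described (the class and function names are our own, not from the original paper): each node holds an item, a counter, and child links, and inserting a transaction reuses any existing prefix path.

```python
class FPNode:
    """One node of the FP-tree: an item, its count, and child links."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def insert_transaction(root, items):
    """Map one (frequency-ordered) transaction onto a path, reusing prefixes."""
    node = root
    for item in items:
        node = node.children.setdefault(item, FPNode(item, node))
        node.count += 1

root = FPNode(None)
insert_transaction(root, ['f', 'c', 'a', 'm', 'p'])
insert_transaction(root, ['f', 'c', 'a', 'b', 'm'])
# The shared prefix f -> c -> a is stored once, with count 2 on each node.
print(root.children['f'].count)  # 2
```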
Flowchart
1. Scan the data set to determine the support count of
each item, discard the infrequent items, and sort the
frequent items in decreasing order of support.
2. Scan the data set one transaction at a time to create
the FP-tree. For each transaction:
a. If it shares no prefix with an existing path, form
a new path and set the counter for each node to 1.
b. If it shares a common prefix itemset, increment
the common prefix node counters and create new
nodes where needed.
3. Continue until every transaction has been mapped
onto the tree.

Source: https://arxiv.org/ftp/arxiv/papers/1901/1901.11376.pdf 5
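Steps 1 and 2 above can be sketched as follows, using the example transactions from the following slides. Note that ties between items with equal support may be broken arbitrarily; here the priority order is fixed to match the slides.

```python
from collections import Counter

transactions = [
    ['f','a','c','d','g','i','m','p'],  # T1
    ['a','b','c','f','l','m','o'],      # T2
    ['b','f','h','j','o'],              # T3
    ['b','c','k','s','p'],              # T4
    ['a','f','c','e','l','p','m','n'],  # T5
]
min_support = 3

# Step 1: count supports; only items with count >= min_support are kept.
counts = Counter(i for t in transactions for i in t)
# Decreasing-support order; ties fixed to match the slides.
priority = {item: rank for rank, item in enumerate(['f', 'c', 'a', 'b', 'm', 'p'])}

# Step 2 preparation: drop infrequent items and sort each transaction
# by the global frequency order before inserting it into the tree.
ordered = [sorted((i for i in t if i in priority), key=priority.get)
           for t in transactions]
print(ordered[0])  # ['f', 'c', 'a', 'm', 'p']
```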
FP-Growth Algorithm

FP-Growth(Tree, α) {
    for each item aᵢ in the header table of Tree {
        generate pattern β = aᵢ ∪ α, with support = aᵢ.support
        construct β's conditional pattern base and
        β's conditional FP-tree, Tree_β
        if Tree_β ≠ ∅
            call FP-Growth(Tree_β, β)
    }
    return the frequent patterns generated
}

6
Source: https://arxiv.org/ftp/arxiv/papers/1901/1901.11376.pdf/
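The recursion above can be written as a runnable sketch. For brevity, this version represents each conditional FP-tree by its conditional pattern base, i.e. a list of weighted prefix paths, rather than a linked node structure; the function name and representation are our own, but the recursion mirrors the pseudocode.

```python
from collections import Counter

def fp_growth(paths, min_support, suffix=()):
    """Recursive FP-growth sketch.

    `paths` is a list of (items, count) pairs, where each `items` tuple is
    frequency-ordered: initially the ordered transactions, each with count 1;
    in recursive calls, a conditional pattern base.
    Returns {frozenset(pattern): support}.
    """
    counts = Counter()
    for items, c in paths:
        for item in set(items):
            counts[item] += c

    patterns = {}
    for item, support in counts.items():
        if support < min_support:
            continue
        pattern = suffix + (item,)       # beta = a_i ∪ alpha
        patterns[frozenset(pattern)] = support
        # Conditional pattern base for `item`: the prefix of each path
        # that ends just before `item`, carrying the path's count.
        base = [(items[:items.index(item)], c)
                for items, c in paths
                if item in items and items.index(item) > 0]
        patterns.update(fp_growth(base, min_support, pattern))
    return patterns

ordered = [('f','c','a','m','p'), ('f','c','a','b','m'), ('f','b'),
           ('c','b','p'), ('f','c','a','m','p')]
result = fp_growth([(t, 1) for t in ordered], min_support=3)
print(result[frozenset({'c', 'p'})])  # 3
```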
FP-Growth- Example
The given data is a hypothetical dataset of transactions with each letter representing an item.

min_support = 3.

TID Items Bought


T1 f,a,c,d,g,i,m,p

T2 a,b,c,f,l,m,o

T3 b,f,h,j,o

T4 b,c,k,s,p

T5 a,f,c,e,l,p,m,n

TABLE-1

Source: https://www.vtupulse.com/big-data-analytics/frequent-pattern-fp-growth-algorithm-example/ 7
We find the frequency of each item.
The following table gives the frequency of each item in the given data.
This is the support count of each item; for example, item c has been bought in 4 transactions
(T1, T2, T4, and T5), so the support count of c is 4.

Item Frequency Item Frequency


a 3 j 1
b 3 k 1
c 4 l 2
d 1 m 3
e 1 n 1
f 4 o 2
g 1 p 3
h 1 s 1
i 1

TABLE-2
8
A Frequent Pattern set (L) is built, which will contain all the elements whose frequency is greater than
or equal to the minimum support.
These elements are stored in descending order of their respective frequencies.
As minimum support is 3.
After insertion of the relevant items, the set L looks like this:
L = { (f:4), (c:4), (a:3), (b:3), (m:3), (p:3) }

Now, for each transaction, the respective Ordered-Item set is built.

TID Items Bought (Ordered) Frequent Items

T1 f,a,c,d,g,i,m,p f,c,a,m,p

T2 a,b,c,f,l,m,o f,c,a,b,m

T3 b,f,h,j,o f,b

T4 b,c,k,s,p c,b,p

T5 a,f,c,e,l,p,m,n f,c,a,m,p

TABLE-3
9
 Now, all the Ordered-Item sets are to be inserted into a Trie Data
Structure (frequent pattern tree).

[Figure: create the root, then insert Transaction 1]

10

[Figure: insert Transaction 2, then Transaction 3]

11

[Figure: insert Transaction 4, then Transaction 5]

12
 For each item, the Conditional Pattern Base is computed: the prefix paths of all the paths in the
frequent-pattern tree that lead to any node of the given item.

Item Conditional Pattern Base


p {{f,c,a,m:2}, {c,b:1}}

m {{f,c,a:2},{f,c,a,b:1}}

b {{f,c,a:1},{f:1},{c:1}}

a {{f,c:3}}

c {{f:3}}

f Ø

TABLE-4

13
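Table-4 can be reproduced directly from the ordered transactions of Table-3: for each occurrence of an item, take the prefix path that precedes it and merge identical prefixes. (In the real algorithm these prefixes are read off the FP-tree via its node links; this is an illustrative shortcut, and the helper name is our own.)

```python
from collections import Counter

# Ordered frequent-item transactions from Table-3.
ordered = [('f','c','a','m','p'), ('f','c','a','b','m'), ('f','b'),
           ('c','b','p'), ('f','c','a','m','p')]

def conditional_pattern_base(item):
    """Merge the prefix paths that lead to `item` in the ordered transactions."""
    base = Counter()
    for t in ordered:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)

print(conditional_pattern_base('p'))
# {('f', 'c', 'a', 'm'): 2, ('c', 'b'): 1}  — matching Table-4
```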
 Now the Conditional Frequent Pattern Tree is built.
It is done by taking the set of elements that are common to all the paths in the Conditional Pattern
Base of that item, and calculating each element's support count by summing the support counts of all
the paths in the Conditional Pattern Base.

Item Conditional Pattern Base Conditional FP-Tree


p {{f,c,a,m:2}, {c,b:1}} {c:3}

m {{f,c,a:2},{f,c,a,b:1}} {f,c,a:3}

b {{f,c,a:1},{f:1},{c:1}} Ø

a {{f,c:3}} {f,c:3}

c {{f:3}} {f:3}

f Ø Ø

TABLE-5

14
 Next, the Frequent Pattern rules are generated by pairing the items of the Conditional Frequent
Pattern Tree set to the corresponding item.

Item   Conditional Pattern Base     Conditional FP-Tree   Frequent Patterns Generated

p      {{f,c,a,m:2}, {c,b:1}}       {c:3}                 {<c,p:3>}

m      {{f,c,a:2}, {f,c,a,b:1}}     {f,c,a:3}             {<f,m:3>, <c,m:3>, <a,m:3>, <f,c,m:3>, <f,a,m:3>, <c,a,m:3>}

b      {{f,c,a:1}, {f:1}, {c:1}}    Ø                     {}

a      {{f,c:3}}                    {f,c:3}               {<f,a:3>, <c,a:3>, <f,c,a:3>}

c      {{f:3}}                      {f:3}                 {<f,c:3>}

f      Ø                            Ø                     {}

TABLE-6
15
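When an item's conditional FP-tree is a single path, as it is for m above, its frequent patterns are exactly every non-empty subset of that path combined with the item, each carrying the path's support. A sketch:

```python
from itertools import combinations

# Conditional FP-tree of item 'm' from Table-5: the single path f -> c -> a, support 3.
path, support, item = ('f', 'c', 'a'), 3, 'm'

# Every non-empty subset of the path, each combined with 'm'.
patterns = [(subset + (item,), support)
            for r in range(1, len(path) + 1)
            for subset in combinations(path, r)]
print(patterns)
```

Note that this enumeration also yields <f,c,a,m:3>, one pattern beyond those listed for m in the table above.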
FP-Growth
 For each row of the table above, association rules can be inferred from each frequent pattern.
For example, from the frequent pattern <c,p:3> in the first row, the rules c → p and p → c can be inferred.

 Advantages Of FP Growth Algorithm


1. This algorithm needs to scan the database only twice, whereas Apriori scans the
transactions once per iteration.
2. Candidate pairing of items is not done in this algorithm, which makes it faster.
3. The database is stored in a compact version in memory.
4. It is efficient and scalable for mining both long and short frequent patterns.

 Disadvantages Of FP-Growth Algorithm


1. The FP-tree is more cumbersome and difficult to build than Apriori's candidate sets.
2. It may be expensive to construct.
3. When the database is large, the tree may not fit in main memory.

16
Source: https://www.vtupulse.com/big-data-analytics/frequent-pattern-fp-growth-algorithm-example/
References
1. https://www.ibm.com/cloud/learn/machine-learning
2. https://www.guru99.com/unsupervised-machine-learning.html
3. https://www.javatpoint.com/apriori-algorithm-in-machine-learning
4. https://arxiv.org/ftp/arxiv/papers/1901/1901.11376.pdf
5. https://www.vtupulse.com/big-data-analytics/frequent-pattern-fp-growth-algorithm-example/
6. https://www.geeksforgeeks.org/ml-frequent-pattern-growth-algorithm/

17
Thank You!!