Sequential Pattern Mining
University of Kashan
Fall 2017
Hamidreza Mahdavipanah
Pegah Hajian
Narges Heydarzadeh
Professor: Dr. S. M. Vahidipour
Outline
● Introduction
● Definitions
● GSP
● Constraints that GSP supports
Introduction
Studies on Sequential Pattern Mining
● First introduced by Agrawal and Srikant in 1995
○ They presented three algorithms
■ AprioriAll
■ AprioriSome
■ DynamicSome
● Then in 1996 they presented the GSP algorithm, which was much faster than
the earlier algorithms and was generalized to cover more real-life problems
● Pattern-growth methods: FreeSpan and PrefixSpan
● Mining closed sequential patterns: CloSpan
● ...
Applications
● Customer shopping sequences
○ First buy a computer, then a CD-ROM, and then a digital camera, within 3
months.
● Medical treatments
● Natural disasters (e.g., earthquakes)
● Stocks and markets
● DNA sequences and gene structures
Problem Statement
We are given a database D of customer transactions. Each transaction consists of
the following fields:
customer-id, transaction-time, items
The support of a sequence is the number (or fraction) of customers whose
sequence contains it.
We want to find all large sequences, that is, all sequences that meet a
user-specified minimum support.
It is similar to frequent itemset mining, but takes ordering into account.
Example of a database
Customer Id   Transaction Time   Items Bought
1             June 25 '93        30
1             June 30 '93        90
2             June 10 '93        10, 20
2             June 15 '93        30
2             June 20 '93        40, 60, 70
3             June 25 '93        30, 50, 70
4             June 25 '93        30
4             June 30 '93        40, 70
4             July 25 '93        90
5             June 12 '93        90
We convert DB to this form
Customer Id   Customer Sequence
1             <(30) (90)>
2             <(10 20) (30) (40 60 70)>
3             <(30 50 70)>
4             <(30) (40 70) (90)>
5             <(90)>
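The conversion can be sketched in a few lines of Python. This is only an illustration: the function name `to_customer_sequences` and the row layout are assumptions, and customer 4's purchase of item 90 is dated July 25 here, an assumed date after June 30, which the converted sequence <(30) (40 70) (90)> requires.

```python
from collections import defaultdict
from datetime import date

# (customer_id, transaction_time, items) rows of the example database.
# The July 25 date for customer 4 is an assumption (see the note above).
transactions = [
    (1, date(1993, 6, 25), {30}),
    (1, date(1993, 6, 30), {90}),
    (2, date(1993, 6, 10), {10, 20}),
    (2, date(1993, 6, 15), {30}),
    (2, date(1993, 6, 20), {40, 60, 70}),
    (3, date(1993, 6, 25), {30, 50, 70}),
    (4, date(1993, 6, 25), {30}),
    (4, date(1993, 6, 30), {40, 70}),
    (4, date(1993, 7, 25), {90}),
    (5, date(1993, 6, 12), {90}),
]


def to_customer_sequences(rows):
    """Group rows by customer and order each customer's itemsets by time."""
    by_customer = defaultdict(list)
    for customer_id, when, items in rows:
        by_customer[customer_id].append((when, frozenset(items)))
    return {
        cid: tuple(itemset for _, itemset in sorted(entries, key=lambda e: e[0]))
        for cid, entries in by_customer.items()
    }


sequences = to_customer_sequences(transactions)
print(len(sequences))  # 5
```

Each customer sequence is stored as a tuple of frozensets, so the order of elements is kept while the items inside one element stay unordered.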
Definitions
Itemset and Sequence
An itemset is a non-empty set of items.
- We denote an itemset i by (i_1 i_2 … i_m), where each i_j is an item
A sequence is an ordered list of itemsets.
- We denote a sequence s by <s_1 s_2 … s_n>, where each s_j is an itemset
Subsequence and supersequence
Given two sequences α = <a_1 a_2 … a_n> and β = <b_1 b_2 … b_m>:
● α is called a subsequence of β if there exist integers 1 ≤ j_1 < j_2 < … < j_n ≤ m
such that a_1 ⊆ b_{j_1}, a_2 ⊆ b_{j_2}, …, a_n ⊆ b_{j_n}
● β is then called a supersequence of α
Example:
α = <(a b) d> is a subsequence of β = <(a b c) (d e)>, since (a b) ⊆ (a b c) and (d) ⊆ (d e)
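A minimal Python sketch of the subsequence test, assuming a sequence is given as a list of sets. The function name `is_subsequence` is hypothetical. It matches each element of α against the earliest remaining element of β that contains it, which is enough to decide containment.

```python
def is_subsequence(alpha, beta):
    """Return True if alpha is a subsequence of beta.

    Both arguments are ordered collections of itemsets (sets of items).
    """
    alpha = [frozenset(a) for a in alpha]
    beta = [frozenset(b) for b in beta]
    pos = 0
    for element in alpha:
        # Advance to the earliest remaining element of beta containing `element`.
        while pos < len(beta) and not element <= beta[pos]:
            pos += 1
        if pos == len(beta):
            return False
        pos += 1  # the next element of alpha must match strictly later
    return True


alpha = [{"a", "b"}, {"d"}]
beta = [{"a", "b", "c"}, {"d", "e"}]
print(is_subsequence(alpha, beta))  # True
print(is_subsequence(beta, alpha))  # False
```

Note that <a b> is not a subsequence of <(a b)>: the two items must come from different elements, in order.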
Apriori Property of Sequential Patterns
If a sequence S is not frequent, then none of the super-sequences of S is
frequent.
Example:
<h b> is infrequent -> so are <h a b> and <(a h) b>
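The property can be turned into a pruning test: a candidate can be discarded without counting if deleting any single item from it yields a sequence that is not frequent. The sketch below assumes sequences are tuples of frozensets; the names `make_seq`, `drop_one_item`, and `can_prune`, and the set `frequent_2`, are hypothetical and chosen only for illustration.

```python
def make_seq(*elements):
    """Build a sequence; make_seq("ah", "b") stands for <(a h) b>."""
    return tuple(frozenset(e) for e in elements)


def drop_one_item(sequence):
    """Yield every sequence obtained by deleting exactly one item."""
    for i, element in enumerate(sequence):
        for item in element:
            rest = element - {item}
            if rest:
                yield sequence[:i] + (rest,) + sequence[i + 1:]
            else:
                # The element became empty, so it disappears entirely.
                yield sequence[:i] + sequence[i + 1:]


def can_prune(candidate, frequent):
    """True if some one-item-shorter subsequence is not frequent."""
    return any(sub not in frequent for sub in drop_one_item(candidate))


# Hypothetical set of frequent length-2 sequences; <h b> is missing.
frequent_2 = {make_seq("h", "a"), make_seq("a", "b"), make_seq("ah")}

print(can_prune(make_seq("h", "a", "b"), frequent_2))  # True
print(can_prune(make_seq("ah", "b"), frequent_2))      # True
```

Both candidates from the example are pruned because each contains the infrequent <h b>.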
GSP
Outline of the GSP Method
● Initially, every item in DB is a candidate of length-1
● For each level (i.e., sequences of length-k) do
○ Scan database to collect support count for each candidate sequence
○ Generate candidate length-(k + 1) sequences from length-k frequent
sequences using Apriori
● Repeat until no frequent sequence or no candidate can be found
The major strength of GSP is its candidate pruning by the Apriori property
Finding Length-1 Sequential Patterns
● Initial candidates:
○ <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h>
● Scan database once, count support for candidates
Generating Length-2 Candidates
● Two-element candidates of the form <x y>: 6 * 6 = 36
● One-element candidates of the form <(x y)>: (6 * 5) / 2 = 15
Using Apriori: 36 + 15 = 51 length-2 candidates
Without Apriori: (8 * 8) + (8 * 7) / 2 = 92 length-2 candidates
Apriori prunes 44.57% of the candidates
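The counts can be checked with a short calculation. The sketch assumes that 6 of the 8 items are frequent (inferred from 36 = 6 * 6) and that a length-2 candidate is either two single-item elements <x y> or one two-item element <(x y)>; the function name is hypothetical.

```python
def length2_candidates(n_items):
    """Number of length-2 candidates that n_items single items can form."""
    two_elements = n_items * n_items            # <x y>, including <x x>
    one_element = n_items * (n_items - 1) // 2  # <(x y)>, an unordered pair
    return two_elements + one_element


with_apriori = length2_candidates(6)     # only the frequent items are combined
without_apriori = length2_candidates(8)  # all items are combined
pruned = 1 - with_apriori / without_apriori

print(with_apriori, without_apriori, f"{pruned:.2%}")  # 51 92 44.57%
```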
Finding Length-2 Sequential Patterns
● Scan database one more time, collect support count for
each length-2 candidate
● There are 19 length-2 candidates which pass the
minimum support threshold
○ They are length-2 sequential patterns
The GSP Mining Process
The GSP Algorithm
● Take sequences of the form <x> as length-1 candidates
● Scan database once, find F_1, the set of length-1 sequential patterns
● Let k = 1; while F_k is not empty do
○ Form C_{k+1}, the set of length-(k + 1) candidates, from F_k
○ If C_{k+1} is not empty, scan database once, find F_{k+1}, the
set of length-(k + 1) sequential patterns
○ Let k = k + 1
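The level-wise loop can be sketched in Python as follows. This is a simplified illustration, not the published algorithm: candidates are generated by extending each frequent sequence with one frequent item (as a new element, or inside the last element) rather than by GSP's join step, and no time constraints, sliding window, or taxonomy are handled. All function names are hypothetical.

```python
def is_subsequence(alpha, beta):
    """Greedy containment test on tuples of frozensets."""
    pos = 0
    for element in alpha:
        while pos < len(beta) and not element <= beta[pos]:
            pos += 1
        if pos == len(beta):
            return False
        pos += 1
    return True


def support(candidate, database):
    """Number of customer sequences that contain the candidate."""
    return sum(is_subsequence(candidate, s) for s in database)


def extend(sequence, items):
    """Length-(k + 1) candidates obtained by adding one item to a length-k sequence."""
    out = set()
    for item in items:
        out.add(sequence + (frozenset([item]),))  # item as a new last element
        if item > max(sequence[-1]):              # item joins the last element
            out.add(sequence[:-1] + (sequence[-1] | {item},))
    return out


def mine(database, min_support):
    """Return {pattern: support} for all patterns meeting min_support."""
    items = {i for s in database for element in s for i in element}
    candidates = {(frozenset([i]),) for i in items}  # length-1 candidates
    patterns, frequent_items = {}, None
    while candidates:
        # One database scan per level: count every candidate.
        counted = {c: support(c, database) for c in candidates}
        frequent = {c: n for c, n in counted.items() if n >= min_support}
        patterns.update(frequent)
        if frequent_items is None:  # first level: remember frequent items
            frequent_items = sorted(min(c[0]) for c in frequent)
        candidates = set()
        for sequence in frequent:
            candidates |= extend(sequence, frequent_items)
    return patterns


def make_seq(*elements):
    return tuple(frozenset(e) for e in elements)


# Customer sequences of the example database, minimum support of 2 customers.
database = [
    make_seq([30], [90]),
    make_seq([10, 20], [30], [40, 60, 70]),
    make_seq([30, 50, 70]),
    make_seq([30], [40, 70], [90]),
    make_seq([90]),
]
patterns = mine(database, min_support=2)
print(len(patterns))                          # 9
print(patterns[make_seq([30], [40, 70])])     # 2
```

Extending only frequent sequences with frequent items is one way of applying the Apriori property; the real join and prune steps discard more candidates before the database scan.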
The Good, the Bad, and the Ugly
● The Good: Benefits from Apriori pruning, which reduces the
search space
● The Bad: Scans the database multiple times
● The Ugly: Generates a huge set of candidate sequences
Why is GSP called Generalized Sequential Pattern Mining?
For practical use of SPM, in 1996 Agrawal and Srikant
introduced three types of constraints that make the SPM problem
more general and practical. Since GSP supports these
constraints, it is called Generalized Sequential Pattern mining.
Constraints that GSP Supports
Time Constraint
An ability for users to specify maximum and/or minimum
time gaps between adjacent elements of the sequential
pattern.
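As an illustration, the check below tests whether the transaction times matched by consecutive pattern elements respect the gaps. It is a sketch under assumptions: times are plain numbers (for example day numbers), each element is matched by a single transaction, the minimum gap is treated as exclusive and the maximum gap as inclusive, and the function name is hypothetical.

```python
def satisfies_gaps(times, min_gap=0, max_gap=float("inf")):
    """Check min_gap < t[i] - t[i-1] <= max_gap for all adjacent matched times.

    The strict/inclusive choice of the two comparisons is an assumption.
    """
    return all(
        min_gap < later - earlier <= max_gap
        for earlier, later in zip(times, times[1:])
    )


print(satisfies_gaps([1, 6, 11], min_gap=2, max_gap=7))  # True
print(satisfies_gaps([1, 6, 30], max_gap=7))             # False
```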
Sliding Window
Each element of the pattern can be contained in the
union of the items bought in a set of transactions, as long as
the difference between the maximum and minimum
transaction-times is less than the size of a sliding time
window.
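A small sketch of the window test for a single pattern element. It assumes one customer's transactions are given as (time, items) pairs sorted by time, with numeric times, and it treats a time span equal to the window size as acceptable, which is an assumption of this sketch. The function name is hypothetical.

```python
def element_in_window(element, transactions, window):
    """True if `element` is covered by the union of transactions whose
    times span at most `window`.

    `transactions` must be sorted by time; each entry is (time, items).
    """
    element = set(element)
    for start, (first_time, _) in enumerate(transactions):
        union = set()
        for time, items in transactions[start:]:
            if time - first_time > window:
                break  # later transactions fall outside the window
            union |= set(items)
            if element <= union:
                return True
    return False


history = [(1, {"a"}), (3, {"b"}), (10, {"c"})]
print(element_in_window({"a", "b"}, history, window=2))  # True
print(element_in_window({"a", "b"}, history, window=1))  # False
```

With a window of 0 the test reduces to the ordinary case, where all items of an element must come from one transaction.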
Taxonomy
An ability to define a taxonomy (is-a hierarchy) over the items
in the data.
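One common way to make use of a taxonomy is to add every ancestor of an item to the itemset that contains it, so that patterns over general categories can be counted too. The sketch below assumes each item has at most one parent and that the hierarchy has no cycles; the `parent` mapping and its item names are made up for illustration.

```python
def ancestors(item, parent):
    """All ancestors of `item` in a single-parent is-a hierarchy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result


def extend_itemset(itemset, parent):
    """Return the itemset together with the ancestors of all its items."""
    extended = set(itemset)
    for item in itemset:
        extended |= ancestors(item, parent)
    return extended


# Hypothetical is-a hierarchy: child -> parent.
parent = {
    "ski pants": "outerwear",
    "jacket": "outerwear",
    "outerwear": "clothes",
}

print(sorted(extend_itemset({"jacket"}, parent)))
# ['clothes', 'jacket', 'outerwear']
```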
References
● R. Agrawal and R. Srikant. Mining Sequential Patterns. 1995
● R. Srikant and R. Agrawal. Mining Sequential Patterns: Generalizations and
Performance Improvements. 1996
● J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, et al. PrefixSpan: Mining
Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. 2001
