IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 2, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 211
A Survey of Sequential Rule Mining Techniques
Shabana Anwar1
Prof. Abhishek Raghuvanshi2
Abstract— In this paper, we present an overview of existing
sequential rule mining algorithms. Each algorithm is
described largely on its own terms. Sequential rule mining
is a popular but computationally expensive task. We
explain the fundamentals of sequential rule mining and
describe the current approaches to it.
From the broad variety of efficient algorithms that have
been developed, we compare the most important ones.
We systematize the algorithms and analyze their
performance based on both run time and
theoretical considerations. Their strengths and weaknesses
are also investigated. It turns out that the behavior of the
algorithms is much more similar than expected.
I. INTRODUCTION
Data mining is the process of extracting interesting (non-trivial,
implicit, previously unknown and potentially useful)
information or patterns from large information repositories
such as relational databases, data warehouses, and XML
repositories. Data mining is also known as one of the core
processes of Knowledge Discovery in Databases (KDD).
Of all the mining functions in the knowledge discovery
process, frequent pattern mining finds the patterns that
occur frequently. A pattern counts as frequent when its
frequency reaches a user-specified minimum
threshold. We may categorize
recent studies in frequent pattern mining into the discovery
of association rules and the discovery of sequential patterns.
Association discovery finds closely correlated sets, so that
the presence of some elements of a frequent set implies
the presence of the remaining elements of the same set.
Sequential pattern discovery finds temporal associations, so
that not only closely correlated sets but also their
relationships in time are uncovered.
In a sequence database, each sequence is a time-ordered
list of itemsets. An itemset is an unordered set of items
(symbols), considered to occur simultaneously.
Sequential Pattern Mining (SPM) is probably the most popular set
of techniques for discovering temporal patterns in sequence
databases. SPM finds subsequences that are common to
more than minSup sequences. On its own, SPM is of limited use for making
predictions. For example, consider the pattern {x},{y}. It is
possible that y appears frequently after x but that there are
also many cases where x is not followed by y. For
prediction, we need a measure of the confidence that if
x occurs, y will occur afterward.
A sequential rule has the form X⇒Y. A sequential
rule X⇒Y has two properties:
• Support: the number of sequences where X occurs
before Y, divided by the number of sequences.
• Confidence: the number of sequences where X
occurs before Y, divided by the number of
sequences where X occurs.
Sequential Rule Mining finds all valid rules, that is, rules whose
support and confidence are not less than the user-defined thresholds
minSup and minConf.
For example, consider minSup = 0.5 and minConf = 0.5:
Figure 1: A sequence database
Figure 2: Some rules found
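The support and confidence definitions above can be sketched in Python. This is a minimal illustration, not a mining algorithm: the database `EXAMPLE_DB` is invented for this sketch (it is not the database of Figure 1), the function names are hypothetical, and "X occurs before Y" is assumed to mean that all items of X appear in some leading part of the sequence and all items of Y appear in the part that follows.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Sequence = List[Itemset]  # time-ordered list of itemsets


def occurs_before(seq: Sequence, x: Itemset, y: Itemset) -> bool:
    """Assumed reading of "X occurs before Y": for some split point k,
    all items of x appear in itemsets 0..k and all items of y in k+1.. ."""
    seen = set()
    for k in range(len(seq) - 1):
        seen |= seq[k]
        if x <= seen:
            # The earliest such k leaves the longest suffix, so one check suffices.
            remaining = set().union(*seq[k + 1:])
            return y <= remaining
    return False


def contains(seq: Sequence, x: Itemset) -> bool:
    """True if every item of x appears somewhere in the sequence."""
    return x <= set().union(*seq)


def rule_measures(db: List[Sequence], x: Itemset, y: Itemset) -> Tuple[float, float]:
    """Return (support, confidence) of the rule x => y over db."""
    both = sum(occurs_before(s, x, y) for s in db)
    with_x = sum(contains(s, x) for s in db)
    support = both / len(db)
    confidence = both / with_x if with_x else 0.0
    return support, confidence


# Hypothetical four-sequence database, invented for illustration only.
EXAMPLE_DB: List[Sequence] = [
    [frozenset("a"), frozenset("b"), frozenset("c")],
    [frozenset("a"), frozenset("c")],
    [frozenset("b"), frozenset("a")],
    [frozenset("ab"), frozenset("c")],
]

if __name__ == "__main__":
    sup, conf = rule_measures(EXAMPLE_DB, frozenset("a"), frozenset("c"))
    print(f"{{a}} => {{c}}: support={sup:.2f}, confidence={conf:.2f}")
```

With minSup = 0.5 and minConf = 0.5, the rule {a}⇒{c} would be reported as valid on this toy database, since a precedes c in three of the four sequences and a occurs in all four.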
II. A SURVEY OF SRM METHODS
In general, the mining approaches for sequence databases of
horizontal layout fall into two categories: the generate-and-test
framework and the pattern-growth framework. Typifying the
former approaches [1, 2, 3], the GSP (Generalized
Sequential Pattern) algorithm [3] generates potential
patterns (called candidates), scans each data sequence in the
database to compute the frequencies of the candidates (called
supports), and then identifies candidates with sufficient
support as sequential patterns. The sequential patterns found in the
current database pass become seeds for generating
candidates in the next pass. This generate-and-test process is
repeated until no new candidates are generated. When the
candidates cannot all fit in memory in one batch, GSP re-scans the
database to test the remaining candidates that have not yet been
loaded into memory. Consequently, GSP scans the on-disk database
at least k times if the maximum size of the
discovered patterns is k, which incurs a high disk-reading
cost. Although GSP is good at candidate pruning,
the number of candidates can still be very large, which may impair
mining efficiency.
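The generate-and-test loop can be sketched as follows. This is a simplified illustration rather than GSP itself: it assumes every itemset holds a single item (so a sequence is just a list of items), it keeps all candidates in memory, and the function names and the toy database are hypothetical. It does preserve the level-wise structure described above: one database pass per pattern length, with the frequent patterns of one pass seeding the candidates of the next.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, seq: List[str]) -> bool:
    """True if the items of pattern appear in seq in the same order."""
    it = iter(seq)
    return all(item in it for item in pattern)


def level_wise_mine(db: List[List[str]], min_count: int) -> Tuple[List[Pattern], int]:
    """Return (frequent patterns, number of database passes)."""
    items = sorted({item for seq in db for item in seq})
    candidates: List[Pattern] = [(item,) for item in items]
    frequent: List[Pattern] = []
    passes = 0
    while candidates:
        passes += 1
        # Test step: one scan of the database counts every candidate.
        counts: Dict[Pattern, int] = {c: 0 for c in candidates}
        for seq in db:
            for c in candidates:
                if is_subsequence(c, seq):
                    counts[c] += 1
        seeds = [c for c in candidates if counts[c] >= min_count]
        frequent.extend(seeds)
        # Generate step: join p and q when p without its first item
        # equals q without its last item.
        seed_set = set(seeds)
        joined = [p + (q[-1],) for p in seeds for q in seeds if p[1:] == q[:-1]]
        # Prune step: every subsequence one item shorter must be frequent.
        candidates = [
            c for c in joined
            if all(c[:i] + c[i + 1:] in seed_set for i in range(len(c)))
        ]
    return frequent, passes


# Hypothetical toy database, invented for illustration only.
TOY_DB = [list("abc"), list("ac"), list("abc"), list("bc")]

if __name__ == "__main__":
    patterns, n_passes = level_wise_mine(TOY_DB, min_count=2)
    print(patterns, n_passes)
```

On this toy input the longest frequent pattern has three items and the loop makes three passes over the database, which matches the observation that at least k scans are needed when the longest pattern has size k.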
The PrefixSpan (Prefix-projected Sequential
pattern mining) algorithm [4], representing the pattern-growth
methodology [4, 5, 6], finds the frequent items after
scanning the sequence database once. The database is then
projected, according to the frequent items, into several
smaller databases. Finally, the complete set of sequential
patterns is found by recursively growing subsequence
fragments in each projected database. Two optimizations for
minimizing disk projections were described in [4]. The
bi-level projection technique, intended for huge databases,
scans each data sequence twice in the (projected) database
so that fewer and smaller projected databases are generated.
The pseudo-projection technique avoids physical
projection by representing the postfix of each data
sequence in a projection as a pointer-offset pair. However,
according to [4], maximum mining performance is
achieved only when the database has been reduced to a size
that fits in main memory, by applying
pseudo-projection after bi-level projection. Although
PrefixSpan discovers patterns successfully using a
divide-and-conquer strategy, the cost of disk I/O may be
high because of the creation and processing of the projected
sub-databases.
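The prefix-projection idea with pseudo-projection can be sketched as below. This is a minimal in-memory illustration, not the implementation from [4]: it assumes single-item itemsets, omits bi-level projection entirely, and uses hypothetical names and a toy database. Each projected database is stored only as a list of (sequence index, offset) pairs pointing into the original sequences, which is the pointer-offset representation described above.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
Pointer = Tuple[int, int]  # (sequence index, offset of the postfix)


def prefix_projected_mine(db: List[List[str]], min_count: int) -> List[Tuple[Pattern, int]]:
    """Return (pattern, support count) pairs found by recursive prefix growth."""
    results: List[Tuple[Pattern, int]] = []

    def grow(prefix: Pattern, projected: List[Pointer]) -> None:
        # Count, per item, how many postfixes in this projection contain it.
        counts: Dict[str, int] = {}
        for sid, offset in projected:
            seq = db[sid]
            seen = {seq[pos] for pos in range(offset, len(seq))}
            for item in seen:
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            pattern = prefix + (item,)
            results.append((pattern, counts[item]))
            # Pseudo-projection: move each pointer just past the first
            # occurrence of item instead of copying the postfix.
            next_projected: List[Pointer] = []
            for sid, offset in projected:
                try:
                    pos = db[sid].index(item, offset)
                except ValueError:
                    continue  # item absent from this postfix
                next_projected.append((sid, pos + 1))
            grow(pattern, next_projected)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results


# Hypothetical toy database, invented for illustration only.
TOY_SEQUENCES = [list("abc"), list("ac"), list("abc"), list("bc")]

if __name__ == "__main__":
    for pattern, count in prefix_projected_mine(TOY_SEQUENCES, min_count=2):
        print(pattern, count)
```

Because no postfix is ever copied, the only extra memory per recursion level is one small pair per surviving sequence, which is why this variant is attractive once the database fits in main memory.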
Besides the horizontal layout, the sequence
database can be transformed into a vertical format consisting
of items’ id-lists [7, 8, 9]. The id-list of an item is a list of
(sequence-id, timestamp) pairs recording the
timestamps at which the item occurs in each sequence. Searching the
lattice formed by id-list intersections, the SPADE
(Sequential PAttern Discovery using Equivalence classes)
algorithm [9] completes the mining in three passes of
database scanning. Nevertheless, additional computation
time is required to transform a database from horizontal layout
to vertical format, and the vertical format requires storage
space several times larger than that of the original sequence
database.
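The vertical format can be illustrated with a short sketch. It shows only how id-lists are built and how two of them can be combined with a temporal join; it is not SPADE's lattice search or its equivalence-class decomposition. The function names, the input layout (each sequence as a list of (timestamp, itemset) pairs) and the data are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

IdList = List[Tuple[int, int]]  # (sequence-id, timestamp) pairs
TimedSequence = List[Tuple[int, Set[str]]]  # (timestamp, itemset) pairs


def build_id_lists(db: List[TimedSequence]) -> Dict[str, IdList]:
    """Convert a horizontal database into one id-list per item."""
    id_lists: Dict[str, IdList] = defaultdict(list)
    for sid, seq in enumerate(db):
        for timestamp, itemset in seq:
            for item in itemset:
                id_lists[item].append((sid, timestamp))
    return dict(id_lists)


def temporal_join(first: IdList, second: IdList) -> IdList:
    """Id-list of 'first, then second': occurrences of second that come
    strictly after some occurrence of first in the same sequence."""
    earliest: Dict[int, int] = {}
    for sid, timestamp in first:
        earliest[sid] = min(timestamp, earliest.get(sid, timestamp))
    return [
        (sid, timestamp)
        for sid, timestamp in second
        if sid in earliest and earliest[sid] < timestamp
    ]


def support_count(id_list: IdList) -> int:
    """Number of distinct sequences that appear in the id-list."""
    return len({sid for sid, _ in id_list})


# Hypothetical timestamped database, invented for illustration only.
TIMED_DB: List[TimedSequence] = [
    [(1, {"a"}), (2, {"b"}), (3, {"c"})],
    [(1, {"a"}), (2, {"c"})],
    [(1, {"b"}), (2, {"a"})],
]
ID_LISTS = build_id_lists(TIMED_DB)

if __name__ == "__main__":
    a_then_c = temporal_join(ID_LISTS["a"], ID_LISTS["c"])
    print(a_then_c, support_count(a_then_c))
```

Once the id-lists exist, the support of a longer pattern is obtained from the id-lists alone, without going back to the original sequences; the cost is the conversion step and the extra storage noted above.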
With falling memory prices and growing
installed memory sizes, many small or medium-sized
databases now fit into main memory. For example,
a platform with 256MB of memory may hold a database of
one million sequences with a total size of 189MB. Pattern mining
performed directly in memory thus becomes possible.
However, current approaches discover patterns either
through multiple scans of the database or by iterative
database projections, and therefore require many disk
operations. Mining efficiency could be improved if the
excessive disk I/O were reduced by making better use of
memory during the discovery process.
III. CONCLUSION
In this paper, we surveyed existing sequential rule
mining techniques. We restricted ourselves to the classic
sequential rule mining problem: the generation of all
sequential rules that exist in market-basket-like data with
respect to minimum thresholds for support and confidence.
In a forthcoming paper, we pursue the development
of a novel algorithm that efficiently mines sequential
association rules from a market basket data set.
REFERENCES
[1] R. Agrawal and R. Srikant, “Mining Sequential
Patterns,” Proceedings of the 11th International
Conference on Data Engineering, Taipei, Taiwan,
pp. 3-14, March 1995.
[2] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP
Approach for Mining Sequential Patterns,” Proceedings
of 1998 2nd European Symposium on Principles of
Data Mining and Knowledge Discovery, Vol. 1510,
Nantes, France, pp. 176-184, Sep. 1998.
[3] R. Srikant and R. Agrawal, “Mining Sequential
Patterns: Generalizations and Performance
Improvements,” Proceedings of the 5th International
Conference on Extending Database Technology,
Avignon, France, pp. 3-17, 1996. (An extended version
is the IBM Research Report RJ 9994)
[4] J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal and M.-C.
Hsu, “PrefixSpan: Mining Sequential Patterns
Efficiently by Prefix-projected Pattern Growth,”
Proceedings of 2001 International Conference on Data
Engineering, pp. 215-224, 2001.
[5] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal and
M.-C. Hsu, “FreeSpan: Frequent Pattern-projected
Sequential Pattern Mining,” Proceedings of the 6th
ACM SIGKDD international conference on Knowledge
discovery and data mining, pp. 355-359, 2000.
[6] H. Pinto, J. Han, J. Pei, K. Wang, Q. Chen, and U.
Dayal, “Multi-Dimensional Sequential Pattern Mining,”
Proceedings of the 10th International Conference on
Information and Knowledge Management, pp. 81-88,
2001.
[7] J. Ayres, J. E. Gehrke, T. Yiu, and J. Flannick,
“Sequential PAttern Mining Using Bitmaps,”
Proceedings of the Eighth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining.
Edmonton, Alberta, Canada, July 2002.
[8] S. Parthasarathy, M. J. Zaki, M. Ogihara, and S.
Dwarkadas, “Incremental and Interactive Sequence
Mining,” Proceedings of the 8th International
Conference on Information and Knowledge
Management, Kansas City, Missouri, USA, pp. 251-258,
Nov. 1999.
[9] M. J. Zaki, “SPADE: An Efficient Algorithm for
Mining Frequent Sequences,” Machine Learning
Journal, Vol. 42, No. 1/2, pp. 31-60, 2001.
