International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1076
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC
in Frequent Sequences Dataset
Mr. Suraj Patil1, Prof. Parth Sagar2
1P.G. Student, Department of Computer Engineering, RMD Sinhgad School of Engineering, Pune, India
2Assistant Professor, Department of Computer Engineering, RMD Sinhgad School of Engineering, Pune, India
-----------------------------------------------------------------------------***----------------------------------------------------------------------------
Abstract- Frequent sequence mining is an essential data mining task with broad applications. Its output is used in many other areas, such as market basket analysis, chemistry and bioinformatics. Frequent sequence mining is computationally very expensive. Moreover, beyond sequential pattern mining in a single dimension, multidimensional sequential patterns can give more constructive and useful patterns. Because of the huge increase in data volume and the fairly large search space, efficient solutions for finding patterns in multidimensional sequence data are currently very important. For this reason, developing an efficient frequent sequence mining algorithm is necessary. The parallel algorithm follows a step-by-step approach in which all participating processors (workers) generate candidate sequences and independently estimate their running times and count their supports.
Key Words: Data mining, frequent sequence mining,
parallel algorithms, static load balancing, probabilistic
algorithms.
1. INTRODUCTION
Frequent pattern mining is an important data mining technique with a wide variety of mined patterns. The mined frequent patterns can be sets of items (itemsets), sequences, graphs, trees, etc. Frequent sequence mining was first described in earlier work, and the GSP algorithm is the first to solve the problem of frequent sequence mining. Just as frequent sequence mining is an extension of itemset mining, the GSP algorithm is an extension of the Apriori algorithm. Because of the slowness and memory consumption of these algorithms, other algorithms were proposed. Two of them use the so-called prefix-based equivalence classes (PBECs for short): they represent a pattern as a string and partition the set of all patterns into disjoint sets using prefixes. There are two kinds of parallel computers: shared-memory and distributed-memory machines. Parallelizing on shared-memory machines is easier than parallelizing on distributed-memory machines. Sampling techniques that statically load-balance the computation of parallel frequent itemset mining have been proposed in three papers, which introduced the so-called double sampling process and its three variants.
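To make the PBEC idea concrete, the following Python sketch groups a set of patterns into prefix-based equivalence classes keyed by their first k items. It assumes, for simplicity, that a pattern is a tuple of single items (one item per event); the function name `partition_into_pbecs` and the parameter `k` are illustrative and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pattern = Tuple[str, ...]  # simplifying assumption: one item per event


def partition_into_pbecs(patterns: Iterable[Pattern], k: int = 1) -> Dict[Pattern, List[Pattern]]:
    """Group patterns into prefix-based equivalence classes keyed by their length-k prefix.

    A pattern shorter than k is keyed by the whole pattern, so every pattern
    lands in exactly one class and the classes are pairwise disjoint.
    """
    classes: Dict[Pattern, List[Pattern]] = defaultdict(list)
    for p in patterns:
        classes[p[:k]].append(p)
    return dict(classes)


if __name__ == "__main__":
    frequent = [("a",), ("a", "b"), ("a", "c"), ("b",), ("b", "c"), ("c",)]
    for prefix, members in partition_into_pbecs(frequent).items():
        print(prefix, members)
```

Because every pattern has exactly one prefix of a given length, the classes never overlap, which is the property that lets each class be handed to a different processor.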
There are several problems with static load balancing. First, estimating the running time of a PBEC is a non-trivial task. The intuition behind this is that exactly computing the number of frequent sequences in a PBEC is a #P-hard task. The hardness also comes from the fact that the amount of work necessary to process one sequence varies among sequences. The last problem, according to our experiments, is that each processor gets almost the whole database. A static load-balancing method called selective sampling has been presented for parallel mining. Selective sampling estimates the running time of each PBEC by removing some items from the database.
In this paper, we present static load-balancing methods for an efficient parallel mining algorithm. Section 1 is the introduction, Section 2 reviews related work, Section 3 describes the proposed system, Section 4 presents the algorithm, Section 5 gives the results and discussion, and Section 6 presents the conclusion and future work.
2. RELATED WORK
In static load balancing, many things must be considered while developing such an algorithm: estimation of the load, assessment of the load, stability of the different systems, system performance, interaction between the nodes, the nature of the work to be transferred, the selection of data sets, and many others.
In [1], Sequential Pattern Mining from Multidimensional Sequence Data in Parallel, the authors note that finding patterns in multidimensional sequence data is nowadays very important. They present a multidimensional sequence model and a parallel algorithm that follows the level-wise approach, in which all participating processors (workers) generate candidate sequences and count their supports independently.
In [2], PrefixSpan: Mining Sequential Patterns by Prefix-Projected Pattern Growth, sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database, is described as an important data mining problem with broad applications, including the analysis of customer purchase patterns or Web access patterns, the analysis of sequencing or time-related processes such as scientific experiments, natural disasters and disease treatments, and the analysis of DNA sequences.
In [3], Parallel Sequence Mining on Shared-Memory Machines, pSPADE is presented, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor, requiring no synchronization.
In [4], Sequential Mining: Patterns and Algorithms Analysis, a sequential pattern is defined as a set of itemsets structured in a sequence database that occur sequentially in a specific order. A sequence database is a set of ordered elements or events, stored with or without a concrete notion of time. The measures most commonly used to evaluate sequential patterns are support and confidence.
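The support measure mentioned above can be made concrete for sequences whose events are itemsets. The following Python sketch checks whether a pattern is contained in a transaction and counts the supporting transactions. Representing events as `frozenset`s and the names `is_subsequence` and `support` are assumptions for illustration, not definitions from [4].

```python
from typing import FrozenSet, Iterable, Sequence

Event = FrozenSet[str]


def is_subsequence(pattern: Sequence[Event], transaction: Sequence[Event]) -> bool:
    """Return True if `pattern` is contained in `transaction`.

    Each pattern event must be a subset of some transaction event, and the
    matched transaction events must appear in strictly increasing order.
    """
    j = 0
    for event in pattern:
        # Greedily skip to the earliest remaining event that contains this pattern event.
        while j < len(transaction) and not event <= transaction[j]:
            j += 1
        if j == len(transaction):
            return False
        j += 1  # the next pattern event must match a strictly later transaction event
    return True


def support(pattern: Sequence[Event], database: Iterable[Sequence[Event]]) -> int:
    """Absolute support: number of transactions that contain `pattern`."""
    return sum(1 for t in database if is_subsequence(pattern, t))
```

Matching each pattern event at the earliest possible transaction event is safe: an earlier match never leaves fewer options for the remaining pattern events.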
In [5], Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets, the data are market basket data consisting of transactions made by customers; each transaction contains the items bought by one customer. The goal is to see whether the occurrence of certain items in a transaction can be used to deduce the occurrence of other items, in other words, to find associative relationships between items.
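For the market basket setting described in [5], the usual support and confidence of an association rule can be computed as below. This is a generic textbook sketch, not the load-balancing model of [5]; transactions are assumed to be `frozenset`s of item names, and the function names are illustrative.

```python
from typing import FrozenSet, List


def itemset_support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Relative support: fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
                    transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule lhs => rhs, i.e. supp(lhs | rhs) / supp(lhs)."""
    base = itemset_support(lhs, transactions)
    return itemset_support(lhs | rhs, transactions) / base if base else 0.0
```

With the baskets {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, the itemset {bread, milk} has support 0.5 and the rule bread => milk has confidence 0.5 / 0.75, about 0.67.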
In [6], Probabilistic Static Load-Balancing of Parallel Mining of Frequent Sequences, the authors present a novel parallel algorithm for mining frequent sequences based on static load balancing. The static load balancing is done by measuring the computational time. The authors also note that algorithms which do not use depth-first search are slow and need much more memory than DFS algorithms.
In [7], Parallel Mining of Closed Sequential Patterns, the authors argue that, to make sequential pattern mining practical for large data sets, the mining process must be efficient and scalable and have a short response time. Moreover, since sequential pattern mining requires iterative scans of the sequence dataset with various data relationship and analysis operations, it is computationally intensive.
3. PROPOSED SYSTEM
The proposed method is a novel parallel method that statically load-balances the computation. The set of all frequent sequences is first split into PBECs, the relative execution time of each PBEC is then estimated, and finally the PBECs are assigned to processors. The method estimates the processing time of one PBEC by running the sequential PrefixSpan algorithm on sampled data sets. It is important to be aware that the running time of the sequential algorithm scales with:
1) the database size,
2) the number of frequent sequences,
3) the number of embeddings of a frequent sequence in the database transactions.
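Factor 3) is what makes the per-sequence work uneven: a single transaction can contain the same pattern in many different ways. The sketch below counts the embeddings of a pattern in one transaction with a standard dynamic program. It assumes one item per event (patterns and transactions given as strings), and the function name `count_embeddings` is illustrative; it is not part of the proposed method.

```python
def count_embeddings(pattern: str, transaction: str) -> int:
    """Count the distinct index tuples at which `pattern` occurs as a subsequence of `transaction`."""
    # ways[i] = number of embeddings of pattern[:i] in the part of the transaction scanned so far
    ways = [1] + [0] * len(pattern)
    for ch in transaction:
        # Walk i downwards so that one transaction position extends each embedding at most once.
        for i in range(len(pattern), 0, -1):
            if pattern[i - 1] == ch:
                ways[i] += ways[i - 1]
    return ways[len(pattern)]


if __name__ == "__main__":
    print(count_embeddings("ab", "ab"))    # 1
    print(count_embeddings("ab", "aabb"))  # 4
```

The pattern "ab" has one embedding in "ab" but four in "aabb", even though both transactions support it exactly once, which is why support alone is a poor proxy for processing time.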
[Architecture diagram. Components shown: user; load database application; data set splits and frequent itemsets; patterns and sequence counts using the PrefixSpan algorithm; maximum memory for the data set; execution time calculation; final output with patterns, minimum support, frequent sequences and execution time; result display.]
Fig 1. Proposed System Architecture
Static load balancing of the computation begins with partitioning the set of all frequent sequences into disjoint tasks. Because PBECs are disjoint, they fit the needs of the algorithm perfectly. The processing time of each PBEC should then be estimated as a fraction of the total processing time of the sequential PrefixSpan algorithm. The algorithm begins by recursively splitting the set of all frequent sequences into smaller pieces using PBECs. The relative size of a PBEC is the estimate of the fraction of the total processing time that is spent on that PBEC.
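A minimal sketch of the relative-size estimate, assuming a sample of frequent sequences has already been obtained (for example, by mining a sampled database): the relative size of a PBEC is taken here as the share of sampled sequences that start with its prefix. Sequences are tuples of single items, and the function name `relative_pbec_sizes` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pattern = Tuple[str, ...]  # simplifying assumption: one item per event


def relative_pbec_sizes(sample: Iterable[Pattern], k: int = 1) -> Dict[Pattern, float]:
    """Estimate each PBEC's share of the total work as its share of the sampled frequent sequences."""
    counts = Counter(p[:k] for p in sample)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize so that the relative sizes of all PBECs sum to 1.
    return {prefix: c / total for prefix, c in counts.items()}
```

Counting sampled sequences ignores that some sequences cost more to process than others; that is the simplification the relative-size estimate accepts.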
3.1 Module Description
1. Estimation of Support Module
In this module, we estimate the support of a sequence, i.e., we determine whether the sequence is a subsequence of each transaction in the database.
2. Estimation of Relative Size of PBEC Module
The relative size of a PBEC can be used as the estimate of the relative processing time of the PBEC by a sequential algorithm. This estimate ignores some details of the sequential algorithm. The relative size of a PBEC can be estimated on a smaller data set of controllable size.
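The support-estimation module can be approximated by checking containment on a random sample of the transactions instead of the whole database. The sketch below does exactly that; the sample size, the fixed seed, the one-item-per-event representation and the function names are assumptions made for illustration, not parameters given in the paper.

```python
import random
from typing import Sequence


def contains(pattern: Sequence[str], transaction: Sequence[str]) -> bool:
    """True if `pattern` is a subsequence of `transaction` (one item per event)."""
    it = iter(transaction)
    # `x in it` advances the iterator, so the items must be found in order.
    return all(x in it for x in pattern)


def estimate_support(pattern: Sequence[str], database: Sequence[Sequence[str]],
                     sample_size: int, seed: int = 0) -> float:
    """Estimate relative support from a uniform random sample of transactions."""
    if not database or sample_size <= 0:
        raise ValueError("need a non-empty database and a positive sample size")
    rng = random.Random(seed)
    sample = rng.sample(list(database), min(sample_size, len(database)))
    return sum(1 for t in sample if contains(pattern, t)) / len(sample)
```

When `sample_size` is at least the database size, the estimate equals the exact relative support; for example, "ab" is contained in three of the transactions "abc", "acb", "ab" and "bc", giving 0.75.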
3.2 PrefixSpan Algorithm: Stepwise Considerations
1) Creation of the initial collection of frequent extensions: The algorithm starts by adding items into a set of extensions S. Adding items into S is not straightforward because the collection of items requires
2) Construction of the initial pseudo-projected database: For each transaction, create the initial pseudo-projected transaction for e. The projection adds a new event containing the single item e into S.
3) Projection using a sequence extension: Store the positions in the transaction in a new pseudo-projected transaction, which is stored in the new pseudo-projected database.
4) Projection using an event extension: Search for the item e until the end of the event is found.
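Steps 3) and 4) can be sketched in Python as follows. The sketch assumes each transaction is a list of events (`frozenset`s of items) and that a pseudo-projected transaction is stored as a pair (transaction id, index of the event where the prefix's last event was matched, using the leftmost embedding); an empty prefix starts at index -1. Step 4 as worded searches only inside the current event; the sketch additionally checks later events that contain the whole extended event, which we assume a complete implementation needs when only the leftmost embedding is stored. All names are illustrative, not the authors' code.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Event = FrozenSet[str]
Transaction = Sequence[Event]
Projection = List[Tuple[int, int]]  # (transaction id, event index of the last matched prefix event)


def sequence_extension(t: Transaction, pos: int, item: str) -> Optional[int]:
    """Position after appending a new event (item) to the prefix, or None if impossible."""
    for j in range(pos + 1, len(t)):
        if item in t[j]:
            return j
    return None


def event_extension(t: Transaction, pos: int, last_event: Event, item: str) -> Optional[int]:
    """Position after adding `item` to the prefix's last event, or None if impossible.

    The search starts at the current event and continues to later events that
    contain the whole extended event, so a later, richer event is not missed.
    """
    extended = last_event | {item}
    for j in range(pos, len(t)):
        if extended <= t[j]:
            return j
    return None


def project(db: Sequence[Transaction], proj: Projection, item: str,
            last_event: Optional[Event] = None) -> Projection:
    """Build a new pseudo-projected database.

    With `last_event` None a sequence extension is performed; otherwise an event extension.
    """
    new_proj: Projection = []
    for tid, pos in proj:
        if last_event is None:
            p = sequence_extension(db[tid], pos, item)
        else:
            p = event_extension(db[tid], pos, last_event, item)
        if p is not None:
            new_proj.append((tid, p))
    return new_proj
```

For the database <(a)(a b)(c)>, <(a b)(c)>, <(b)(c)>, projecting on <(a)> gives [(0, 0), (1, 0)]; the event extension to <(a b)> then moves the first transaction to event 1, because event 0 contains only a.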
4. ALGORITHMS
Algorithm 1. PrefixSpan Sequential
PREFIXSPAN-SEQUENTIAL(Database D, Integer min_supp)
1. Σ ← all frequent extensions from D
2. for all e ∈ Σ do
3.     q ← ⟨(e)⟩
4.     create projection D|q
5.     PREFIXSPAN-MAINLOOP(D, q, D|q, min_supp)
6. end for
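Algorithm 1 leaves PREFIXSPAN-MAINLOOP unspecified. The Python sketch below fills it in with a standard recursive pattern-growth loop, which is an assumption rather than the authors' exact procedure. To stay short it handles only sequences whose events contain a single item (each transaction is a string or list of items), uses pseudo-projections (transaction id, position) instead of copying suffixes, and takes `min_supp` as an absolute count. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Projection = List[Tuple[int, int]]  # (transaction id, position of the last matched item)


def prefixspan_sequential(db: Sequence[Sequence[str]], min_supp: int) -> Dict[Pattern, int]:
    """Return every frequent sequence of `db` with its absolute support."""
    result: Dict[Pattern, int] = {}
    # Line 1: Sigma = all frequent 1-item extensions (each item counted once per transaction).
    counts = Counter(item for t in db for item in set(t))
    sigma = sorted(item for item, c in counts.items() if c >= min_supp)
    for e in sigma:  # Line 2
        q: Pattern = (e,)  # Line 3
        # Line 4: the pseudo-projection D|q points at the first occurrence of e.
        projection = [(tid, t.index(e)) for tid, t in enumerate(db) if e in t]
        prefixspan_mainloop(db, q, projection, min_supp, result)  # Line 5
    return result


def prefixspan_mainloop(db: Sequence[Sequence[str]], q: Pattern, projection: Projection,
                        min_supp: int, result: Dict[Pattern, int]) -> None:
    """Assumed main loop: record q, then grow it by every frequent item in the projected suffixes."""
    result[q] = len(projection)
    counts: Counter = Counter()
    for tid, pos in projection:
        counts.update(set(db[tid][pos + 1:]))
    for e in sorted(item for item, c in counts.items() if c >= min_supp):
        new_projection: Projection = []
        for tid, pos in projection:
            t = db[tid]
            # Leftmost occurrence of e after the current position (sequence extension).
            for j in range(pos + 1, len(t)):
                if t[j] == e:
                    new_projection.append((tid, j))
                    break
        prefixspan_mainloop(db, q + (e,), new_projection, min_supp, result)
```

Calling `prefixspan_sequential(["abc", "acb", "ab", "bc"], 2)` returns six frequent sequences, for example ('a', 'b') with support 3 and ('b', 'c') with support 2. Each top-level call for one item e mines exactly the PBEC of prefix ⟨(e)⟩, which is what makes the loop in Algorithm 1 a natural unit of parallel work.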
5. RESULTS AND DISCUSSION
To evaluate the performance of the proposed system, we calculate the frequencies. We implemented the scheme in Java on a machine with an Intel Core i3 processor and 2 GB of RAM. The graph in Fig. 2 demonstrates the system performance.
Fig. 2. Number of itemsets
The PBECs are created, scheduled and executed on the processors. Because the PBECs are scheduled only once, we speak of static load balancing of the computation. Because the sequential algorithm runs for too long, parallel algorithms are needed. There is a very natural opportunity to parallelize an arbitrary frequent sequence mining algorithm: partition the set of all frequent sequences using PBECs.
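The paper does not say how the PBECs are mapped onto processors. A common choice for one-shot (static) scheduling, used here purely as an assumption, is the greedy longest-processing-time-first (LPT) rule: sort the PBECs by estimated relative size and always give the next one to the currently least-loaded processor. The function name and the string keys for PBECs are illustrative.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_lpt(sizes: Dict[str, float], n_procs: int) -> List[List[str]]:
    """Assign PBECs to processors, largest estimated size first, each to the least-loaded processor."""
    if n_procs < 1:
        raise ValueError("need at least one processor")
    # Min-heap of (current load, processor index); ties go to the lower index.
    heap: List[Tuple[float, int]] = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_procs)]
    for pbec, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(pbec)
        heapq.heappush(heap, (load + size, proc))
    return assignment
```

With relative sizes a = 0.4, b = 0.3, c = 0.2 and d = 0.1 on two processors, the sketch assigns [a, d] to the first processor and [b, c] to the second, so both receive an estimated half of the work. Since the assignment is computed once from estimates, its quality depends entirely on how accurate the relative sizes are.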
6. CONCLUSION AND FUTURE SCOPE
We have presented an algorithm for mining frequent sequences using static load balancing. The method creates a sample of the frequent sequences and uses this sample to estimate the relative processing time of the algorithm in the PBECs, with an evolutionary optimization technique. The estimate of the relative processing time is in fact obtained by estimating the computational complexity of processing the various PBECs. The relative processing time is then used for partitioning and scheduling the PBECs. One problem is that the estimated size of a PBEC depends on how the PBEC is constructed, which should not happen. This dependency could probably be removed by using, for example, the bootstrap method.
In future work, we plan to implement the parallel algorithm to reduce the computation time and to implement frequent sequence mining using static load balancing. Additionally, we aim to reduce the slowness and memory consumption of the process.
ACKNOWLEDGEMENT
It is my privilege to acknowledge, with a deep sense of gratitude, my guide Prof. Parth Sagar for her kind cooperation, valuable suggestions, capable guidance and timely help in the completion of this paper. I also express my gratitude to Prof. Vina M. Lomte, Head of the Department of Computer Engineering, RMDSSOE, for her constant encouragement, suggestions, help and cooperation.
REFERENCES
[1] J. Ren, Y. Dong, and H. He, "A parallel algorithm based on prefix tree for sequence pattern mining," in Proc. 1st ACIS Int. Symp. Cryptography Netw. Security, Data Mining Knowl. Discovery, E-Commerce Appl. Embedded Syst., 2010, pp. 6–11.
[2] R. Kessl and P. Tvrdík, "Toward more parallel frequent itemset mining algorithms," in Proc. 19th IASTED Int. Conf. Parallel Distrib. Comput. Syst., 2007, pp. 97–103.
[3] T. Shintani and M. Kitsuregawa, "Mining algorithms for sequential patterns in parallel: Hash based approach," in Proc. 2nd Pacific-Asia Conf. Res. Develop. Knowl. Discovery Data Mining, 1998, pp. 283–294.
[4] R. Srikant and R. Agrawal, "Mining sequential patterns: Generalizations and performance improvements," in Proc. 5th Int. Conf. Extending Database Technol.: Adv. Database Technol., 1996, pp. 1–17.
[5] R. Kessl and P. Tvrdík, "Probabilistic load balancing method for parallel mining of all frequent itemsets," in Proc. 18th IASTED Int. Conf. Parallel Distrib. Comput. Syst., 2006, pp. 578–586.
[6] V. Guralnik, N. Garg, and G. Karypis, "Parallel tree projection algorithm for sequence mining," in Proc. 7th Int. Euro-Par Conf. Euro-Par Parallel Process., 2001, pp. 310–320.
[7] S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A sampling-based framework for parallel data mining," in Proc. 10th ACM SIGPLAN Symp. Principles Practice Parallel Program., 2005, pp. 255–265.
[8] V. Guralnik and G. Karypis, "Dynamic load balancing algorithms for sequence mining," Univ. Minnesota, Minneapolis, MN, US, Tech. Rep. TR 01-020, 2001.
[9] S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A sampling-based framework for parallel data mining," in Proc. 10th ACM SIGPLAN Symp. Principles Practice Parallel Program., 2005, pp. 255–265.
[10] M. J. Zaki, "Parallel sequence mining on shared-memory machines," J. Parallel Distrib. Comput., vol. 61, no. 3, pp. 401–426, 2001.
[11] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New algorithms for fast discovery of association rules," in Proc. 3rd Int. Conf. Knowl. Discovery Data Mining, 1997, pp. 283–286.
[12] K.-M. Yu and J. Zhou, "Parallel TID-based frequent pattern mining algorithm on a PC cluster and grid computing system," Expert Syst. Appl., vol. 37, no. 3, pp. 2486–2494, 2010.

More Related Content

What's hot (20)

PDF
Improved Max-Min Scheduling Algorithm
iosrjce
 
PPT
Chap1 slides
BaliThorat1
 
PDF
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
cscpconf
 
DOCX
Job shop scheduling problem using genetic algorithm
Aerial Telecom Solutions (ATS) Pvt. Ltd.
 
PDF
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Editor IJCATR
 
PDF
Using the black-box approach with machine learning methods in ...
butest
 
PPT
Chap2 slides
BaliThorat1
 
PDF
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
IJECEIAES
 
PDF
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
ijdpsjournal
 
PDF
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
CSCJournals
 
PDF
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
ijsrd.com
 
PDF
Recursive
Alexander Cave
 
PDF
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
IJET - International Journal of Engineering and Techniques
 
PDF
SearsonGP_PLS_PhD_thesis
Dominic Searson
 
PDF
K017446974
IOSR Journals
 
PDF
D0931621
IOSR Journals
 
PDF
Applying Neural Networks and Analogous Estimating to Determine the Project Bu...
Ricardo Viana Vargas
 
PDF
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
IDES Editor
 
PDF
Pretzel: optimized Machine Learning framework for low-latency and high throu...
NECST Lab @ Politecnico di Milano
 
PDF
Scalable scheduling of updates in streaming data warehouses
IRJET Journal
 
Improved Max-Min Scheduling Algorithm
iosrjce
 
Chap1 slides
BaliThorat1
 
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
cscpconf
 
Job shop scheduling problem using genetic algorithm
Aerial Telecom Solutions (ATS) Pvt. Ltd.
 
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Editor IJCATR
 
Using the black-box approach with machine learning methods in ...
butest
 
Chap2 slides
BaliThorat1
 
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
IJECEIAES
 
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
ijdpsjournal
 
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
CSCJournals
 
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
ijsrd.com
 
Recursive
Alexander Cave
 
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
IJET - International Journal of Engineering and Techniques
 
SearsonGP_PLS_PhD_thesis
Dominic Searson
 
K017446974
IOSR Journals
 
D0931621
IOSR Journals
 
Applying Neural Networks and Analogous Estimating to Determine the Project Bu...
Ricardo Viana Vargas
 
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
IDES Editor
 
Pretzel: optimized Machine Learning framework for low-latency and high throu...
NECST Lab @ Politecnico di Milano
 
Scalable scheduling of updates in streaming data warehouses
IRJET Journal
 

Similar to Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Frequent Sequences Dataset (20)

PDF
Probabilistic static load balancing of parallel mining of frequent sequences
Shakas Technologies
 
PDF
An efficient algorithm for sequence generation in data mining
ijcisjournal
 
PDF
A Survey of Sequential Rule Mining Techniques
ijsrd.com
 
PDF
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
PDF
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
PPTX
FPPM algorithm
Ashis Kumar Chanda
 
PPTX
An efficient approach to mine flexible periodic patterns in time series datab...
Ashis Chanda
 
PPT
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 
PDF
Mining closed sequential patterns in large sequence databases
IJDMS
 
PDF
A novel algorithm for mining closed sequential patterns
IJDKP
 
PPT
5.3 mining sequential patterns
Krish_ver2
 
PDF
Agrhwoowheh3hwjoeorhehehwjeoeoeooekekekekkekee
jasminealisha635
 
PDF
lecture13.pdfhejejejejekkeejejejejejejejej
jasminealisha635
 
PPTX
Wireless sensor network Apriori an N-RMP
Amrit Khandelwal
 
PPTX
Temporal Pattern Mining
Prakhar Dhama
 
PDF
Scalable frequent itemset mining using heterogeneous computing par apriori a...
ijdpsjournal
 
PDF
Parallel Key Value Pattern Matching Model
ijsrd.com
 
PDF
Mining sequential patterns for interval based
ijcsa
 
PDF
Sequential Pattern Tree Mining
IOSR Journals
 
PDF
Ijetcas14 316
Iasir Journals
 
Probabilistic static load balancing of parallel mining of frequent sequences
Shakas Technologies
 
An efficient algorithm for sequence generation in data mining
ijcisjournal
 
A Survey of Sequential Rule Mining Techniques
ijsrd.com
 
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
FPPM algorithm
Ashis Kumar Chanda
 
An efficient approach to mine flexible periodic patterns in time series datab...
Ashis Chanda
 
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 
Mining closed sequential patterns in large sequence databases
IJDMS
 
A novel algorithm for mining closed sequential patterns
IJDKP
 
5.3 mining sequential patterns
Krish_ver2
 
Agrhwoowheh3hwjoeorhehehwjeoeoeooekekekekkekee
jasminealisha635
 
lecture13.pdfhejejejejekkeejejejejejejejej
jasminealisha635
 
Wireless sensor network Apriori an N-RMP
Amrit Khandelwal
 
Temporal Pattern Mining
Prakhar Dhama
 
Scalable frequent itemset mining using heterogeneous computing par apriori a...
ijdpsjournal
 
Parallel Key Value Pattern Matching Model
ijsrd.com
 
Mining sequential patterns for interval based
ijcsa
 
Sequential Pattern Tree Mining
IOSR Journals
 
Ijetcas14 316
Iasir Journals
 
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

PPTX
Shinkawa Proposal to meet Vibration API670.pptx
AchmadBashori2
 
PPTX
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
PPTX
Mechanical Design of shell and tube heat exchangers as per ASME Sec VIII Divi...
shahveer210504
 
PPTX
MATLAB : Introduction , Features , Display Windows, Syntax, Operators, Graph...
Amity University, Patna
 
PDF
Electrical Engineer operation Supervisor
ssaruntatapower143
 
PPTX
Knowledge Representation : Semantic Networks
Amity University, Patna
 
PDF
20ES1152 Programming for Problem Solving Lab Manual VRSEC.pdf
Ashutosh Satapathy
 
PDF
Introduction to Productivity and Quality
মোঃ ফুরকান উদ্দিন জুয়েল
 
PPTX
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
PPTX
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
PDF
AN EMPIRICAL STUDY ON THE USAGE OF SOCIAL MEDIA IN GERMAN B2C-ONLINE STORES
ijait
 
PDF
Design Thinking basics for Engineers.pdf
CMR University
 
PPTX
VITEEE 2026 Exam Details , Important Dates
SonaliSingh127098
 
PPTX
Introduction to Design of Machine Elements
PradeepKumarS27
 
PPTX
Water Resources Engineering (CVE 728)--Slide 4.pptx
mohammedado3
 
PDF
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
PPTX
Presentation 2.pptx AI-powered home security systems Secure-by-design IoT fr...
SoundaryaBC2
 
PDF
PORTFOLIO Golam Kibria Khan — architect with a passion for thoughtful design...
MasumKhan59
 
PPTX
2025 CGI Congres - Surviving agile v05.pptx
Derk-Jan de Grood
 
PDF
Pressure Measurement training for engineers and Technicians
AIESOLUTIONS
 
Shinkawa Proposal to meet Vibration API670.pptx
AchmadBashori2
 
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
Mechanical Design of shell and tube heat exchangers as per ASME Sec VIII Divi...
shahveer210504
 
MATLAB : Introduction , Features , Display Windows, Syntax, Operators, Graph...
Amity University, Patna
 
Electrical Engineer operation Supervisor
ssaruntatapower143
 
Knowledge Representation : Semantic Networks
Amity University, Patna
 
20ES1152 Programming for Problem Solving Lab Manual VRSEC.pdf
Ashutosh Satapathy
 
Introduction to Productivity and Quality
মোঃ ফুরকান উদ্দিন জুয়েল
 
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
AN EMPIRICAL STUDY ON THE USAGE OF SOCIAL MEDIA IN GERMAN B2C-ONLINE STORES
ijait
 
Design Thinking basics for Engineers.pdf
CMR University
 
VITEEE 2026 Exam Details , Important Dates
SonaliSingh127098
 
Introduction to Design of Machine Elements
PradeepKumarS27
 
Water Resources Engineering (CVE 728)--Slide 4.pptx
mohammedado3
 
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
Presentation 2.pptx AI-powered home security systems Secure-by-design IoT fr...
SoundaryaBC2
 
PORTFOLIO Golam Kibria Khan — architect with a passion for thoughtful design...
MasumKhan59
 
2025 CGI Congres - Surviving agile v05.pptx
Derk-Jan de Grood
 
Pressure Measurement training for engineers and Technicians
AIESOLUTIONS
 

Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Frequent Sequences Dataset

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1076 Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Frequent Sequences Dataset Mr.Suraj Patil1, Prof: Parth Sagar2 1P. G Student, Department of Computer Engineering, RMD Sinhgad School of Engineering, Pune, India 2Assistant Professor, Department of Computer Engineering, RMD Sinhgad School of Engineering, Pune, India -----------------------------------------------------------------------------***---------------------------------------------------------------------------- Abstract- Discovery of Frequent sequence mining is an essential data mining taskwithbroadapplications.Theoutput of the algorithm is used in many other areas like market basket analysis, chemistry and bioinformatics. The frequent sequence mining is computationally high expensive. Further sequential patterns mining in a single dimension mining and multidimensional sequential patterns can give us more constructive and useful patterns. Due to the huge enhance in data volume and also fairly large search space, efficient solutions for finding patterns in multidimensional sequence data are currently very important. For this reason, developing a frequent sequence mining algorithm is necessary. Parallel algorithm follows the step by step approach and all participating processors or workers generate candidate sequence and count their estimate time and supports independently. Key Words: Data mining, frequent sequence mining, parallel algorithms, static load balancing, probabilistic algorithms. 1. INTRODUCTION Repeated pattern removal is an important data mining technique with a wide variety of mined patterns. The mined frequent patterns can be sets of items(itemsets),sequences, graphs, trees, etc. 
Frequent sequence mining was first described in. The GSP algorithm presented in is the first to solve the problem of frequent sequence mining. Because frequent sequence mining is an extension of itemset mining, the GSP algorithm is an extension of the Apriori algorithm. Because of the slowness and memory consumption of the algorithms described in, other algorithms were proposed. These two algorithms use so-called prefix-based equivalence classes (PBECs for short): they represent each pattern as a string and partition the set of all patterns into disjoint sets according to their prefixes.

There are two kinds of parallel computers: shared-memory machines and distributed-memory machines. Parallelizing on a shared-memory machine is easier than parallelizing on a distributed-memory machine. Sampling techniques that statically load-balance the computation of parallel frequent itemset mining are proposed in three papers, which introduce the so-called double sampling process and three variants of it.

Static load-balancing raises further problems. First, estimating the running time of a PBEC is a non-trivial task; the intuition is that exact computation of the number of frequent sequences in a PBEC is at least a #P-complete task. The hardness also comes from the fact that the amount of work necessary to process one sequence varies among sequences. Finally, according to our experiments, each processor receives almost the whole database.

A static load-balancing method called selective sampling has been presented for parallel mining. Selective sampling estimates the running time of each PBEC by removing some items from the database. In this paper we study static load-balancing for an efficient parallel mining algorithm. The rest of the paper is organized as follows: Section 2 reviews related work, Section 3 describes the proposed system, Section 4 presents the algorithms, Section 5 reports results and discussion, and Section 6 gives the conclusion and future work.

2. RELATED WORK

Important things to consider when developing a static load-balancing algorithm include the view of the load, the assessment of the load, the stability of the different systems, system performance, the interaction between the records, the nature of the work to be transferred, the selection of data sets, and many others.

In [1], "Sequential Pattern Mining from Multidimensional Sequence Data in Parallel", the authors observe that finding patterns in multidimensional sequence data is nowadays very important. They present a multidimensional sequence model and a parallel algorithm that follows the level-wise approach: all participating processors (workers) generate candidate sequences and count their supports independently.

In [2], "PrefixSpan: Mining Sequential Patterns by Prefix-Projected Pattern Growth", sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database, is described as an important data mining problem with broad applications, including the analysis of customer purchase patterns or Web access patterns, the analysis of sequencing or time-related processes such as scientific experiments, natural disasters, and disease treatments, and the analysis of DNA sequences.

In [3], "Parallel Sequence Mining on Shared-Memory Machines", pSPADE is presented, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor, requiring no synchronization.

In [4], "Sequential Mining: Patterns and Algorithms Analysis", a sequential pattern is defined as a set of itemsets structured in a sequence database that occur sequentially in a specific order. A sequence database is a set of ordered elements or events, stored with or without a concrete notion of time. The measures most commonly used to evaluate sequential patterns are support and confidence.

In [5], "Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets", the input is market basket data, which consists of the transactions made by each customer; each transaction contains the items bought by the customer. The goal is to determine whether the occurrence of certain items in a transaction can be used to deduce the occurrence of other items, in other words, to find associative relationships between items.

In [6], "Probabilistic Static Load-Balancing of Parallel Mining of Frequent Sequences", the authors present a novel parallel algorithm for mining frequent sequences based on static load-balancing, which is done by estimating the computational time. They also note that the earlier algorithms are slow and need much more memory than DFS algorithms.

In [7], "Parallel Mining of Closed Sequential Patterns", the authors argue that, to make sequential pattern mining practical for large data sets, the mining process must be efficient, scalable, and have a short response time.
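The support measure discussed above, and the subsequence test on which the Estimation of Support Module of Section 3.1 relies, can be illustrated with a small example. The following is a minimal Java sketch (Java 9 or later), not code from the cited works or from this paper's implementation. It assumes that items are integers, that a sequence is an ordered list of events (itemsets), and that a pattern is contained in a transaction when each pattern event is a subset of a distinct transaction event, in the same order. The class and method names (`SequenceSupport`, `contains`, `support`) and the threshold value are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of sequence containment and support counting. */
public final class SequenceSupport {

    /**
     * Returns true if pattern is a subsequence of transaction: every pattern
     * event must be a subset of a distinct transaction event, in the same order.
     */
    public static boolean contains(List<Set<Integer>> transaction, List<Set<Integer>> pattern) {
        int j = 0; // index of the next transaction event that may still be used
        for (Set<Integer> event : pattern) {
            // Greedily skip to the earliest transaction event that contains this pattern event.
            while (j < transaction.size() && !transaction.get(j).containsAll(event)) {
                j++;
            }
            if (j == transaction.size()) {
                return false; // no remaining event can host this pattern event
            }
            j++; // consume the matched event; the next pattern event must match later
        }
        return true;
    }

    /** Absolute support: the number of transactions that contain the pattern. */
    public static int support(List<List<Set<Integer>>> database, List<Set<Integer>> pattern) {
        int count = 0;
        for (List<Set<Integer>> transaction : database) {
            if (contains(transaction, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Toy database: three transactions, each an ordered list of events (itemsets).
        List<List<Set<Integer>>> database = List.of(
            List.of(Set.of(1), Set.of(2, 3), Set.of(4)),
            List.of(Set.of(1, 2), Set.of(3)),
            List.of(Set.of(2), Set.of(4)));
        List<Set<Integer>> pattern = List.of(Set.of(1), Set.of(3)); // the pattern <(1)(3)>
        int minSupp = 2; // hypothetical minimum support threshold
        int supp = support(database, pattern);
        System.out.println("support = " + supp + ", frequent = " + (supp >= minSupp));
    }
}
```

Running `main` prints `support = 2, frequent = true`: the pattern <(1)(3)> occurs in the first two transactions but not in the third. Greedy left-to-right matching is sufficient for this test, because matching each pattern event to the earliest possible transaction event never rules out a match for the later pattern events.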
3. PROPOSED SYSTEM

Because sequential pattern mining requires iterative scans of the sequence dataset with various data-relationship and analysis operations, it is computationally intensive [7]. The proposed method is a novel parallel method that statically load-balances the computation. The set of all frequent sequences is first split into PBECs, the relative execution time of each PBEC is estimated, and finally the PBECs are assigned to processors. The method estimates the processing time of a PBEC by running the sequential PrefixSpan algorithm on sampled data sets. Note that the running time of the sequential algorithm scales with 1) the database size, 2) the number of frequent sequences, and 3) the number of embeddings of a frequent sequence in the database transactions.

Fig 1. Proposed System Architecture (components: load database; split the data set and get frequent itemsets; get patterns and sequence counts using the PrefixSpan algorithm; get maximum memory for the data set; calculate execution time; display the final output, with patterns, minimum support, frequent sequences, and execution time, to the user)

Static load-balancing of the computation begins by partitioning the set of all frequent sequences into disjoint tasks. Because the PBECs are disjoint, they fit the needs of the algorithm perfectly. The processing time of each PBEC should be evaluated relative to the total processing time of the sequential PrefixSpan algorithm. The algorithm begins by recursively splitting the set of all frequent sequences into smaller pieces using PBECs. The relative size of a PBEC is the estimate of the fraction of the total processing time spent on that PBEC.

3.1 Module Description
1. Estimation of Support Module: In this module, we estimate the support of a sequence by checking whether the sequence is a subsequence of each transaction in the database.
2. Estimation of Relative Size of PBEC Module: The relative size of a PBEC can be used as the estimate of the relative processing time of the PBEC by a sequential algorithm. This estimate ignores some details of the sequential algorithm. The relative size of a PBEC might be estimated on a smaller data set of controllable size.

3.2 PrefixSpan Algorithm: Step-wise Considerations
1) Create the initial collection of frequent extensions: the algorithm starts by adding items into a set of extensions S. Adding items into S is not straightforward because the collection of items requires
2) Construction of the initial pseudo-projected database: for each transaction, create the initial pseudo-projected transaction for e. The projection adds a new event containing the single item e into S.
3) Projection using a sequence extension: store the positions in the data set in a new pseudo-projected transaction, which is stored in a new pseudo-projected database.
4) Projection using an event extension: search for the item e until the end of the event is found.

4. ALGORITHMS
Algorithm 1. PrefixSpan Sequential
PREFIXSPAN-SEQUENTIAL(Database D, Integer min_supp)
1. Σ ← all frequent extensions from D
2. for all e ∈ Σ do
3.     Q ← <(e)>
4.     create projection D|Q
5.     PREFIXSPAN-MAINLOOP(D, Q, D|Q, min_supp)
6. end for

5. RESULTS AND DISCUSSION
To evaluate the performance of the proposed system, we calculate frequencies. We implemented the scheme in Java on a machine with an Intel Core i3 processor and 2 GB of RAM. The graph in Fig. 2 shows the system performance.

Fig. 2 No. of itemsets

The PBECs are created, scheduled, and executed on the processors. Because the PBECs are scheduled only once, we speak of static load-balancing of the computation. Because the sequential algorithm runs for too long, parallel algorithms are needed. There is a very natural opportunity to parallelize an arbitrary frequent sequence mining algorithm: partition the set of all frequent sequences using PBECs.

6. CONCLUSION AND FUTURE SCOPE
This paper presented an algorithm for mining frequent sequences using static load-balancing. The method creates a sample of the frequent sequences and uses this sample, with an evolutionary optimization technique, to estimate the relative processing time of the algorithm in the PBECs. The estimate of the relative processing time is in fact obtained by estimating the computational complexity of processing the various PBECs. The relative processing time is then used for partitioning and scheduling the PBECs.
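The estimation-and-scheduling step summarized above can be sketched as follows. This is a minimal Java illustration under stated assumptions (Java 9 or later), not the authors' implementation. The sample of frequent sequences is taken as given (how it is drawn is not shown); each PBEC is identified only by the first item of its sequences (prefixes of length one); the relative size of a PBEC is estimated as the fraction of the sample that falls into it; and the PBECs are assigned with the greedy longest-processing-time-first (LPT) heuristic, which is swapped in here because the text does not name a scheduling rule. The names `PbecScheduler`, `relativeSizes`, and `schedule` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: estimate relative PBEC sizes from a sample and schedule PBECs statically. */
public final class PbecScheduler {

    /**
     * Relative size of each length-1 PBEC, estimated as the fraction of sampled
     * frequent sequences whose first item equals the PBEC prefix.
     */
    public static Map<String, Double> relativeSizes(List<List<String>> sample) {
        if (sample.isEmpty()) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        Map<String, Double> sizes = new TreeMap<>(); // sorted by prefix for deterministic output
        for (List<String> sequence : sample) {
            sizes.merge(sequence.get(0), 1.0, Double::sum); // count sample hits per prefix
        }
        int n = sample.size();
        sizes.replaceAll((prefix, count) -> count / n); // counts -> fractions of the sample
        return sizes;
    }

    /**
     * Greedy LPT scheduling: take PBECs in decreasing order of estimated size and
     * give each one to the processor with the smallest estimated load so far.
     */
    public static List<List<String>> schedule(Map<String, Double> sizes, int processors) {
        if (processors < 1) {
            throw new IllegalArgumentException("need at least one processor");
        }
        List<List<String>> plan = new ArrayList<>();
        for (int p = 0; p < processors; p++) {
            plan.add(new ArrayList<>());
        }
        double[] load = new double[processors];
        List<Map.Entry<String, Double>> classes = new ArrayList<>(sizes.entrySet());
        classes.sort(Map.Entry.<String, Double>comparingByValue().reversed()); // largest first
        for (Map.Entry<String, Double> pbec : classes) {
            int target = 0;
            for (int p = 1; p < processors; p++) {
                if (load[p] < load[target]) {
                    target = p; // least-loaded processor so far
                }
            }
            plan.get(target).add(pbec.getKey());
            load[target] += pbec.getValue();
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical sample of ten frequent sequences, written as item lists.
        List<List<String>> sample = List.of(
            List.of("a"), List.of("a", "b"), List.of("a", "c"), List.of("a", "b", "c"),
            List.of("b"), List.of("b", "c"), List.of("b", "d"),
            List.of("c"), List.of("c", "d"),
            List.of("d"));
        Map<String, Double> sizes = relativeSizes(sample);
        System.out.println(sizes);              // {a=0.4, b=0.3, c=0.2, d=0.1}
        System.out.println(schedule(sizes, 2)); // [[a, d], [b, c]]
    }
}
```

With the ten sampled sequences in `main`, the estimated relative sizes are 0.4, 0.3, 0.2, and 0.1, and each of the two processors receives an estimated load of 0.5. LPT is a common choice for statically scheduling independent tasks because it is simple and its makespan is guaranteed to be less than 4/3 of the optimum; any other static assignment rule could be substituted.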
One open problem is that the estimated size of a PBEC depends on the construction of the PBEC, which should not happen. This dependency could probably be removed by using, for example, the bootstrap method. In future work, we plan to implement the parallel algorithm to reduce the computational time and to obtain the results of frequent sequence mining using static load-balancing. Additionally, we plan to reduce the slowness and memory consumption of the process.

ACKNOWLEDGEMENT
It is my privilege to acknowledge with a deep sense of gratitude my guide, Prof. Parth Sagar, for her kind cooperation, valuable suggestions, capable guidance, and timely help in the completion of my paper. I express my gratitude to Prof. Vina M. Lomte, Head of Department, RMDSSOE (Computer Dept.), for her constant encouragement, suggestions, help, and cooperation.

REFERENCES
[1] J. Ren, Y. Dong, and H. He, "A parallel algorithm based on prefix tree for sequence pattern mining," in Proc. 1st ACIS Int. Symp. Cryptography Netw. Security, Data Mining Knowl. Discovery, E-Commerce Appl. Embedded Syst., 2010, pp. 6-11.
[2] R. Kessl and P. Tvrdík, "Toward more parallel frequent itemset mining algorithms," in Proc. 19th IASTED Int. Conf. Parallel Distrib. Comput. Syst., 2007, pp. 97-103.
[3] T. Shintani and M. Kitsuregawa, "Mining algorithms for sequential patterns in parallel: Hash based approach," in Proc. 2nd Pacific-Asia Conf., Res. Develop. Knowl. Discovery Data Mining, 1998, pp. 283-294.
[4] R. Srikant and R. Agrawal, "Mining sequential patterns: Generalizations and performance improvements," in Proc. 5th Int. Conf. Extending Database Technol.: Adv. Database Technol., 1996, pp. 1-17.
[5] R. Kessl and P. Tvrdík, "Probabilistic load balancing method for parallel mining of all frequent itemsets," in Proc. 18th IASTED Int. Conf. Parallel Distrib. Comput. Syst., 2006, pp. 578-586.
[6] V. Guralnik, N. Garg, and G. Karypis, "Parallel tree projection algorithm for sequence mining," in Proc. 7th Int. Euro-Par Conf. Euro-Par Parallel Process., 2001, pp. 310-320.
[7] S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A sampling-based framework for parallel data mining," in Proc. 10th ACM SIGPLAN Symp. Principles Practice Parallel Program., 2005, pp. 255-265.
[8] V. Guralnik and G. Karypis, "Dynamic load balancing algorithms for sequence mining," Univ. Minnesota, Minneapolis, MN, US, Tech. Rep. TR 01-020, 2001.
[9] S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A sampling-based framework for parallel data mining," in Proc. 10th ACM SIGPLAN Symp. Principles Practice Parallel Program., 2005, pp. 255-265.
[10] M. J. Zaki, "Parallel sequence mining on shared-memory machines," J. Parallel Distrib. Comput., vol. 61, no. 3, pp. 401-426, 2001.
[11] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New algorithms for fast discovery of association rules," in Proc. 3rd Int. Conf. Knowl. Discovery Data Mining, 1997, pp. 283-286.
[12] K.-M. Yu and J. Zhou, "Parallel TID-based frequent pattern mining algorithm on a PC cluster and grid computing system," Expert Syst. Appl., vol. 37, no. 3, pp. 2486-2494, 2010.
[13] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New algorithms for fast discovery of association rules," in Proc. 3rd Int. Conf. Knowl. Discovery Data Mining, 1997, pp. 283-286.