International Conference on Computer Applications 6
Cite this article as: Kavita Yadav, Pravin, G Kulurkar. “Study on Positive and Negative Rule Based Mining Techniques
for E-Commerce Applications”. International Conference on Computer Applications 2016: 06-09. Print.
International Conference on Computer Applications 2016 [ICCA 2016]
ISBN 978-81-929866-5-4 VOL 05
Website icca.co.in eMail icca@asdf.res.in
Received 14 March 2016 Accepted 02 April 2016
Article ID ICCA002 eAID ICCA.2016.002
Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications
Kavita Yadav¹, Pravin G Kulurkar²
¹,² Vidharbha Institute of Technology, Nagpur
Abstract- In recent years, data mining has evolved into an active area of research because it can discover previously unknown and interesting knowledge from very large database collections. Data mining is applied to a variety of applications in multiple domains, such as business and IT. A major problem in data mining that has received great attention from the community is the classification of data: the classification should be such that it can be easily verified and easily interpreted by humans. In this paper we study various data mining techniques in order to find combinations that can enhance a hybrid technique, one that combines multiple techniques to improve the usability of the application. We study the CHARM algorithm, the CM-SPAM algorithm, the Apriori algorithm, the MOPNAR algorithm and the Top-K Rules algorithm.
Keywords: Data Mining, CHARM Algorithm, CM-SPAM Algorithm, Apriori Algorithm, MOPNAR Algorithm, Top-K Rules.
1. INTRODUCTION
In today's world, people use many applications to ease their work, and every day a large amount of data is generated in every field. This data can take the form of documents, graphical content such as pictures or video, or structured records. Because there are many types of data in many different formats, proper action must be taken to make better use of the available data, so that when users need it, it can be retrieved in a suitable format and turned into information.
The technique of retrieving knowledge from data is termed data mining, or the Knowledge Discovery in Databases (KDD) process. The main reason data mining has attracted a great deal of attention in the information technology industry is the perception that "we are data rich but information poor": we have a very large amount of data but cannot easily convert it into useful information for decision making in different fields. Producing knowledge requires a lot of data, which can come in every possible format, such as audio, video, images and documents. To gain the full advantage of data mining, retrieval alone is not enough; knowledge extraction also requires tools that extract the essence of the stored information, summarize the data and discover patterns in it.
Since there is no shortage of data, and it comes in many different formats, it is important to develop systems that can convert this data into knowledge to support decision-making processes. Data mining tools can predict behavior and future trends, helping organizations make knowledge-driven decisions, and their automated, prospective analyses support better decision making. In this paper we study various data mining techniques and review which of them can be used in a hybrid data mining technique.
This paper is prepared exclusively for International Conference on Computer Applications 2016 [ICCA 2016] which is published by ASDF International,
Registered in London, United Kingdom under the directions of the Editor-in-Chief Dr Gunasekaran Gunasamy and Editors Dr. Daniel James, Dr. Kokula Krishna
Hari Kunasekaran and Dr. Saikishore Elangovan. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first
page. Copyrights for third-party components of this work must be honoured. For all other uses, contact the owner/author(s). Copyright Holder can be reached at
copy@asdf.international for distribution.
2016 © Reserved by Association of Scientists, Developers and Faculties [www.ASDF.international]
2. CHARM Algorithm
CHARM is an efficient algorithm for enumerating the set of all frequent closed itemsets (an itemset is closed if none of its proper supersets has the same support). Several innovative ideas were implemented in its development. It simultaneously explores both the itemset space and the transaction space over an itemset-tidset tree (IT-tree), which forms the search space of the database.
Figure 1: CHARM algorithm
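The IT-tree search described above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the reference implementation of Zaki and Hsiao: it stores plain tidsets (Python sets) rather than diffsets, uses an absolute minsup count, orders items by increasing support, and applies the four tidset properties that let CHARM merge or drop branches of the IT-tree. The names `charm` and `_extend` are hypothetical.

```python
from collections import defaultdict

def charm(transactions, minsup):
    """Return {closed_itemset: support} for all frequent closed itemsets.

    Simplified CHARM sketch using tidsets; minsup is an absolute transaction count.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    # Frequent single items, ordered by increasing support.
    nodes = [(frozenset([i]), frozenset(t)) for i, t in tidsets.items() if len(t) >= minsup]
    nodes.sort(key=lambda n: (len(n[1]), min(n[0])))
    closed = {}  # tidset -> largest itemset seen with that tidset (its closure)
    _extend(nodes, minsup, closed)
    return {itemset: len(tids) for tids, itemset in closed.items()}

def _extend(nodes, minsup, closed):
    """Process one equivalence class of (itemset, tidset) nodes of the IT-tree."""
    nodes = list(nodes)
    i = 0
    while i < len(nodes):
        x, tx = nodes[i]
        children = []  # new equivalence class with prefix x
        j = i + 1
        while j < len(nodes):
            xj, tj = nodes[j]
            y = tx & tj
            if len(y) < minsup:
                j += 1
                continue
            if tx == tj:
                # Property 1: identical tidsets, so merge xj into x and drop branch j.
                x = x | xj
                children = [(c | xj, t) for c, t in children]
                del nodes[j]
                continue
            if tx < tj:
                # Property 2: t(x) is a proper subset of t(xj), so x always occurs with xj.
                x = x | xj
                children = [(c | xj, t) for c, t in children]
            elif tx > tj:
                # Property 3: xj always occurs with x; its closure is found under x.
                children.append((x | xj, y))
                del nodes[j]
                continue
            else:
                # Property 4: incomparable tidsets; x | xj becomes a new child.
                children.append((x | xj, y))
            j += 1
        if children:
            children.sort(key=lambda n: len(n[1]))
            _extend(children, minsup, closed)
        # Every itemset with tidset tx is a subset of the same closure, so union them.
        closed[tx] = closed.get(tx, frozenset()) | x
        i += 1
```

On the six-transaction database {ACTW, CDW, ACTW, ACDW, ACDTW, CDT} with minsup = 3, this sketch returns the seven closed itemsets C (6), CW (5), ACW (4), CD (4), CT (4), CDW (3) and ACTW (3).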
CHARM uses a highly efficient hybrid search method that skips many levels of the IT-tree to quickly identify the frequent closed itemsets, instead of having to analyze many possible subsets. A fast hash-based approach is used to eliminate non-closed itemsets during execution. CHARM also uses a novel vertical data representation called diffsets for fast frequency computation. A diffset keeps track only of the differences between the tids (transaction identifiers) of a candidate pattern and those of its prefix pattern. Because diffsets reduce the memory required to store intermediate results, the entire working set fits in main memory, even for huge databases.
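The diffset bookkeeping can be illustrated with a few set operations. This sketch follows the diffset identities from Zaki's work as we understand them: for a prefix P and items X and Y, d(PX) = t(P) − t(X), sup(PX) = sup(P) − |d(PX)|, and d(PXY) = d(PY) − d(PX), so deeper levels never need full tidsets. The helper names and the example tidsets are hypothetical.

```python
def diffset(prefix_tids, item_tids):
    """d(PX) = t(P) - t(X): tids that contain prefix P but not item X."""
    return set(prefix_tids) - set(item_tids)

def extend_diffset(d_px, d_py):
    """d(PXY) = d(PY) - d(PX); needs only the two parent diffsets."""
    return set(d_py) - set(d_px)

def support_from_diffset(parent_support, d):
    """sup(PXY) = sup(PX) - |d(PXY)| (likewise sup(PX) = sup(P) - |d(PX)|)."""
    return parent_support - len(d)

# Example: prefix P = {C} occurs in every transaction 0..5.
t_C, t_A, t_W = set(range(6)), {0, 2, 3, 4}, {0, 1, 2, 3, 4}
d_CA = diffset(t_C, t_A)                        # {1, 5}
d_CW = diffset(t_C, t_W)                        # {5}
sup_CA = support_from_diffset(len(t_C), d_CA)   # 4
d_CAW = extend_diffset(d_CA, d_CW)              # set(): W occurs wherever CA does
sup_CAW = support_from_diffset(sup_CA, d_CAW)   # 4
```

The diffsets here hold at most two tids, whereas the tidsets they replace hold up to six; on dense data this gap is what keeps the working set small.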
3. CM-SPAM Algorithm
Finding useful patterns is a challenging task in data mining, and many techniques have been proposed for discovering patterns in sequence databases. A subsequence is called a sequential pattern, or frequent sequence, if it appears frequently in a sequence database, that is, if its frequency is no less than a user-specified minimum support threshold minsup. Sequential patterns are very important in data mining because they support the analysis of many kinds of data, such as web and medical data, program executions, click-streams, e-learning data and biological data. Several efficient algorithms exist for discovering sequential patterns, and CM-SPAM is among the most efficient of them.
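To make the definition of a sequential pattern concrete, the sketch below counts the support of a pattern in a sequence database, where each sequence is a list of itemsets, and builds a co-occurrence map for s-extensions. CM-SPAM, as published by its authors, augments the SPAM algorithm with such co-occurrence maps (CMAPs) to prune candidates; this sketch only shows the idea and does not implement SPAM's vertical bitmap search. The function names and the example database are illustrative assumptions.

```python
from collections import defaultdict

def contains(sequence, pattern):
    """True if pattern (a list of itemsets) is a subsequence of sequence.

    Greedy earliest matching suffices: matching an itemset as early as
    possible never prevents the rest of the pattern from matching.
    """
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def support(db, pattern):
    """Number of sequences in db that contain pattern."""
    return sum(1 for seq in db if contains(seq, pattern))

def cmap_s(db, minsup):
    """Co-occurrence map for s-extensions: x -> {y : y occurs in a later itemset
    than some itemset containing x, in at least minsup sequences}.
    Appending y after a pattern whose last itemset contains x can be pruned
    when y is not in cmap[x], because the result cannot reach minsup."""
    counts = defaultdict(int)
    for seq in db:
        pairs = set()  # count each (x, y) at most once per sequence
        for k, itemset in enumerate(seq):
            later = set().union(*seq[k + 1:])
            pairs.update((x, y) for x in itemset for y in later)
        for pair in pairs:
            counts[pair] += 1
    cmap = defaultdict(set)
    for (x, y), c in counts.items():
        if c >= minsup:
            cmap[x].add(y)
    return cmap

db = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]
```

In this example database, the pattern ⟨{a}, {b}⟩ occurs in all four sequences, while ⟨{a, b}, {c}⟩ occurs in two; with minsup = 2, `g` never follows `a`, so any candidate appending `g` after an `a` would be pruned without counting.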
Figure 2: CM-SPAM algorithm
The figure above shows the CM-SPAM algorithm. First, a training dataset with known inputs and outputs is given as input to the system. If necessary, the data is divided equally, giving a sequential classification of the known inputs and a sequential classification of the known outputs. Combining the two yields the sequential classification dataset. The CM-SPAM algorithm is then applied to this sequential dataset to perform sequence construction, parameter learning and dataset optimization. Together these produce the sequential CM-SPAM classification model, which is then applied to the test dataset to produce the output.
4. Apriori Algorithm
The Apriori algorithm is one of the most influential algorithms for mining Boolean association rules. Its key concepts are the following. Frequent itemsets are the sets of items that have at least the minimum support; the set of frequent i-itemsets is denoted L_i. The Apriori property states that every subset of a frequent itemset must also be frequent. The join operation generates C_k, the set of candidate k-itemsets, by joining L_{k-1} with itself. The algorithm first finds the frequent items, that is, the itemsets that have minimum support. By the Apriori property, if {t1, t2} is a frequent itemset, then both {t1} and {t2} must also be frequent; conversely, any candidate that has an infrequent subset can be discarded without counting its support. The algorithm iteratively finds the frequent itemsets of size 1 to k and then uses them to generate association rules.
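The level-wise search just described can be sketched as follows. This is an illustrative sketch under stated assumptions: minsup is an absolute count, and the join step simply unions pairs of frequent (k-1)-itemsets whose union has k items (a simpler variant of the lexicographic prefix join) before the Apriori-property prune. The function names are hypothetical.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset(itemset): support count} for every itemset with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= minsup}  # L_1
    frequent = dict(level)
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets (C_k).
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}  # L_k
        frequent.update(level)
        k += 1
    return frequent

def association_rules(frequent, minconf):
    """Yield (antecedent, consequent, support, confidence) with confidence >= minconf."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent by the Apriori property
                if conf >= minconf:
                    yield lhs, itemset - lhs, sup, conf
```

On a nine-transaction database over items I1 to I5 with minsup = 2, for example, the sketch finds 5 frequent 1-itemsets, 6 frequent 2-itemsets and the two 3-itemsets {I1, I2, I3} and {I1, I2, I5}; the rule {I1, I5} → {I2} then has confidence 1.0.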
Figure 3: Apriori algorithm
The above figure shows the Apriori algorithm. Several techniques can improve its efficiency. In hash-based itemset counting, a k-itemset whose hash-bucket count is below the support threshold cannot be frequent. In transaction reduction, a transaction that does not contain any frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be ignored in subsequent scans. In partitioning, any itemset that is frequent in the whole database must be frequent in at least one of the partitions. In sampling, mining is performed on a sample, i.e. a subset, of the data. Finally, in dynamic itemset counting, a new candidate itemset is added only when all of its subsets are estimated to be frequent.
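Two of these refinements are small enough to sketch. The hash-based filter below hashes every 2-itemset of each transaction into buckets during the first scan; because a bucket count is an upper bound on the support of every pair in that bucket, a pair in a bucket below minsup cannot be frequent and need not become a candidate. The transaction-reduction helper drops transactions that contain no frequent k-itemset. The bucket count, the CRC-32 hash, the assumption that items are strings, and the function names are arbitrary illustrative choices.

```python
import zlib
from itertools import combinations

def bucket_filter(transactions, minsup, n_buckets=7):
    """Return a predicate telling whether a 2-itemset may be frequent.

    Items are assumed to be strings. The filter has no false negatives:
    every truly frequent pair passes it.
    """
    def bucket(pair):
        key = "|".join(sorted(pair)).encode()
        return zlib.crc32(key) % n_buckets

    counts = [0] * n_buckets
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[bucket(pair)] += 1
    return lambda pair: counts[bucket(pair)] >= minsup

def reduce_transactions(transactions, frequent_k):
    """Keep only transactions that contain at least one frequent k-itemset."""
    return [t for t in transactions if any(s <= set(t) for s in frequent_k)]
```

Different pairs can share a bucket, so the filter may let some infrequent pairs through; those are removed later by exact counting.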
5. MOPNAR Algorithm
MOPNAR is a multi-objective evolutionary algorithm (MOEA). It mines, at a low computational cost, a reduced set of positive and negative quantitative association rules (QARs) that are easy to understand and offer a good trade-off between the number of rules, support and coverage of the dataset. The main focus of the algorithm is to obtain a reduced set of positive and negative QARs (PNQARs) with a good trade-off among three objectives: comprehensibility, interestingness and performance. To perform rule learning, it extends the traditional MOEA model, and it introduces two new components: an external population (EP) and a restarting process.
Following the decomposition approach of MOEA/D, MOPNAR decomposes the multi-objective optimization problem into N scalar optimization subproblems and uses an evolutionary algorithm (EA) to optimize these subproblems. The EP and the restarting process are introduced to store all the non-dominated rules found, promote diversity in the population and improve the coverage of the dataset. The EP contains all the non-dominated rules found so far and is updated with the offspring generated for each solution. Because the size of the EP is not fixed, it can store a large number of rules while the population size is kept small. The restarting process helps the search escape local optima and promotes diversity in the population; it is applied when the number of new individuals in the population in one generation is less than α% of the current population size.
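The decomposition, the EP and the restart trigger described above can be sketched with a few helper functions. This is a hypothetical sketch that assumes every objective is to be maximized and a rule is any value: `dominates` tests Pareto dominance, `tchebycheff` is the Tchebycheff scalarization commonly used in MOEA/D to define each scalar subproblem (whether MOPNAR uses exactly this form is an assumption here), `update_ep` keeps the external population non-dominated, and `should_restart` encodes the α% condition.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # assumption: every objective is maximized

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def tchebycheff(f: Sequence[float], weights: Sequence[float], ideal: Sequence[float]) -> float:
    """Scalar subproblem value max_i w_i * |z*_i - f_i|; smaller is better."""
    return max(w * abs(z - v) for w, v, z in zip(weights, f, ideal))

def update_ep(ep: List[Tuple[object, Objectives]], rule: object, f: Objectives):
    """Add a new rule to the external population unless an existing member
    dominates it, and drop the members it dominates. The EP has no size limit."""
    if any(dominates(g, f) for _, g in ep):
        return ep
    return [(r, g) for r, g in ep if not dominates(f, g)] + [(rule, f)]

def should_restart(n_new: int, pop_size: int, alpha: float) -> bool:
    """Restart when fewer than alpha% of the population are new this generation."""
    return n_new < alpha / 100 * pop_size
```

Because `update_ep` never evicts a non-dominated rule for lack of space, the population itself can stay small while the EP accumulates the rule set that is finally returned.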
The inputs and output of the MOPNAR algorithm are as follows:
Input:
1. N: population size;
2. nTrials: number of evaluations;
3. m: number of objectives;
4. Pmut: probability of mutation;
5. λ1, ..., λN: a set of N weight vectors;
6. T: the number of weight vectors in the neighborhood of each weight vector;
7. δ: the probability that parent solutions are selected from the neighborhood;
8. ηr: the maximal number of solutions replaced by each child solution;
9. γ: amplitude factor for each attribute of the dataset;
10. α: difference threshold.
Output: EP, the external population of non-dominated rules.
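The inputs listed above can be gathered into a configuration object, and the neighborhood of each weight vector (its T closest weight vectors, as in MOEA/D) can be computed once before the search starts. This is a hypothetical sketch: the field names mirror the list, no default values are given because the source does not state them, and building the weight vectors as a simplex lattice is an assumption (a common MOEA/D choice), not something stated in the paper.

```python
import math
from dataclasses import dataclass
from itertools import product

@dataclass
class MopnarInputs:
    """Hypothetical container for the MOPNAR inputs listed above."""
    N: int          # population size
    n_trials: int   # number of evaluations
    m: int          # number of objectives
    p_mut: float    # probability of mutation
    T: int          # weight vectors in each neighborhood
    delta: float    # probability of selecting parents from the neighborhood
    eta_r: int      # max. solutions replaced by each child
    gamma: float    # amplitude factor for each attribute
    alpha: float    # difference threshold, in percent

def simplex_lattice_weights(m: int, H: int):
    """All weight vectors (c_1/H, ..., c_m/H) with non-negative integers c_i summing to H."""
    return [tuple(c / H for c in combo)
            for combo in product(range(H + 1), repeat=m) if sum(combo) == H]

def neighborhoods(weights, T: int):
    """B(i): indices of the T weight vectors closest to weights[i] (Euclidean), itself included."""
    return [sorted(range(len(weights)), key=lambda j: math.dist(w, weights[j]))[:T]
            for w in weights]
```

For m = 3 objectives and H = 4 divisions, the lattice yields 15 weight vectors, so N would be 15 under this construction.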
6. Top-K Rules Algorithm
As discussed above, all of the preceding algorithms depend on a threshold, and choosing it is difficult: a poorly chosen threshold can make execution very slow, can generate too many, too few or no results depending on the conditions, and can cause valuable information to be omitted. Top-K Rules was proposed to overcome these disadvantages: instead of a support threshold, the user sets k, the number of association rules to be found. Earlier top-k approaches did not follow the traditional definition of association rules; for example, they were restricted to rules with a single consequent, or mined association rules from a stream instead of a transaction database.
Figure 4: Top-K Rules algorithm
Mining top-k rules is challenging because the algorithm cannot rely on both thresholds, minsup and minconf: only minconf is given, while minsup, which is normally the most efficient and reliable means of pruning the search space, is not. In the worst case, a naïve top-k algorithm would have to generate all possible rules. The figure shows the main algorithm, which runs as follows. It first scans the database once to compute tids(c), the set of identifiers of the transactions containing c, for each database item c. It then generates all valid rules of size 1*1 (one item in the antecedent and one in the consequent) that have at least minsup × |T| tids, where |T| is the number of transactions, and calls the procedure SAVE to store each rule generated. The frequent rules are added to the set R of rules to be expanded. The idea is to always expand the rule with the highest support first, because it is more likely to generate rules with high support and thus to raise minsup more quickly, pruning the search space.
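The core idea, an internally raised minsup combined with best-first expansion of the highest-support rule, can be illustrated with the simplified sketch below. It is not the published TopKRules algorithm: it keeps exactly k rules (ties are broken arbitrarily), avoids generating duplicate rules with a `seen` set instead of the original's expansion ordering, recomputes the antecedent tidset when checking confidence, and treats minsup as an absolute count. All names are hypothetical.

```python
import heapq
from itertools import count

def top_k_rules(transactions, k, minconf):
    """Return the k rules X -> Y with the highest support among those with
    confidence >= minconf, as (support, X, Y) tuples sorted by decreasing support."""
    if k < 1:
        return []
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    items = sorted(tids)
    minsup = 1          # absolute count, raised once k valid rules are known
    top = []            # min-heap of (support, tiebreak, X, Y): the current top-k
    frontier = []       # max-heap (negated support) of frequent rules to expand (set R)
    seen = set()
    tiebreak = count()

    def consider(x, y, rule_tids):
        nonlocal minsup
        if (x, y) in seen or len(rule_tids) < minsup:
            return
        seen.add((x, y))
        sup = len(rule_tids)
        conf = sup / len(set.intersection(*(tids[i] for i in x)))
        if conf >= minconf:  # SAVE: keep the rule among the current top-k
            heapq.heappush(top, (sup, next(tiebreak), x, y))
            if len(top) > k:
                heapq.heappop(top)
            if len(top) == k:
                minsup = top[0][0]
        heapq.heappush(frontier, (-sup, next(tiebreak), x, y, rule_tids))

    # One scan gave tids(c) for every item c; now build all 1*1 rules.
    for a in items:
        for b in items:
            if a != b:
                consider(frozenset([a]), frozenset([b]), tids[a] & tids[b])

    # Best-first: always expand the rule with the highest support.
    while frontier:
        neg_sup, _, x, y, rule_tids = heapq.heappop(frontier)
        if -neg_sup < minsup:
            break  # every remaining rule, and all its expansions, has lower support
        for c in items:
            if c not in x and c not in y:
                new_tids = rule_tids & tids[c]
                consider(x | {c}, y, new_tids)   # left expansion
                consider(x, y | {c}, new_tids)   # right expansion
    return sorted(((sup, set(x), set(y)) for sup, _, x, y in top), key=lambda r: -r[0])
```

Stopping at the first popped rule below minsup is safe because adding items to either side of a rule can only keep or lower its support.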
7. Conclusion
By reviewing the above algorithms, we found that to improve mining and create a hybrid approach, Top-K Rules should be combined with MOPNAR. MOPNAR produces highly efficient rule mining results, while Top-K Rules can increase speed and save system memory. Hence we propose improving rule accuracy by using positive and negative subgraph mining with Top-K Rules.

More Related Content

What's hot (20)

PPTX
Introduction to Datamining Concept and Techniques
Sơn Còm Nhom
 
PDF
A Quantified Approach for large Dataset Compression in Association Mining
IOSR Journals
 
PDF
Z36149154
IJERA Editor
 
PDF
Comparison Between WEKA and Salford System in Data Mining Software
Universitas Pembangunan Panca Budi
 
PDF
Effieient Algorithms to Find Frequent Itemset using Data Mining
IRJET Journal
 
PDF
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
IJDKP
 
PDF
IRJET- A Survey on Predictive Analytics and Parallel Algorithms for Knowl...
IRJET Journal
 
PPT
Data mining and knowledge Discovery
Kartik Kalpande Patil
 
PDF
Enhancement techniques for data warehouse staging area
IJDKP
 
PDF
Ijcatr04051004
Editor IJCATR
 
PDF
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Edureka!
 
PPT
3. mining frequent patterns
Azad public school
 
PPTX
What is Datamining? Which algorithms can be used for Datamining?
Seval Çapraz
 
PDF
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
theijes
 
PDF
Data mining
ShwetA Kumari
 
PDF
A Survey on Data Mining
IOSR Journals
 
PDF
The International Journal of Engineering and Science
theijes
 
PPTX
Data mining techniques
Hatem Magdy
 
PDF
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUES
Journal For Research
 
PDF
Ijsrdv1 i2039
ijsrd.com
 
Introduction to Datamining Concept and Techniques
Sơn Còm Nhom
 
A Quantified Approach for large Dataset Compression in Association Mining
IOSR Journals
 
Z36149154
IJERA Editor
 
Comparison Between WEKA and Salford System in Data Mining Software
Universitas Pembangunan Panca Budi
 
Effieient Algorithms to Find Frequent Itemset using Data Mining
IRJET Journal
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
IJDKP
 
IRJET- A Survey on Predictive Analytics and Parallel Algorithms for Knowl...
IRJET Journal
 
Data mining and knowledge Discovery
Kartik Kalpande Patil
 
Enhancement techniques for data warehouse staging area
IJDKP
 
Ijcatr04051004
Editor IJCATR
 
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Edureka!
 
3. mining frequent patterns
Azad public school
 
What is Datamining? Which algorithms can be used for Datamining?
Seval Çapraz
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
theijes
 
Data mining
ShwetA Kumari
 
A Survey on Data Mining
IOSR Journals
 
The International Journal of Engineering and Science
theijes
 
Data mining techniques
Hatem Magdy
 
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUES
Journal For Research
 
Ijsrdv1 i2039
ijsrd.com
 

Similar to Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications (20)

PDF
Data Mining Concepts - A survey paper
rahulmonikasharma
 
PDF
An improved apriori algorithm for association rules
ijnlc
 
PDF
Adaptive and Fast Predictions by Minimal Itemsets Creation
IJERA Editor
 
PDF
The Transpose Technique On Number Of Transactions Of...
Amanda Brady
 
PPTX
Lasso Regression regression amalysis.pptx
ashdgeek312001
 
PDF
A Performance Based Transposition algorithm for Frequent Itemsets Generation
Waqas Tariq
 
PDF
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
ShivarkarSandip
 
PDF
IRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET Journal
 
PPTX
Week-1-Introduction to Data Mining.pptx
Take1As
 
PDF
Ec3212561262
IJMER
 
PDF
Data Mining based on Hashing Technique
ijtsrd
 
PDF
The Survey of Data Mining Applications And Feature Scope
IJCSEIT Journal
 
PPT
Data Mining Techniques
Houw Liong The
 
PPT
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
error007
 
PPT
Associations.ppt
Quyn590023
 
PDF
6 module 4
tafosepsdfasg
 
PPTX
Data mining approaches and methods
sonangrai
 
PDF
Comparative study of frequent item set in data mining
ijpla
 
Data Mining Concepts - A survey paper
rahulmonikasharma
 
An improved apriori algorithm for association rules
ijnlc
 
Adaptive and Fast Predictions by Minimal Itemsets Creation
IJERA Editor
 
The Transpose Technique On Number Of Transactions Of...
Amanda Brady
 
Lasso Regression regression amalysis.pptx
ashdgeek312001
 
A Performance Based Transposition algorithm for Frequent Itemsets Generation
Waqas Tariq
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
ShivarkarSandip
 
IRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET Journal
 
Week-1-Introduction to Data Mining.pptx
Take1As
 
Ec3212561262
IJMER
 
Data Mining based on Hashing Technique
ijtsrd
 
The Survey of Data Mining Applications And Feature Scope
IJCSEIT Journal
 
Data Mining Techniques
Houw Liong The
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
error007
 
Associations.ppt
Quyn590023
 
6 module 4
tafosepsdfasg
 
Data mining approaches and methods
sonangrai
 
Comparative study of frequent item set in data mining
ijpla
 
Ad

More from Association of Scientists, Developers and Faculties (20)

PDF
Core conferences bta 19 paper 12
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 10
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 8
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 7
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 6
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 5
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 4
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 3
Association of Scientists, Developers and Faculties
 
PDF
Core conferences bta 19 paper 2
Association of Scientists, Developers and Faculties
 
PDF
International Conference on Cloud of Things and Wearable Technologies 2018
Association of Scientists, Developers and Faculties
 
PDF
A Typical Sleep Scheduling Algorithm in Cluster Head Selection for Energy Eff...
Association of Scientists, Developers and Faculties
 
PDF
Application of Agricultural Waste in Preparation of Sustainable Construction ...
Association of Scientists, Developers and Faculties
 
PDF
Survey and Research Challenges in Big Data
Association of Scientists, Developers and Faculties
 
PDF
Asynchronous Power Management Using Grid Deployment Method for Wireless Senso...
Association of Scientists, Developers and Faculties
 
Core conferences bta 19 paper 12
Association of Scientists, Developers and Faculties
 
Core conferences bta 19 paper 10
Association of Scientists, Developers and Faculties
 
International Conference on Cloud of Things and Wearable Technologies 2018
Association of Scientists, Developers and Faculties
 
A Typical Sleep Scheduling Algorithm in Cluster Head Selection for Energy Eff...
Association of Scientists, Developers and Faculties
 
Application of Agricultural Waste in Preparation of Sustainable Construction ...
Association of Scientists, Developers and Faculties
 
Survey and Research Challenges in Big Data
Association of Scientists, Developers and Faculties
 
Asynchronous Power Management Using Grid Deployment Method for Wireless Senso...
Association of Scientists, Developers and Faculties
 
Ad

Recently uploaded (20)

PPTX
PPT-Q1-WK-3-ENGLISH Revised Matatag Grade 3.pptx
reijhongidayawan02
 
PDF
0725.WHITEPAPER-UNIQUEWAYSOFPROTOTYPINGANDUXNOW.pdf
Thomas GIRARD, MA, CDP
 
PDF
Exploring the Different Types of Experimental Research
Thelma Villaflores
 
PPTX
How to Convert an Opportunity into a Quotation in Odoo 18 CRM
Celine George
 
PDF
ARAL_Orientation_Day-2-Sessions_ARAL-Readung ARAL-Mathematics ARAL-Sciencev2.pdf
JoelVilloso1
 
PPTX
Cultivation practice of Litchi in Nepal.pptx
UmeshTimilsina1
 
PDF
CONCURSO DE POESIA “POETUFAS – PASSOS SUAVES PELO VERSO.pdf
Colégio Santa Teresinha
 
PPTX
Universal immunization Programme (UIP).pptx
Vishal Chanalia
 
PDF
Dimensions of Societal Planning in Commonism
StefanMz
 
PPTX
PATIENT ASSIGNMENTS AND NURSING CARE RESPONSIBILITIES.pptx
PRADEEP ABOTHU
 
PDF
QNL June Edition hosted by Pragya the official Quiz Club of the University of...
Pragya - UEM Kolkata Quiz Club
 
PDF
Aprendendo Arquitetura Framework Salesforce - Dia 03
Mauricio Alexandre Silva
 
PDF
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
PDF
Generative AI: it's STILL not a robot (CIJ Summer 2025)
Paul Bradshaw
 
PPTX
How to Set Up Tags in Odoo 18 - Odoo Slides
Celine George
 
PPTX
How to Create Odoo JS Dialog_Popup in Odoo 18
Celine George
 
PDF
Reconstruct, Restore, Reimagine: New Perspectives on Stoke Newington’s Histor...
History of Stoke Newington
 
PPT
Talk on Critical Theory, Part One, Philosophy of Social Sciences
Soraj Hongladarom
 
PPTX
Post Dated Cheque(PDC) Management in Odoo 18
Celine George
 
PDF
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
PPT-Q1-WK-3-ENGLISH Revised Matatag Grade 3.pptx
reijhongidayawan02
 
0725.WHITEPAPER-UNIQUEWAYSOFPROTOTYPINGANDUXNOW.pdf
Thomas GIRARD, MA, CDP
 
Exploring the Different Types of Experimental Research
Thelma Villaflores
 
How to Convert an Opportunity into a Quotation in Odoo 18 CRM
Celine George
 
ARAL_Orientation_Day-2-Sessions_ARAL-Readung ARAL-Mathematics ARAL-Sciencev2.pdf
JoelVilloso1
 
Cultivation practice of Litchi in Nepal.pptx
UmeshTimilsina1
 
CONCURSO DE POESIA “POETUFAS – PASSOS SUAVES PELO VERSO.pdf
Colégio Santa Teresinha
 
Universal immunization Programme (UIP).pptx
Vishal Chanalia
 
Dimensions of Societal Planning in Commonism
StefanMz
 
PATIENT ASSIGNMENTS AND NURSING CARE RESPONSIBILITIES.pptx
PRADEEP ABOTHU
 
QNL June Edition hosted by Pragya the official Quiz Club of the University of...
Pragya - UEM Kolkata Quiz Club
 
Aprendendo Arquitetura Framework Salesforce - Dia 03
Mauricio Alexandre Silva
 
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
Generative AI: it's STILL not a robot (CIJ Summer 2025)
Paul Bradshaw
 
How to Set Up Tags in Odoo 18 - Odoo Slides
Celine George
 
How to Create Odoo JS Dialog_Popup in Odoo 18
Celine George
 
Reconstruct, Restore, Reimagine: New Perspectives on Stoke Newington’s Histor...
History of Stoke Newington
 
Talk on Critical Theory, Part One, Philosophy of Social Sciences
Soraj Hongladarom
 
Post Dated Cheque(PDC) Management in Odoo 18
Celine George
 
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 

Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications

  • 1. International Conference on Computer Applications 6 Cite this article as: Kavita Yadav, Pravin, G Kulurkar. “Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications”. International Conference on Computer Applications 2016: 06-09. Print. International Conference on Computer Applications 2016 [ICCA 2016] ISBN 978-81-929866-5-4 VOL 05 Website icca.co.in eMail [email protected] Received 14 – March– 2016 Accepted 02 - April – 2016 Article ID ICCA002 eAID ICCA.2016.002 Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications Kavita Yadav1 , Pravin G Kulurkar2 1,2 Vidharbha Institute of Technology, Nagpur Abstract- In the recent years the scope of data mining has evolved into an active area of research because of the previously unknown and interesting knowledge from very large database collection. The data mining is applied on a variety of applications in multiple domains like in business, IT and many more sectors. In Data Mining the major problem which receives great attention by the community is the classification of the data. The classification of data should be such that it could be they can be easily verified and should be easily interpreted by the humans. In this paper we would be studying various data mining techniques so that we can find few combinations for enhancing the hybrid technique which would be having multiple techniques involved so enhance the usability of the application. We would be studying CHARM Algorithm, CM-SPAM Algorithm, Apriori Algorithm, MOPNAR Algorithm and the Top K Rules. Keywords: Data Mining, CHARM Algorithm, CM-SPAM Algorithm, Apriori Algorithm, MOPNAR Algorithm and the Top K Rules. 1. INTRODUCTION In today’s world human beings are using multiple applications to ease their work. Every day a lot of data is generated in every field. The data can be in the form of Documents, graphical representation like picture or video and there can be multiple records also. 
Since there are multiple types of data there can be multiple types of format proper action should be taken for their better utilization of the available data. Since when the user wants to use the data the data can be retrieved in the proper format and information. The technique to retrieve knowledge from the data is termed as data mining or knowledge hub or simple Knowledge Discovery process (KDD).The important reason that attracted a great deal of attention in information technology the discovery of useful information from large collections of data industry towards field of “Data mining” is due to the perception of “we are data rich but information poor”. This perception is there because we have a very huge amount of data but we cannot convert it to useful information for decision making in different fields. To produce knowledge we require a lot of data and which could be in all possible formats like audio video images documents and much more. In data mining to get the full advantage not only the retrieval but also the tool for extraction of the essence of information stored, summarization of data and discovery of patterns in the data too is required for the knowledge extraction. Since there is no lack of supply of data we are having a lot of data in different formats therefore it is important to develop a system which can convert this data into knowledge to help in decision making processes. The data mining tools can help in predicting behavior and future trends which can help organizations to make future knowledge-driven decisions. The data mining tools provides various features like automated, prospective analyses can help a better decision making scenario. In this paper we would be studying various data mining techniques and will review which technique can be used in the hybrid of the data mining technique. 
This paper is prepared exclusively for International Conference on Computer Applications 2016 [ICCA 2016] which is published by ASDF International, Registered in London, United Kingdom under the directions of the Editor-in-Chief Dr Gunasekaran Gunasamy and Editors Dr. Daniel James, Dr. Kokula Krishna Hari Kunasekaran and Dr. Saikishore Elangovan. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honoured. For all other uses, contact the owner/author(s). Copyright Holder can be reached at [email protected] for distribution. 2016 © Reserved by Association of Scientists, Developers and Faculties [www.ASDF.international]
  • 2. International Conference on Computer Applications 7 Cite this article as: Kavita Yadav, Pravin, G Kulurkar. “Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications”. International Conference on Computer Applications 2016: 06-09. Print. 2. Charm Algorithm CHARM is an efficient data mining technique which is used for enumerating the set of all frequent closed data item-sets. There are multiple innovative ideas implemented in the development of charm. This technique simultaneously explores both the item-set space and transaction space over item set-tides tree that is the search space of the database. Figure 1: Charm algorithm This technique uses a highly efficient hybrid search method that skips many levels of the tree to quickly identify the frequent closed set instead of having multiple possible subset to analysis. Fast Hash-based approach is used to eliminate item set during the execution. Charm also able to utilize a novel vertical data representation called diffset for fast frequency computations. Diffsets also keep track for differences in the tids of a candidate pattern from its prefix pattern. Since diffset reduce the size of the memory required to store intermediate result, therefore the entire working in some memory even for huge database. 3. CM-Spam Algorithm In data mining getting useful patterns is a challenging task. In sequential database many techniques have been proposed for getting the patterns. A subsequence is called sequential pattern or frequent sequence if it frequently appears in a sequence database and its frequency is no less than a user-specified minimum support threshold minsup. This sequential pattern is very important in datamining as it helps in analysis of multiple applications like web medical data, program executions, click-streams, e-learning data and biological data. There are several efficient algorithm present for getting patterns amongst them the most efficient is the CM-SPAM Algorithm. 
Figure: CM-spam algorithm
  • 3. International Conference on Computer Applications 8 Cite this article as: Kavita Yadav, Pravin, G Kulurkar. “Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Applications”. International Conference on Computer Applications 2016: 06-09. Print. The figure above shows the CM-SPAM Algorithm here first the trained dataset with known input and output is given as the input to the system. The data is divided equally if it is necessary as it gives the sequential classification from the known input and sequential classification form the known output. The combination of both the above dataset it provides us the classification sequential dataset.Then CM SPAM algorithm is applied to get the sequential construction, parameter learning and optimized dataset from the sequential dataset. All these combinations of parameters give us the Classification model of Sequential CM SPAM which is then used on testing the dataset for the output. 4. Apriori Algorithm The apriori algorithm is one of the influential algorithm for mining for Boolean association rules. The key concept of the algorithm lies within Frequent Itemsets they are those sets of item which has minimum support and are denoted by Lifor ithItemset in the data. Secondly we have Apriori Property it is nothing butany subset of frequent itemset must be frequent and lastly we have join Operation it is required to find Lk which is a set of candidate k-itemsets is generated by joining Lk-1with itself. Here we will first find the frequent items the sets of items that have minimum support. Here a subset of a frequent itemset must also be a frequent itemset else the algorithm will not work that is if {t1 t2} is a frequent itemset both {t1} and {t2} should be a frequent itemset else the technique will fail to give a proper output. Once this is done iteratively find the frequrntitemset from 1 to k and use them to generate association rules. 
Figure: Apriori algorithm
The above figure shows the Apriori algorithm. Several techniques can improve the efficiency of Apriori. In hash-based itemset counting, a k-itemset whose hash bucket count is below the support threshold cannot be frequent. In transaction reduction, a transaction that does not contain any frequent k-itemset is ignored in subsequent scans. In partitioning, any itemset that is frequent in the whole database must be frequent in at least one of the partitions. In sampling, mining is performed on a sample, i.e., a subset, of the data. Finally, in dynamic itemset counting, a new candidate itemset is added during a scan only if all of its subsets are estimated to be frequent.
5. MOPNAR ALGORITHM
MOPNAR extends a multi-objective evolutionary algorithm (MOEA) based on decomposition. At a low computational cost, it mines a reduced set of positive and negative quantitative association rules (QARs) that are easy to understand and offer a good trade-off between the number of rules, support, and coverage of the dataset. The main goal of the algorithm is to obtain a reduced set of positive and negative QARs (PNQARs) with a good trade-off among three objectives: comprehensibility, interestingness and performance. To learn the rules, MOPNAR extends the traditional MOEA model and introduces two new components, an external population (EP) and a restarting process. Following the decomposition approach, it decomposes the multi-objective optimization problem into N scalar optimization subproblems and uses an evolutionary algorithm (EA) to optimize these subproblems. The EP and the restarting process are introduced to store all the nondominated rules found, promote diversity in the population, and improve the coverage of the dataset. The EP contains all the nondominated rules found and is updated with the offspring generated for each solution. Since the size of the EP is not fixed, it can store a large number of rules, which allows the population size to be reduced.
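The external population described above keeps every nondominated rule found so far. A minimal Python sketch of such an archive update is shown below, assuming that all three objectives (comprehensibility, interestingness, performance) are to be maximized. It follows the usual Pareto-dominance rule for an unbounded archive and is not taken from the MOPNAR implementation. The rule strings and objective values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical objective vector: (comprehensibility, interestingness, performance),
# all to be maximised.
Objectives = Tuple[float, float, float]
Archive = List[Tuple[str, Objectives]]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_ep(ep: Archive, rule: str, objs: Objectives) -> Archive:
    """Offer one offspring rule to the unbounded external population (EP)."""
    # Reject the rule if it is already stored or some stored rule dominates it.
    if any(r == rule or dominates(o, objs) for r, o in ep):
        return ep
    # Otherwise drop every stored rule it dominates and add it; EP has no size limit.
    return [(r, o) for r, o in ep if not dominates(objs, o)] + [(rule, objs)]


ep: Archive = []
for rule, objs in [
    ("A -> B", (0.5, 0.6, 0.4)),
    ("C -> not D", (0.6, 0.7, 0.5)),  # dominates "A -> B", which is removed
    ("E -> F", (0.9, 0.2, 0.3)),      # a different trade-off, so both are kept
    ("G -> H", (0.4, 0.5, 0.4)),      # dominated by "C -> not D", so rejected
]:
    ep = update_ep(ep, rule, objs)

print([r for r, _ in ep])  # ['C -> not D', 'E -> F']
```

Because the archive is a plain list with no capacity limit, it can grow as large as the number of mutually nondominated rules, which mirrors the point that the EP size is not fixed.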
The restarting process, in turn, helps the search escape local optima and promotes diversity in the population. It is applied when the number of new individuals in the population in one generation is less than α% of the size of the current population. The MOPNAR algorithm takes the following inputs and produces the following output:
Input:
1. N: population size;
2. nTrials: number of evaluations;
3. m: number of objectives;
4. Pmut: probability of mutation;
5. λ1, ..., λN: a set of N weight vectors;
6. T: the number of weight vectors in the neighborhood of each weight vector;
7. δ: the probability that parent solutions are selected from the neighborhood;
8. ηr: the maximal number of solutions replaced by each child solution;
9. γ: amplitude factor for each attribute of the dataset;
10. α: difference threshold.
Output: EP
6. TOP-K RULES ALGORITHM
All of the algorithms studied above depend on thresholds. Depending on how the thresholds are set, they may execute very slowly, generate too many, too few, or no results, and sometimes omit valuable information. Top-K Rules was proposed to overcome these disadvantages: here k, the number of association rules to be found, is set by the user. The approach can also be adapted to variations of traditional association rule mining, such as mining rules with a single consequent, or mining association rules from a stream instead of a transaction database.
Figure: Top-K Rules algorithm
Mining top-k rules is difficult because the algorithm cannot rely on both thresholds, minsup and minconf, to prune the search space. In TopKRules, the user supplies k and minconf, while minsup is an internal threshold that starts at 0 and is raised as rules are found. In the worst case, a naïve top-k algorithm would have to generate all the rules, just as a basic association rule mining algorithm does. The figure shows the main algorithm, which runs as follows. It first scans the database once to calculate tids(c), the set of IDs of the transactions that contain c, for each item c in the database. It then generates all valid rules of size 1×1 from pairs of items that each have at least minsup × |T| tids, where |T| is the number of transactions. The procedure SAVE is called to store the valid rules generated, and the frequent rules are added to the set R of rules to be expanded.
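The TopKRules search strategy can be sketched in Python as follows. This is a simplified illustration, not Fournier-Viger et al.'s implementation: here minsup is an absolute count rather than a fraction of |T|, expansions simply try every item rather than only the items found in the rule's transactions, and ties at the k-th support value are broken arbitrarily. The function name and the sample transactions are hypothetical. Saved rules are kept in a min-heap of size k; once k rules are stored, minsup becomes the lowest stored support. Candidates are expanded in decreasing order of support, and rules produced by a left expansion are only expanded further on the left, so each rule is generated once.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent I, consequent J)


def top_k_rules(transactions: List[Set[str]], k: int, minconf: float) -> List[Tuple[Rule, int, float]]:
    """Return up to k rules with the highest support and confidence >= minconf."""
    # One database scan: tids(c) for every item c.
    tids: Dict[str, Set[int]] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    items = sorted(tids)

    def tidset(itemset: FrozenSet[str]) -> Set[int]:
        return set.intersection(*(tids[c] for c in itemset))

    minsup = 0  # internal absolute support threshold, raised during the search
    best: List[Tuple[int, int, Rule, float]] = []  # min-heap (support, tiebreak, rule, conf)
    to_expand: List[Tuple[int, int, Rule, bool]] = []  # set R as a max-heap on support
    tiebreak = count()

    def consider(rule: Rule, expand_right: bool) -> None:
        nonlocal minsup
        antecedent, consequent = rule
        sup = len(tidset(antecedent | consequent))
        if sup == 0 or sup < minsup:
            return  # expansions cannot have higher support, so prune
        conf = sup / len(tidset(antecedent))
        if conf >= minconf:  # SAVE: keep the k rules with the highest support
            heapq.heappush(best, (sup, next(tiebreak), rule, conf))
            if len(best) > k:
                heapq.heappop(best)
            if len(best) == k:
                minsup = best[0][0]
        heapq.heappush(to_expand, (-sup, next(tiebreak), rule, expand_right))

    # Rules of size 1x1.
    for i in items:
        for j in items:
            if i != j:
                consider((frozenset([i]), frozenset([j])), True)

    # Always expand the candidate with the highest support first.
    while to_expand:
        neg_sup, _, (antecedent, consequent), expand_right = heapq.heappop(to_expand)
        if -neg_sup < minsup:
            break  # every remaining candidate has even lower support
        for c in items:  # left expansion: I -> J becomes I + {c} -> J
            if c > max(antecedent) and c not in consequent:
                consider((antecedent | {c}, consequent), False)
        if expand_right:  # right expansion: I -> J becomes I -> J + {c}
            for c in items:
                if c > max(consequent) and c not in antecedent:
                    consider((antecedent, consequent | {c}), True)

    return [(rule, sup, conf) for sup, _, rule, conf in sorted(best, reverse=True)]


TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
for (antecedent, consequent), sup, conf in top_k_rules(TRANSACTIONS, k=3, minconf=0.8):
    print(sorted(antecedent), "->", sorted(consequent), sup, conf)
```

With k = 3 and minconf = 0.8 on this data, the rules returned are {b} -> {a} and {c} -> {a} (support 3) and {b, c} -> {a} (support 2); rules such as {a} -> {b} have support 3 but are rejected because their confidence is only 0.75.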
The idea is to always expand the rule with the highest support first, because it is the most likely to generate rules with high support, which allows minsup to be raised more quickly to prune the search space.
7. CONCLUSION
Having reviewed the above algorithms, we found that a hybrid approach that improves rule mining should combine Top-K Rules with MOPNAR. MOPNAR produces highly efficient rule mining results, while Top-K Rules can increase speed and save system memory. We therefore propose improving rule accuracy by using positive and negative subgraph mining with Top-K rules.
8. REFERENCES
1. “CHARM: An Efficient Algorithm for Closed Association Rule Mining” by Mohammed J. Zaki and Ching-Jui Hsiao.
2. “Extraction and Classification of Best M Positive Negative Quantitative Association Rules” by Sheetal Naredi and Rushali A. Deshmukh.
3. “Fast Algorithms for Mining Association Rules” by Rakesh Agrawal and Ramakrishnan Srikant.
4. “Mining Top-K Association Rules” by Philippe Fournier-Viger, Cheng-Wei Wu and Vincent S. Tseng.
5. “An Efficient Mining of Sequential Rules Using Vertical Data Format” by Surbhi Jigneshkumar Sheth and Shailendra K. Mishra.
6. “Implementation of Different Data Mining Algorithms with Neural Network” by Aruna J. Chamatkar.
7. “Performance Analysis of Data Mining Algorithms with Neural Network” by P. K. Butey.
8. “A New Multiobjective Evolutionary Algorithm for Mining a Reduced Set of Interesting Positive and Negative Quantitative Association Rules” by Diana Martín, Alejandro Rosete and Jesús Alcalá-Fdez.