International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 01 | Jan-2018 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 667
A Novel Technic to Notice Spam Reviews On e-Shopping
T.Tejaswi*1, M. Ganesh2, Sk. Naseem3, Gandharba Swain4
1,2,3,4 Department of computer science and Engineering, Koneru Lakshmaiah Education Foundation,
Vaddeswaram, Guntur, Andhra Pradesh, India –522502
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The most common way for consumers to express
their level of satisfaction with their purchases is through
online ratings, which we refer to as an Online Review
System. Network analysis has recently gained a lot of
attention because of the arrival and increasing popularity
of social sites such as blogs, social networking
applications, microblogging, and customer review sites.
Potential customers use reviews to find the opinions of
existing users before purchasing products. Online review
systems play an important part in shaping consumers'
actions and decision making, and therefore attract many
spammers who insert fake feedback or reviews in order to
manipulate review content and ratings. Malicious users
exploit review websites and post untrustworthy, low-quality,
or sometimes fake opinions, which are referred to as
Spam Reviews. In this study, we aim to provide an efficient
method to identify spam reviews and to filter spam content
out of the dataset.
Keywords- spam; dataset; bigram; unigram;
heterogeneous
I. INTRODUCTION
Online social media portals play an influential role in
information propagation, and are considered an important
source for producers in their advertising campaigns as well
as for customers in selecting products and services. In
recent years, people have relied heavily on written reviews
in their decision-making processes, with positive or
negative reviews encouraging or discouraging them in their
selection of products and services. In addition, written
reviews help service providers to enhance the quality of
their products and services. These reviews have thus become
an important factor in the success of a business: while
positive reviews can bring benefits for a company, negative
reviews can damage its credibility and cause economic
losses. The fact that anyone, with any identity, can leave
comments as a review provides a tempting opportunity for
spammers to write fake reviews designed to mislead users'
opinions. These misleading reviews are then multiplied by
the sharing function of social media and by propagation over
the web. Reviews written to change users' perception of how
good a product or service is are considered spam, and are
often written in exchange for money.
Despite a great deal of effort on this problem, many aspects
have been missed or remain unsolved. One of them is a
classifier that can determine feature weights showing each
feature's level of importance in identifying spam reviews.
The general concept of our proposed framework is to model a
given review dataset as a Heterogeneous Information Network
(HIN) [19] and to map the problem of spam detection onto a
HIN classification problem. In particular, we model the
review dataset as a HIN in which reviews are connected
through different node types (such as features and users). A
weighting algorithm is then employed to calculate each
feature's importance (or weight). These weights are used to
calculate the final labels for reviews using both
unsupervised and supervised approaches.
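As a rough illustration of the modelling step above, the sketch below stores a toy review set as a network in which each review is linked to a user node and to feature-value nodes, and then lists the review pairs joined through a given node type. The review ids, the feature names (`rating_deviation`, `early_review`) and their discretised levels are hypothetical; the feature weighting algorithm is not specified in this section and is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy reviews: each has an author and discretised feature values.
# The feature names and levels are illustrative, not taken from the paper.
reviews = {
    "r1": {"user": "u1", "rating_deviation": "high", "early_review": "yes"},
    "r2": {"user": "u2", "rating_deviation": "high", "early_review": "no"},
    "r3": {"user": "u1", "rating_deviation": "low", "early_review": "yes"},
}


def build_hin(reviews):
    """Return {(node_type, value): set of review ids linked to that node}."""
    hin = defaultdict(set)
    for review_id, attributes in reviews.items():
        for node_type, value in attributes.items():
            hin[(node_type, value)].add(review_id)
    return hin


def connected_pairs(hin, node_type):
    """Review pairs joined by a review -> node -> review path of one type."""
    pairs = set()
    for (kind, _value), linked in hin.items():
        if kind == node_type:
            pairs.update(combinations(sorted(linked), 2))
    return pairs


hin = build_hin(reviews)
print(connected_pairs(hin, "user"))              # reviews sharing an author
print(connected_pairs(hin, "rating_deviation"))  # reviews sharing a feature value
```

Keeping one node per (type, value) pair means a new feature can be added to the network without changing the review records already stored.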
To evaluate the proposed solution, we used two sample review
datasets, from the Yelp and Amazon websites. Based on our
observations, when features are defined along two views
(review versus user, and behavioural versus linguistic), the
features classified as review-behavioural have higher
weights and yield better performance in spotting spam
reviews in both the semi-supervised and unsupervised
approaches. In addition, we demonstrate that using different
levels of supervision, such as 1%, 2.5% and 5%, or using an
unsupervised approach, makes no noticeable difference to the
performance of our approach. We observed that features can
be added or removed for labelling according to their
weights, and hence time complexity can be scaled for a
specific level of accuracy. As a result of this weighting
step, we can use fewer features with higher weights to
obtain better accuracy with less time complexity. In
addition, categorizing features into four major categories
(review-behavioural, user-behavioural, review-linguistic,
user-linguistic) helps us to understand how much each
category of features contributes to spam detection. In
summary, our main contributions are as follows:
(i) We propose the Net Spam framework, a novel network-based
approach that models review networks as heterogeneous
information networks. The classification step uses different
meta path types, which are new to the spam detection domain.
(ii) A new weighting method for spam features is proposed to
determine the relative importance of each feature and to
show how effective each feature is in distinguishing spam
from normal reviews. Previous works [12], [20] also aimed to
address the importance of features, mainly in terms of
obtained accuracy, but not as a built-in function of their
framework (i.e., their approach depends on ground truth to
determine each feature's importance). As we explain in our
unsupervised approach, Net Spam is able to find feature
importance even without ground truth, relying only on the
meta path definitions and on values calculated for each
review.
(iii) Net Spam improves accuracy over the state of the art
and reduces time complexity, which depends heavily on the
number of features used to identify a spam review; hence,
using the features with higher weights makes it easier to
detect fake reviews with less time complexity.
II. Related Work
In the past ten years, email spam detection and filtering
mechanisms have been widely implemented. The main work can
be summarized in two categories: the content-based model and
the identity-based model. In the first model, a series of
machine learning approaches are applied to parse content
according to keywords and patterns that indicate potential
spam. In the identity-based model, the most commonly used
approach is that each user maintains a whitelist and a
blacklist of email addresses that should not and should be
blocked, respectively, by the anti-spam mechanism [5,6].
More recent work leverages social networks for email spam
identification using Bayesian probability [7]. The concept
is to use the social relationship between sender and
receiver to determine closeness and trust values, and then
to increase or decrease the Bayesian probability according
to these values.
With the rapid development of social networks, social spam
has attracted a lot of attention from both industry and
academia. In industry, Facebook proposed the EdgeRank
algorithm [8], which assigns each post a score generated
from a few features (e.g., number of likes, number of
comments, number of reposts). The higher the EdgeRank score,
the lower the likelihood that the poster is a spammer. The
disadvantage of this approach is that spammers can join each
other's networks and continuously like and comment on each
other's posts in order to achieve a high EdgeRank score.
In academia, Yardi et al. [9] study the behaviour of a small
group of spammers on Twitter, and find that the behaviour of
spammers differs from that of legitimate users in terms of
posting tweets, followers, following friends and so on.
Stringhini et al. [10] further investigate spammer features
by creating a number of honey-profiles on three large social
network sites (Facebook, Twitter and Myspace) and identify
five common features (follow-to-follower, URL ratio, message
similarity, messages sent, and friend number) with potential
for spammer detection. However, although both approaches
introduce convincing frameworks for spammer detection, they
lack a detailed specification of the approach and a
prototype evaluation.
Wang [11] proposes a naïve Bayes based spammer
classification algorithm to distinguish suspicious behaviour
from normal behaviour on Twitter, with a reported F-measure
of 89%. Gao et al. [12] adopt a set of novel features for
effectively reconstructing spam messages into campaigns
rather than examining them individually (with a precision
over 80%). The disadvantage of these two approaches is that
they are not precise enough.
III. Existing Method
Researchers have used different techniques to find spam
profiles in various online social networks (OSNs). We focus
only on work that has been done to identify spammers on
Twitter, as it is not only a social communication medium but
is also used to share and spread information related to
trending topics in real time. Table 1 summarizes the papers
reviewed regarding the detection of spammers on Twitter.
IV. Implementation and Methodology
Classification: A large number of classification algorithms
have been applied to spam detection, among which support
vector machine (SVM) classification is especially popular
for its good generalization performance. SVM is a powerful
method for data classification, and is often considered
easier to use than neural networks. Each example in the
training set contains one class label and several features.
The main goal of SVM is to produce a model that predicts the
class labels of data instances in the test set given only
their features. At present, support vector machines are
widely used in content-based anti-spam systems. SVM is a
good solution to the small-sample-size problem, constructing
a separating hyperplane to perform the classification. Given
the strong performance of support vector machines in spam
identification, this paper uses this algorithm to identify
spam reviews.
1) Support Vector Machine: A support vector machine (SVM)
can be used when the data has exactly two classes. An SVM
classifies data by finding the best hyperplane that
separates all data points of one class from those of the
other class. The best hyperplane for an SVM is the one with
the largest margin between the two classes. The margin is
the maximal width of the slab parallel to the hyperplane
that has no interior data points.
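To make the idea of a separating hyperplane concrete, the sketch below trains a linear SVM on six toy 2-D points by stochastic subgradient descent on the regularised hinge loss. This is only one simple way to fit such a model and is not claimed to be the solver used in this paper; the function names, the learning rate `lr`, the regularisation strength `reg` and the toy data are all assumptions made for illustration.

```python
import random


def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Fit (w, b) by subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. Illustrative sketch, not the paper's solver.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularisation step: shrink w slightly on every update.
            w = [wj * (1.0 - lr * reg) for wj in w]
            # Hinge step: move the hyperplane only if the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy data: +1 and -1 could stand for spam and genuine reviews respectively.
points = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0),
          (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
```

In practice a library implementation would normally be preferred; the hand-written loop is shown only so that the margin condition `margin < 1.0` is visible.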
2) Properties of SVM: Support vector machines belong to a
family of generalized linear classifiers and can be
interpreted as an extension of the perceptron. A special
property is that they simultaneously minimize the empirical
classification error and maximize the geometric margin;
hence they are also known as maximum margin classifiers.
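The trade-off described above is commonly written as the soft-margin optimisation problem below. The notation is standard textbook notation rather than anything defined in this paper: $w$ and $b$ are the hyperplane parameters, $x_i$ and $y_i \in \{-1, +1\}$ are the $n$ training examples and their labels, $\xi_i$ are slack variables, and $C > 0$ is a penalty parameter.

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \,(w^{\top} x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
```

Minimising $\lVert w \rVert^{2}$ maximises the geometric margin $2 / \lVert w \rVert$, while the slack sum penalises classification errors on the training set.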
In this section, we discuss the proposed methodology for the
email spam detection technique.
A. Pre-processing
The pre-processing step removes noise from the email, that
is, content that is irrelevant and need not be present. The
pre-processing step includes:
• Removal of Numbers
• Removal of Special Symbols
• Removal of URLs
• Stripping HTML
• Word Stemming
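The steps listed above can be sketched with the Python standard library as follows. The regular expressions, the function names and the naive suffix-stripping stemmer are assumptions made for illustration; a real system would more likely use an established stemmer such as Porter's.

```python
import re
from html import unescape

# Naive suffix list standing in for a real stemmer; an assumption.
_SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix if at least three letters remain."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the five listed steps and return a list of stemmed tokens."""
    text = re.sub(r"<[^>]+>", " ", text)                # strip HTML tags
    text = unescape(text)                               # decode entities such as &amp;
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # remove URLs
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # remove special symbols
    return [naive_stem(token) for token in text.lower().split()]


sample = "<p>Visit http://example.com now!!! 100% FREE prizes, winning guaranteed</p>"
print(preprocess(sample))
```

HTML and URLs are handled before numbers and symbols here, because stripping symbols first would break a URL into fragments that are harder to recognise.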
B. Feature Extraction
Feature extraction is used to extract the important and
relevant features from the email body. It transforms the
email into a 2-D vector space whose size is the number of
features. These features are mapped from the vocabulary
list.
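One simple way to realise such a mapping is sketched below: each vocabulary word gets an index, and a message becomes a vector with one entry per word. The binary presence encoding, the function names and the toy corpus are assumptions; the text does not say whether presence flags or word counts are used.

```python
def build_vocabulary(token_lists):
    """Map each distinct token to a feature index, in sorted order."""
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    return {token: index for index, token in enumerate(vocabulary)}


def to_feature_vector(tokens, vocabulary):
    """Binary vector: entry i is 1 if vocabulary word i occurs in the tokens."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        index = vocabulary.get(token)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = 1
    return vector


corpus = [["free", "prize", "now"], ["meeting", "now"]]
vocabulary = build_vocabulary(corpus)
print(vocabulary)
print(to_feature_vector(["free", "free", "offer", "now"], vocabulary))
```

Stacking one such vector per message gives a messages-by-features matrix that can be passed to a classifier.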
V. Conclusion
Spam is one of the most irritating and malicious additions
to the worldwide computing world. In this paper, we propose
a novel method for email spam detection which can
effectively identify spam emails from their contents. Spam
emails can be blocked by the user and genuine reviews can be
retained. The proposed classifier achieves 98% accuracy when
classifying the series of datasets.
REFERENCES
[1] J. Donfro. A Whopping 20% of Yelp Reviews are Fake,
accessed on Jul. 30, 2015. [Online]. Available:
http://www.businessinsider.com/20-percent-of-yelp-reviews-fake-2013-9
[2] M. Ott, C. Cardie, and J. T. Hancock, “Estimating the
prevalence of deception in online review communities,”
in Proc. ACM WWW, 2012, pp. 201–210.
[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, “Finding
deceptive opinion spam by any stretch of the imagination,”
in Proc. ACL, 2011, pp. 309–319.
[4] C. Xu and J. Zhang, “Combating product review spam
campaigns via multiple heterogeneous pairwise features,”
in Proc. SIAM Int. Conf. Data Mining, 2014, pp. 172–180.
[5] N. Jindal and B. Liu, “Opinion spam and analysis,” in Proc.
WSDM, 2008, pp. 219–230.
[6] F. H. Li, M. Huang, Y. Yang, and X. Zhu, “Learning to
identify review spam,” in Proc. 22nd Int. Joint Conf. Artif.
Intell. (IJCAI), 2011, pp. 1–6.
[7] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, and R.
Ghosh, “Exploiting burstiness in reviews for review
spammer detection,” in Proc. ICWSM, 2013, pp. 1–10.
[8] A. J. Minnich, N. Chavoshi, A. Mueen, S. Luan, and M.
Faloutsos, “Trueview: Harnessing the power of multiple
review sites,” in Proc. ACM WWW, 2015, pp. 787–797.
[9] B. Viswanath et al., “Towards detecting anomalous user
behavior in online social networks,” in Proc. USENIX, 2014,
pp. 1–16.
[10] H. Li, Z. Chen, B. Liu, X. Wei, and J. Shao, “Spotting fake
reviews via collective positive-unlabeled learning,” in Proc.
ICDM, Dec. 2014, pp. 899–904.
[11] L. Akoglu, R. Chandy, and C. Faloutsos, “Opinion fraud
detection in online reviews by network effects,” in Proc.
ICWSM, 2013, pp. 1–10.
[12] S. Rayana and L. Akoglu, “Collective opinion spam
detection: Bridging review networks and metadata,” in Proc.
ACM KDD, 2015, pp. 1–10.
[13] S. Feng, R. Banerjee, and Y. Choi, “Syntactic stylometry
for deception detection,” in Proc. 50th Annu. Meeting Assoc.
Comput. Linguistics (ACL), 2012, pp. 1–5.
[14] N. Jindal, B. Liu, and E.-P. Lim, “Finding unusual review
patterns using unexpected rules,” in Proc. ACM CIKM, 2012,
pp. 1–4.
[15] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw,
“Detecting product review spammers using rating
behaviors,” in Proc. ACM CIKM, 2010, pp. 1–10.
[16] A. Mukherjee et al., “Spotting opinion spammers using
behavioural footprints,” in Proc. ACM KDD, 2013, pp. 1–9.
[17] S. Xie, G. Wang, S. Lin, and P. S. Yu, “Review spam
detection via temporal pattern discovery,” in Proc. ACM KDD,
2012, pp. 823–831.

More Related Content

PDF
Survey in Online Social Media Skelton by Network based Spam
PDF
Automatic Recommendation of Trustworthy Users in Online Product Rating Sites
PDF
IRJET- Detection of Ranking Fraud in Mobile Applications
PDF
Netspam: An Efficient Approach to Prevent Spam Messages using Support Vector ...
PDF
IRJET- Analysis of Rating Difference and User Interest
PDF
A COMPARATIVE ANALYSIS OF DIFFERENT FEATURE SET ON THE PERFORMANCE OF DIFFERE...
PDF
IJSRED-V2I2P09
DOCX
Towards effective bug triage with software
Survey in Online Social Media Skelton by Network based Spam
Automatic Recommendation of Trustworthy Users in Online Product Rating Sites
IRJET- Detection of Ranking Fraud in Mobile Applications
Netspam: An Efficient Approach to Prevent Spam Messages using Support Vector ...
IRJET- Analysis of Rating Difference and User Interest
A COMPARATIVE ANALYSIS OF DIFFERENT FEATURE SET ON THE PERFORMANCE OF DIFFERE...
IJSRED-V2I2P09
Towards effective bug triage with software

What's hot (17)

PDF
IRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
PDF
A multi layer architecture for spam-detection system
DOCX
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
PDF
IRJET-Fake Product Review Monitoring
PDF
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
PPTX
KnowMe and ShareMe: Understanding Automatically Discovered Personality Trai...
DOCX
Entity linking with a knowledge baseissues, techniques, and solutions
PPTX
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DOC
BW article on professional respondents 2-23 (1)
DOCX
TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE DATA REDUCTION TECHNIQUES
PPTX
PDF
IRJET- Opinion Mining and Sentiment Analysis for Online Review
PPT
Social media recommendation based on people and tags (final)
PDF
A Hybrid Approach for Personalized Recommender System Using Weighted TFIDF on...
PDF
A Proposal on Social Tagging Systems Using Tensor Reduction and Controlling R...
PDF
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
PDF
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
IRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
A multi layer architecture for spam-detection system
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IRJET-Fake Product Review Monitoring
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
KnowMe and ShareMe: Understanding Automatically Discovered Personality Trai...
Entity linking with a knowledge baseissues, techniques, and solutions
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
BW article on professional respondents 2-23 (1)
TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE DATA REDUCTION TECHNIQUES
IRJET- Opinion Mining and Sentiment Analysis for Online Review
Social media recommendation based on people and tags (final)
A Hybrid Approach for Personalized Recommender System Using Weighted TFIDF on...
A Proposal on Social Tagging Systems Using Tensor Reduction and Controlling R...
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
Ad

Similar to IRJET-A Novel Technic to Notice Spam Reviews on e-Shopping (20)

PDF
Recommender System- Analyzing products by mining Data Streams
PDF
Online Service Rating Prediction by Removing Paid Users and Jaccard Coefficient
PDF
IRJET- Improving Performance of Fake Reviews Detection in Online Review’s usi...
PDF
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
DOCX
VTU final year project report Main
PDF
IRJET- Predicting Review Ratings for Product Marketing
PDF
IRJET- A New Approach to Product Recommendation Systems
PDF
IRJET- A New Approach to Product Recommendation Systems
PDF
IRJET- Enhancing NLP Techniques for Fake Review Detection
PDF
IRJET - Online Product Scoring based on Sentiment based Review Analysis
PDF
IRJET- Customer Feedback Analysis using Machine Learning
PDF
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
PDF
Providing Highly Accurate Service Recommendation over Big Data using Adaptive...
PDF
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
PDF
IRJET- Survey Paper on Recommendation Systems
PDF
A survey on various reputation assessment techniques
PDF
IRJET - Discovery of Ranking Fraud for Mobile Apps
PDF
Service Rating Prediction by check-in and check-out behavior of user and POI
PDF
IRJET- Hybrid Recommendation System for Movies
PDF
Personalized recommendation for cold start users
Recommender System- Analyzing products by mining Data Streams
Online Service Rating Prediction by Removing Paid Users and Jaccard Coefficient
IRJET- Improving Performance of Fake Reviews Detection in Online Review’s usi...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
VTU final year project report Main
IRJET- Predicting Review Ratings for Product Marketing
IRJET- A New Approach to Product Recommendation Systems
IRJET- A New Approach to Product Recommendation Systems
IRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET - Online Product Scoring based on Sentiment based Review Analysis
IRJET- Customer Feedback Analysis using Machine Learning
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
Providing Highly Accurate Service Recommendation over Big Data using Adaptive...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
IRJET- Survey Paper on Recommendation Systems
A survey on various reputation assessment techniques
IRJET - Discovery of Ranking Fraud for Mobile Apps
Service Rating Prediction by check-in and check-out behavior of user and POI
IRJET- Hybrid Recommendation System for Movies
Personalized recommendation for cold start users
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PDF
25AF1191PC303 MODULE-1 CHAIN SURVEYING SEMESTER III SURVEYING
PDF
AIGA 012_04 Cleaning of equipment for oxygen service_reformat Jan 12.pdf
PPTX
Soft Skills Unit 2 Listening Speaking Reading Writing.pptx
PPTX
1. Effective HSEW Induction Training - EMCO 2024, O&M.pptx
PPTX
INTERNET OF THINGS - EMBEDDED SYSTEMS AND INTERNET OF THINGS
PDF
Performance, energy consumption and costs: a comparative analysis of automati...
PPTX
Research Writing, Mechanical Engineering
PDF
ST MNCWANGO P2 WIL (MEPR302) FINAL REPORT.pdf
PPTX
ARCHITECTURE AND PROGRAMMING OF EMBEDDED SYSTEMS
PDF
Artificial Intelligence_ Basics .Artificial Intelligence_ Basics .
PPTX
Real Estate Management PART 1.pptxFFFFFFFFFFFFF
PDF
BBC NW_Tech Facilities_30 Odd Yrs Ago [J].pdf
PPTX
22ME926Introduction to Business Intelligence and Analytics, Advanced Integrat...
PPTX
Solar energy pdf of gitam songa hemant k
PPTX
Unit IImachinemachinetoolopeartions.pptx
PDF
IAE-V2500 Engine for Airbus Family 319/320
PDF
LS-6-Digital-Literacy (1) K12 CURRICULUM .pdf
PPTX
SE unit 1.pptx by d.y.p.akurdi aaaaaaaaaaaa
PPT
Unit - I.lathemachnespct=ificationsand ppt
PPTX
Unit IILATHEACCESSORSANDATTACHMENTS.pptx
25AF1191PC303 MODULE-1 CHAIN SURVEYING SEMESTER III SURVEYING
AIGA 012_04 Cleaning of equipment for oxygen service_reformat Jan 12.pdf
Soft Skills Unit 2 Listening Speaking Reading Writing.pptx
1. Effective HSEW Induction Training - EMCO 2024, O&M.pptx
INTERNET OF THINGS - EMBEDDED SYSTEMS AND INTERNET OF THINGS
Performance, energy consumption and costs: a comparative analysis of automati...
Research Writing, Mechanical Engineering
ST MNCWANGO P2 WIL (MEPR302) FINAL REPORT.pdf
ARCHITECTURE AND PROGRAMMING OF EMBEDDED SYSTEMS
Artificial Intelligence_ Basics .Artificial Intelligence_ Basics .
Real Estate Management PART 1.pptxFFFFFFFFFFFFF
BBC NW_Tech Facilities_30 Odd Yrs Ago [J].pdf
22ME926Introduction to Business Intelligence and Analytics, Advanced Integrat...
Solar energy pdf of gitam songa hemant k
Unit IImachinemachinetoolopeartions.pptx
IAE-V2500 Engine for Airbus Family 319/320
LS-6-Digital-Literacy (1) K12 CURRICULUM .pdf
SE unit 1.pptx by d.y.p.akurdi aaaaaaaaaaaa
Unit - I.lathemachnespct=ificationsand ppt
Unit IILATHEACCESSORSANDATTACHMENTS.pptx

IRJET-A Novel Technic to Notice Spam Reviews on e-Shopping

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 01 | Jan-2018 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 667 A Novel Technic to Notice Spam Reviews On e-Shopping T.Tejaswi*1, M. Ganesh2, Sk. Naseem3, Gandharba Swain4 1,2,3,4 Department of computer science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India –522502 ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The most common mode for consumers to express their level of satisfaction with their purchases is throughonline ratings, which we can refer as Online Review System. Network analysis has recently gained a lot of attention because of the arrival and the increasing attractiveness of social sites, such as blogs,socialnetworking applications, micro blogging, or customer review sites. The reviews are used by potential customers to find opinions of existing users before purchasing the products.Onlinereview systems plays an important part in affecting consumers' actions and decision making, and therefore attracting many spammers to insert fake feedback or reviews in order to manipulate review content and ratings. Malicious users exploit the review website and post untrustworthy, low quality, or sometimes fake opinions, which are referred as Spam Reviews. In this study, we aim at providingan efficient method to identify spam reviews and to filter out the spam content with the dataset. Keywords- spam; dataset; bigram; unigram; heterogeneous I. INTRODUCTION ONLINE Social Media portals play an influential role in information propagation which is considered as an important source for producers in their advertising campaigns as well as for customersin selectingproductsand services. 
In the past years, people rely a lot on the written reviews in their decision-making processes, and positive/negative reviews encouraging/discouraging them in their selection of products and services. In addition, written reviews also help service providers to enhance the quality of their products and services. These reviews thus have become an important factor in success of a business while positive reviews can bring benefits for a company, negative reviews can potentially impactcredibilityandcause economic losses. The fact that anyone with any identity can leave comments as review providesa tempting.Opportunity for spammers to write fake reviews designed to mislead users’ opinion. These misleading reviewsarethenmultiplied by the sharing function of social media and propagationover the web. The reviews written to change users’ perception of how good a product or a service are considered asspam,and are often written in exchange for money. Despite this great deal of efforts, many aspects have been missed or remained unsolved. One of them is a classifierthat can determine feature weights that show each feature’slevel of importance in determining spam reviews. The general concept of our proposed framework is to model a given review dataset as a Heterogeneous Information Network (HIN) [19] and to map the problem of spam detection into a HIN classification problem. In particular, we model review dataset as a HIN in which reviews are connected through different node types (such as features and users). A weighting algorithm is then employed to calculate each feature’s importance (or weight). These weights are used to calculate the final labels for reviewsusingbothunsupervised and supervised approaches. To evaluate the proposed solution, we used two sample review datasets from Yelp and Amazon websites. 
Based on our observations, defining two views for features (review- user and behavioural-linguistic), the classified features as review-behavioural have more weights and yield better performance on spotting spam reviews in both semi- supervised and unsupervised approaches. In addition, we demonstrate that using different supervisions such as 1%, 2.5% and 5% or using an unsupervised approach, make no noticeable variation on the performanceofourapproach.We observed that feature weights can be added or removed for labelling and hence time complexity can be scaled for a specific level of accuracy. Asthe result of this weightingstep, we can use fewer features with more weights to obtain better accuracy with less time complexity. In addition, categorizing features in four major categories (review- behavioural, user-behavioural, review-linguistic, user- linguistic), helps us to understand how much each category of features is contributed to spam detection. In summary, our main contributions are as follows: (i) We propose Net Spam framework that is a novel network-based approach which modelsreview networksas heterogeneousinformation networks. Theclassificationstep uses different meta path types which are innovative in the spam detection domain. (ii) A new weighting method for spam features is proposed to determine the relative importance of each feature and shows how effective each of features are in identifying spams from normal reviews. Previous works [12], [20] also aimed to address the importance of features mainly in term of obtained accuracy, but not as a build-in function in their framework (i.e., their approach is dependent to ground truth for determining each feature importance). As we explain in our unsupervised approach, Net Spam is able to find features importance even without ground truth, and only by relying on meta path definition and based on values calculated for each review.
(iii) Net Spam improves accuracy compared to the state of the art and also reduces time complexity, which depends heavily on the number of features used to identify a spam review; hence, using features with higher weights makes it easier to detect fake reviews with less time complexity.
II. Related Work
In the past ten years, email spam detection and filtering mechanisms have been widely implemented. The main work can be summarized into two categories: the content-based model and the identity-based model. In the first model, a series of machine learning approaches are implemented for content parsing according to keywords and patterns that indicate potential spam. In the identity-based model, the most commonly used approach is that each user maintains a whitelist and a blacklist of email addresses that should not and should be blocked, respectively, by the anti-spam mechanism [5,6]. More recent work leverages social networks for email spam identification based on Bayesian probability [7]. The concept is to use the social relationship between sender and receiver to determine closeness and a trust value, and then to increase or decrease the Bayesian probability according to these values.
With the rapid development of social networks, social spam has attracted a lot of attention from both industry and academia. In industry, Facebook proposed the Edge Rank algorithm [8], which assigns each post a score generated from a few features (e.g., number of likes, number of comments, number of reposts). The higher the Edge Rank score, the lower the likelihood that the poster is a spammer. The disadvantage of this approach is that spammers could join one another’s networks and continuously like and comment on each other’s posts in order to achieve a high Edge Rank score.
In academia, Yardi et al. [9] study the behaviour of a small group of spammers on Twitter and find that the behaviour of spammers differs from that of legitimate users in terms of posting tweets, followers, following friends and so on. Stringhini et al. [10] further investigate spammer features by creating a number of honey-profiles on three large social network sites (Facebook, Twitter and Myspace) and identify five common features (follow-to-follower, URL ratio, message similarity, message sent, friend number, etc.) with potential for spammer detection. However, although both approaches introduce a convincing framework for spammer detection, they lack a detailed specification of the approach and a prototype evaluation.
Wang [11] proposes a naïve Bayesian spammer classification algorithm to distinguish suspicious behaviour from normal behaviour on Twitter, with a precision (F-measure value) of 89%. Gao et al. [12] adopt a set of novel features for effectively reconstructing spam messages into campaigns rather than examining them individually (with a precision value over 80%). The disadvantage of these two approaches is that they are not precise enough.
III. Existing Method
Different techniques have been used by researchers to find spam profiles in various online social networks (OSNs). We focus only on the work that has been done to identify spammers on Twitter, as it is not only a social communication medium but is also used to share and spread information related to trending topics in real time. Table 1 shows the summary of the papers reviewed regarding the detection of spammers on Twitter.
IV. Implementation and Methodology
Classification
A large number of classification algorithms have been applied to spam detection, among which support vector machine (SVM) classification is particularly well known for its good generalization performance. SVM is a powerful method for data classification.
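As an illustration only, a two-class linear SVM can be sketched in plain Python. The paper does not state which solver, kernel, or features were used, so the full-batch subgradient descent on the regularized hinge loss, the hyperparameters lam, lr and epochs, and the toy two-feature vectors below are all assumptions.

```python
import math

# Hypothetical, standardized two-feature vectors (values invented).
# Label +1 = spam, -1 = genuine.
TRAIN = [
    ((2.0, 1.0), 1), ((1.0, 2.0), 1), ((3.0, 3.0), 1),
    ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-3.0, -3.0), -1),
]


def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(data, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            # Only points inside the margin or misclassified contribute.
            if y * decision(w, b, x) < 1:
                for j in range(dim):
                    grad_w[j] -= y * x[j]
                grad_b -= y
        w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return +1 (spam) or -1 (genuine) from the sign of the score."""
    return 1 if decision(w, b, x) >= 0 else -1


def geometric_margin(w, b, data):
    """Smallest signed distance from any training point to the hyperplane."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(y * decision(w, b, x) for x, y in data) / norm


w, b = train_linear_svm(TRAIN)
train_predictions = [predict(w, b, x) for x, _ in TRAIN]
margin = geometric_margin(w, b, TRAIN)
```

Because the toy data is symmetric about the origin, the learned hyperplane is x1 + x2 = 0 up to scaling, and its geometric margin is 3/√2, the distance from the closest points (2, 1) and (1, 2) to that line. In practice the feature vectors would come from pre-processed review or email text rather than hand-picked values.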
SVM is also generally considered easier to use than neural networks. Each instance in the training set contains one class label and several features. The goal of SVM is to produce a model which predicts the class labels of data instances in the testing set, given only their features. At present, support vector machines are widely used in content-based anti-spam systems. SVM is a good solution to the small-sample-size problem, constructing a separating hyperplane to perform the classification. Because the support vector machine performs well in spam detection, this paper uses the algorithm to identify spam reviews.
1) Support Vector Machine
A support vector machine (SVM) can be used when the data has exactly two classes. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an
SVM is the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points.
2) Properties of SVM
Support vector machines belong to a family of generalized linear classifiers and can be interpreted as an extension of the perceptron. A special property is that they simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
In this section, we discuss the proposed methodology for the email spam detection technique.
A. Pre-processing
The pre-processing step is used to remove noise from the email, i.e., content that is irrelevant and need not be present. The pre-processing step includes:
• Removal of Numbers
• Removal of Special Symbols
• Removal of URLs
• Stripping HTML
• Word Stemming
B. Feature Extraction
Feature extraction is used to extract the important and relevant features from the email body. It transforms the email into a vector space whose dimension is the number of features. These features are mapped from the vocabulary list.
V. Conclusion
Spam is one of the most annoying and malicious additions to the global computing world. In this paper, we propose a novel method for email spam detection which can effectively identify spam emails from their contents. Spam emails can be blocked by the user and genuine reviews can be retained by the user. The proposed classifier achieves 98% accuracy while classifying the series of datasets.
REFERENCES
[1] J. Donfro. A Whopping 20% of Yelp Reviews are Fake, accessed on Jul. 30, 2015. [Online].
Available: http://www.businessinsider.com/20-percent-of-yelp-reviews-fake-2013-9
[2] M. Ott, C. Cardie, and J. T. Hancock, “Estimating the prevalence of deception in online review communities,” in Proc. ACM WWW, 2012, pp. 201–210.
[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, “Finding deceptive opinion spam by any stretch of the imagination,” in Proc. ACL, 2011, pp. 309–319.
[4] C. Xu and J. Zhang, “Combating product review spam campaigns via multiple heterogeneous pairwise features,” in Proc. SIAM Int. Conf. Data Mining, 2014, pp. 172–180.
[5] N. Jindal and B. Liu, “Opinion spam and analysis,” in Proc. WSDM, 2008, pp. 219–230.
[6] F. H. Li, M. Huang, Y. Yang, and X. Zhu, “Learning to identify review spam,” in Proc. 22nd Int. Joint Conf. Artif. Intell. (IJCAI), 2011, pp. 1–6.
[7] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, and R. Ghosh, “Exploiting burstiness in reviews for review spammer detection,” in Proc. ICWSM, 2013, pp. 1–10.
[8] A. J. Minnich, N. Chavoshi, A. Mueen, S. Luan, and M. Faloutsos, “Trueview: Harnessing the power of multiple review sites,” in Proc. ACM WWW, 2015, pp. 787–797.
[9] B. Viswanath et al., “Towards detecting anomalous user behavior in online social networks,” in Proc. USENIX, 2014, pp. 1–16.
[10] H. Li, Z. Chen, B. Liu, X. Wei, and J. Shao, “Spotting fake reviews via collective positive-unlabeled learning,” in Proc. ICDM, Dec. 2014, pp. 899–904.
[11] L. Akoglu, R. Chandy, and C. Faloutsos, “Opinion fraud detection in online reviews by network effects,” in Proc. ICWSM, 2013, pp. 1–10.
[12] S. Rayana and L. Akoglu, “Collective opinion spam detection: Bridging review networks and metadata,” in Proc. ACM KDD, 2015, pp. 1–10.
[13] S. Feng, R. Banerjee, and Y. Choi, “Syntactic stylometry for deception detection,” in Proc. 50th Annu. Meeting Assoc. Comput. Linguistics (ACL), 2012, pp. 1–5.
[14] N. Jindal, B. Liu, and E.-P. Lim, “Finding unusual review patterns using unexpected rules,” in Proc. ACM CIKM, 2012, pp. 1–4.
[15] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw, “Detecting product review spammers using rating behaviors,” in Proc. ACM CIKM, 2010, pp. 1–10.
[16] A. Mukherjee et al., “Spotting opinion spammers using behavioural footprints,” in Proc. ACM KDD, 2013, pp. 1–9.
[17] S. Xie, G. Wang, S. Lin, and P. S. Yu, “Review spam detection via temporal pattern discovery,” in Proc. ACM KDD, 2012, pp. 823–831.