International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 4609
Finding the original writer of an anonymous text using Naïve Bayes Classifier
Noorul Amin, S V Athawale
Student, Dept. of Computer Engineering, AISSMS COE, Maharashtra, India
Professor, Dept. of Computer Engineering, AISSMS COE, Maharashtra, India
Abstract - We often come across texts written by anonymous writers. These texts can be books, emails, messages, or blogs. Every writer has a distinctive style of writing. One simple way to capture this style is to list the words a writer uses and count how often each word occurs. Each writer will have a different set of frequencies for different words. By matching the word frequencies of an anonymous text against those of known writers, we can predict its original writer.
Key Words: anonymous writer, frequency, writer’s prediction
1. INTRODUCTION
Writer identification has become more difficult in the digital world. People increasingly write through digital media, so when a literary text does not name its author, it is hard to determine who wrote it. Handwriting carries physical cues that make the writer easier to identify, and much research has been done on writer identification for handwritten text [1]. Printed text lacks these cues, so identifying its writer requires a different approach. We therefore employ machine learning, specifically the Naive Bayes algorithm, a supervised machine learning algorithm that is well suited to text-based classification.
1.1 Naïve Bayes
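Naive Bayes belongs to a family of supervised machine learning algorithms based on Bayes’ theorem, with the strong assumption of conditional independence between every pair of features given the class. Bayes’ theorem states

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

and, under the independence assumption, the likelihood of a feature vector x = (x_1, …, x_n) factorizes as

$$P(x \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$

where

• P(c|x) = posterior probability of the class given the predictor.
• P(c) = prior probability of the class.
• P(x|c) = likelihood, the probability of the predictor given the class.
• P(x) = prior probability of the predictor (the evidence).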
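To make Bayes’ theorem concrete, the short sketch below applies it to made-up numbers: two hypothetical writers A and B who are equally likely a priori, and the probability that each of them uses some word x. The writer names and probabilities are illustrative assumptions, not data from this study; P(x) is obtained by summing P(x|c)P(c) over both classes.

```python
def posterior(prior_c, likelihood_x_given_c, evidence_x):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood_x_given_c * prior_c / evidence_x

# Hypothetical numbers: two candidate writers, equally likely a priori,
# and the probability that each of them uses a particular word x.
prior = {"A": 0.5, "B": 0.5}
likelihood = {"A": 0.04, "B": 0.01}

# Evidence P(x) by the law of total probability: sum of P(x|c) * P(c).
evidence = sum(likelihood[c] * prior[c] for c in prior)

post = {c: posterior(prior[c], likelihood[c], evidence) for c in prior}
print({c: round(p, 3) for c, p in post.items()})  # {'A': 0.8, 'B': 0.2}
```

Seeing word x makes writer A four times as likely as writer B, because A uses the word four times as often and the priors are equal.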
Naïve Bayes is a very efficient linear classifier. It is easy to build and works well with large data sets. The assumption that every feature is independent may seem unrealistic in practice, yet classifiers built on it can still outperform more complex algorithms such as SVMs and decision trees on some tasks. Naïve Bayes classifiers are commonly applied to text-based tasks such as fake news detection [3], email spam detection [4], and authorship attribution [2], as well as in forensics.
1.2 Solution
Every document or text can be represented as a bag of words: the multiset of words used in its sentences, which ignores each word's position in the sentence but keeps its frequency in the text.
Fig -1: ‘Bag of Words’ Model
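A bag of words can be built with Python's standard library alone. This is a minimal sketch that assumes a deliberately simple tokenizer (lowercase the text, then keep runs of letters and apostrophes); the function name `bag_of_words` is hypothetical, and a real authorship system would likely need more careful tokenization.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return a word -> count mapping, ignoring word order and letter case."""
    # Simple tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

bow = bag_of_words("The cat sat on the mat. The mat was red.")
print(bow.most_common(3))  # [('the', 3), ('mat', 2), ('cat', 1)]
```

`Counter` keeps exactly what the bag-of-words model needs: each distinct word and how often it occurs, with no positional information.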
This gives a word frequency table for each writer. The frequency table is then converted into a likelihood table, which gives the probability of each word occurring in a given writer's text. From these likelihoods, the posterior probability of each class (writer) is calculated using the Naive Bayes equation, and the class with the highest posterior probability is the prediction.
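The frequency table, likelihood table, and posterior steps can be sketched as a small multinomial Naive Bayes classifier. This is an illustrative sketch, not the authors' implementation: the input format, the function names `train` and `predict`, and the toy training texts are hypothetical. It also makes two assumptions not stated in the paper: add-one (Laplace) smoothing, so that a word never seen for one writer does not zero out that writer's score, and computation in log space, which avoids numerical underflow when many small probabilities are multiplied. P(x) is dropped because it is the same for every writer and does not change which one scores highest.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Simple tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def train(docs_by_writer):
    """docs_by_writer: hypothetical format, dict mapping writer -> list of texts."""
    # Frequency table: word counts per writer.
    freq = {w: Counter(t for doc in docs for t in tokenize(doc))
            for w, docs in docs_by_writer.items()}
    vocab = set().union(*freq.values())
    total_docs = sum(len(docs) for docs in docs_by_writer.values())
    # Prior P(c): share of training documents written by each writer.
    log_prior = {w: math.log(len(docs) / total_docs)
                 for w, docs in docs_by_writer.items()}
    # Likelihood table P(word | c) with add-one smoothing, stored as logs.
    log_lik = {}
    for w, counts in freq.items():
        denom = sum(counts.values()) + len(vocab)
        log_lik[w] = {t: math.log((counts[t] + 1) / denom) for t in vocab}
    return log_prior, log_lik

def predict(text, log_prior, log_lik):
    tokens = tokenize(text)
    scores = {}
    for w in log_prior:
        # log P(c) + sum of log P(word | c); words outside the vocabulary are skipped.
        scores[w] = log_prior[w] + sum(log_lik[w][t] for t in tokens if t in log_lik[w])
    return max(scores, key=scores.get)

training = {
    "writer_a": ["the sea was calm and the sky was grey",
                 "the ship sailed over the calm sea"],
    "writer_b": ["code compiles and tests pass",
                 "the build failed because a test did not pass"],
}
log_prior, log_lik = train(training)
print(predict("a calm grey sea", log_prior, log_lik))  # writer_a
```

With real data, each writer's list would hold that writer's known reference texts, and `predict` would be called on the anonymous text.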
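1.3 Pros and Cons of Using Naive Bayes
Pros
• Easy and fast to predict the class of a data point.
• Requires relatively little training data.
• Can achieve high accuracy, particularly on text classification.
Cons
• If a categorical feature takes a value in the test data that was never observed with a given class in the training data, the model assigns it zero probability. Because the likelihoods are multiplied together, this single zero makes the whole posterior for that class zero, so the model cannot make a meaningful prediction. This "zero-frequency" problem is usually handled with a smoothing technique such as Laplace (add-one) smoothing.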
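The zero-probability problem can be reproduced in a few lines. In this sketch the word counts for one writer are invented, and the vocabulary is assumed to contain one extra word ("code") that this writer never used. Add-one (Laplace) smoothing, a standard remedy that is not part of the original text, gives every vocabulary word a pseudo-count of 1 so that no likelihood is exactly zero.

```python
from collections import Counter

# Hypothetical word counts for one writer in the training data.
counts = Counter({"sea": 3, "calm": 2, "grey": 1})
# Assumed shared vocabulary: the three words above plus "code",
# which this writer never used.
vocab_size = 4
total = sum(counts.values())

def unsmoothed(word):
    # Maximum-likelihood estimate: zero for any unseen word.
    return counts[word] / total

def laplace(word, alpha=1):
    # Add-alpha smoothing: every vocabulary word gets a pseudo-count of alpha.
    return (counts[word] + alpha) / (total + alpha * vocab_size)

doc = ["calm", "sea", "code"]
p_raw = 1.0
p_smooth = 1.0
for w in doc:
    p_raw *= unsmoothed(w)
    p_smooth *= laplace(w)
print(p_raw, p_smooth)
```

`p_raw` is exactly 0.0 because of the single unseen word, while `p_smooth` is about 0.012 (0.3 × 0.4 × 0.1), so the writer can still be compared with the others.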
2. NAÏVE BAYES COMPARED TO SVM AND DECISION TREE ALGORITHMS
Naïve Bayes performs classification well, but how does it compare with other supervised algorithms? Working on the Enron data set, which contains about 500,000 emails, I measured the accuracy, training time, and prediction time of Naïve Bayes, SVM (with linear and RBF kernels), and a decision tree.
| Algorithm | Accuracy (%) | Training time (s) | Prediction time (s) |
|---|---|---|---|
| Naïve Bayes | 92.0364050057 | 0.2450 | 0.021 |
| Linear SVM | 95.9613196815 | 23.484 | 2.091 |
| RBF SVM | 97.6109215017 | 12.982 | 0.935 |
| Decision Tree | 96.7007963595 | 4.715 | 0.003 |

Table -1: Performance of different algorithms
Chart -1: Accuracy of different algorithms
Chart -2: Training time of different algorithms
Chart -3: Prediction time of different algorithms
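Take the performance ρ of each algorithm to be directly proportional to its accuracy and inversely proportional to its combined training and prediction time:

$$\rho \propto \text{Accuracy} \quad\text{and}\quad \rho \propto \frac{1}{\text{Training Time} + \text{Prediction Time}}$$

Hence, the performance of an algorithm is

$$\rho = \frac{K \cdot \text{Accuracy}}{\text{Training Time} + \text{Prediction Time}}$$

where K is a constant of proportionality. If K = 1, then

$$\rho = \frac{\text{Accuracy}}{\text{Training Time} + \text{Prediction Time}}$$

Here, ρ is the performance factor: the greater the value of ρ, the better the algorithm. As expected, when ρ is plotted for the different algorithms, Naïve Bayes has the highest value.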
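The performance factor can be recomputed directly from Table -1. This sketch takes K = 1 and uses accuracy in percent and times in seconds, exactly as reported; the dictionary layout and the function name `performance_factor` are illustrative.

```python
# Values copied from Table -1: (accuracy in %, training time in s, prediction time in s).
results = {
    "Naive Bayes":   (92.0364050057, 0.2450, 0.021),
    "Linear SVM":    (95.9613196815, 23.484, 2.091),
    "RBF SVM":       (97.6109215017, 12.982, 0.935),
    "Decision Tree": (96.7007963595, 4.715, 0.003),
}

def performance_factor(accuracy, train_time, predict_time, k=1.0):
    """rho = K * Accuracy / (Training Time + Prediction Time)."""
    return k * accuracy / (train_time + predict_time)

rho = {name: performance_factor(*vals) for name, vals in results.items()}
# Print from best to worst performance factor.
for name, value in sorted(rho.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} rho = {value:8.2f}")
```

Naive Bayes comes out at roughly 346, against about 20.5 for the decision tree, 7.0 for the RBF SVM, and 3.75 for the linear SVM. Because ρ divides by time, the time term dominates here: Naive Bayes has the lowest accuracy of the four, but its total time of 0.266 s is about 18 times smaller than that of the next fastest algorithm.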
Chart -4: Performance Factor of different algorithms
These charts show that although Naive Bayes does not give the best accuracy, it has by far the lowest combined training and prediction time. The accuracy of this algorithm depends strongly on the size of the training data set: as the data set grows, its accuracy increases, and past some point it may overtake the other algorithms in accuracy as well as in time. Hence, for very large data sets, it is arguably the best algorithm to use when both time and accuracy are considered.
3. CONCLUSIONS
Hence, we can predict the original writer of an anonymous text with high accuracy using the Naive Bayes algorithm, provided that a set of reference texts by each candidate writer is available.
ACKNOWLEDGEMENT
I would like to thank Prof. S. V. Athawale Sir, under whose guidance I was able to complete this paper. I would also like to thank my groupmates for their moral support.
REFERENCES
[1] Stefan Fiel and Robert Sablatnig, “Writer Retrieval and Writer Identification using Local Features”, 2012 10th IAPR International Workshop on Document Analysis Systems, Year: 2012, Pages: 145 – 149.
[2] Fatma Howedi and Masnizah Mohd, “Text Classification for Authorship Attribution Using Naive Bayes Classifier with Limited Training Data”, Computer Engineering and Intelligent Systems, ISSN 2222-1719 (Paper), ISSN 2222-2863 (Online), Vol. 5, No. 4, 2014.
[3] Mykhailo Granik and Volodymyr Mesyura, “Fake news detection using naive Bayes classifier”, 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), Year: 2017, Pages: 900 – 903.
[4] Anirudh Harisinghaney, Aman Dixit, Saurabh Gupta and Anuja Arora, “Text and image based spam email classification using KNN, Naïve Bayes and Reverse DBSCAN algorithm”, 2014 International Conference on Reliability Optimization and Information Technology (ICROIT), Year: 2014, Pages: 153 – 155.
[5] 6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R). Retrieved 26 March, 2019, from https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
[6] Naive Bayes and Text Classification. Retrieved 26 March, 2019, from https://sebastianraschka.com/Articles/2014_naive_bayes_1.html
[7] A practical explanation of a Naive Bayes classifier. Retrieved 26 March, 2019, from https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/
[8] Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, “An Introduction to Statistical Learning with Applications in R”, 2013.
[9] Andreas C. Mueller and Sarah Guido, “Introduction to Machine Learning with Python”, O’Reilly, 2016.
[10] Allen B. Downey, “Think Bayes: Bayesian Statistics Made Simple”, O’Reilly, 2012.
