Spam Mail Detection
Using Naïve Bayes Classifier
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
ADITYA INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(AUTONOMOUS)
K. KOTTURU, TEKKALI-532201
Submitted by
G. KISHORE
23A51D5802
MTECH (CSE)
Under the supervision of
Sri. T. Chalapathi Rao, M.Tech, Ph.D.
(Sr. Asst. Professor, Dept. of CSE)
TABLE OF CONTENTS
Abstract
Introduction
Implementation
Naïve Bayes Classifier
Software Requirements
Hardware Requirements
Literature review
Literature review (Table – 1)
Output
Spam.csv dataset is used
Final output
References
ABSTRACT
• This project addresses a common problem faced by Gmail users: spam mail. Using different
methods, we classify each mail as spam or not spam based on its text content.
• Spam detection means identifying spam messages or emails from their text content, so that
you receive notifications only for the messages or emails that matter to you.
• When a spam message is found, it is moved automatically to a spam folder and no notification
is raised. This improves the user experience, since frequent spam alerts are a nuisance to
many users.
• Naïve Bayes classifiers are a popular statistical technique for e-mail filtering. They typically use
bag-of-words features to identify spam e-mail, an approach commonly used in text classification.
• OBJECTIVE: The main objective of this project is to build a system that detects spam in emails
and messages, a capability that every big tech company tries to improve for its customers in
applications such as Gmail.
• OUTCOME: The output of this project is a prediction of whether a given text message is spam or not spam.
• We propose a framework for detecting spam mail quickly and efficiently. It consists of
feature extraction with the CountVectorizer model and a machine learning technique for natural
language processing (NLP).
• We took a dataset from Kaggle and explored how machine learning algorithms can be used to
find patterns in data. We applied ML models to predict the target class, plotted the sparse matrix
for all classifier models, and calculated the accuracy score.
• Finally, we created a web page using Streamlit and Python. The user enters a text message in
the placeholder and clicks the Process button. The page then speaks, in an artificial voice,
whether the given message is spam or ham.
• It also displays whether the message is spam or ham.
INTRODUCTION
• Whenever you submit your email address or contact number on a platform, it becomes
easy for that platform to market its products by sending emails or messages directly
to you.
• This results in many spam alerts and notifications in your inbox. This is where the task
of spam detection comes in.
• Detecting spam alerts in emails and messages is one of the main applications that every
big tech company tries to improve for its customers.
• Apple’s official messaging app and Google’s Gmail are great examples of such
applications where spam detection works well to protect users from spam alerts.
• Spam email can be dangerous. It can include malicious links that infect your computer
with malware.
• Cyber attacks are also carried out through spam mail, by sending links that deliver trojans.
• The presence of spam content in social media is increasing rapidly, and
therefore the detection of spam has become vital.
• Spam content increases as people make extensive use of platforms such as Facebook,
Twitter, YouTube, and e-mail.
• The time people spend on social media is growing quickly, especially since
the pandemic.
• Users get a lot of text messages through social media, and they cannot recognize
the spam content in these messages.
• Spam messages contain malicious links, apps, fake accounts, fake news, reviews,
rumors, etc.
• To improve social media security, the detection and control of spam text are
essential. In this project we present a detailed survey on spam text detection and
classification in social media using machine learning in Python.
Implementation
This is the procedure that our model follows:
Dataset: The Spam.csv dataset, taken from Kaggle, is used.
Data Preprocessing: Data preprocessing refers to the manipulation of data before it is used.
It is divided into four stages: 1) data cleaning, 2) data integration, 3) data reduction, 4) data transformation.
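The four stages can be illustrated on a handful of messages. This is only a sketch: the rows, the two sources, and the helper names `clean` and `preprocess` are hypothetical and are not taken from the project's notebook or from Spam.csv.

```python
# Hypothetical raw rows: (label, text) pairs from two made-up sources.
source_a = [("ham", "  See you at 5pm  "), ("spam", "WIN a FREE prize now!!!")]
source_b = [("spam", "win a free prize now!!!"), ("ham", ""), ("ham", "Lunch tomorrow?")]


def clean(text):
    """Data cleaning: trim whitespace, lowercase, collapse repeated spaces."""
    return " ".join(text.strip().lower().split())


def preprocess(*sources):
    # Data integration: merge the rows of every source into one list.
    merged = [row for source in sources for row in source]
    seen, rows = set(), []
    for label, text in merged:
        text = clean(text)
        # Data reduction: drop empty messages and exact duplicates.
        if not text or text in seen:
            continue
        seen.add(text)
        # Data transformation: encode the label as 1 (spam) or 0 (ham).
        rows.append((text, 1 if label == "spam" else 0))
    return rows


rows = preprocess(source_a, source_b)
for text, label in rows:
    print(label, text)
```

A real pipeline would likely read the CSV file with a library such as pandas; plain lists are used here to keep the example self-contained.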
Feature Extraction: We use CountVectorizer.
CountVectorizer: A tool provided by the scikit-learn library in Python. It transforms a given text
into a vector based on the frequency (count) of each word that occurs in the text.
This is helpful when we have multiple such texts and wish to convert each text into a vector of
word counts (for use in further text analysis).
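A minimal sketch of this step, assuming scikit-learn is installed. The three messages are made up for illustration and are not rows from Spam.csv.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical messages, not taken from the project's dataset.
messages = [
    "free prize free entry",
    "see you at lunch",
    "claim your free prize",
]

vectorizer = CountVectorizer()               # default settings: lowercase, word tokens
counts = vectorizer.fit_transform(messages)  # sparse matrix: one row per message

# vocabulary_ maps each learned word to its column index.
vocabulary = sorted(vectorizer.vocabulary_)
free_column = vectorizer.vocabulary_["free"]

print(vocabulary)
print(counts.toarray())
```

Each row of the matrix is the count vector for one message, and each column corresponds to one word of the learned vocabulary. The matrix is stored in sparse form because most entries are zero.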
Machine learning model: Naïve Bayes Classifier
• A simple probabilistic classifier that calculates a set of probabilities by counting the frequency and
combinations of values in a given dataset.
• Each message is represented as a vector of feature values.
• Formula for calculating probabilities:
normal = P(N) * P(W1|N) * P(W2|N) * ... * P(Wn|N)
spam = P(S) * P(W1|S) * P(W2|S) * ... * P(Wn|S)
P(N) = probability of normal messages
P(S) = probability of spam messages
P(W1|N) = probability of word W1 in normal messages
P(W1|S) = probability of word W1 in spam messages
A message is assigned to the class with the larger of the two scores.
• It is very useful for classifying e-mails correctly.
• This method is known to achieve high precision and recall.
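The two scores above can be computed by hand on a tiny example. The sketch below uses four made-up training messages. It also adds add-one (Laplace) smoothing, which is an assumption not stated on the slide: without it, any word never seen in a class would make that class's whole product zero.

```python
from collections import Counter

# Hypothetical training messages, not taken from Spam.csv.
training = {
    "normal": ["see you at lunch", "lunch at noon"],
    "spam": ["free prize now", "claim free prize"],
}

# Word frequencies per class and the overall vocabulary.
word_counts = {
    label: Counter(" ".join(msgs).split()) for label, msgs in training.items()
}
vocabulary = set().union(*word_counts.values())
total_messages = sum(len(msgs) for msgs in training.values())


def score(message, label):
    """P(label) * P(W1|label) * ... * P(Wn|label), with add-one smoothing."""
    result = len(training[label]) / total_messages  # prior, P(N) or P(S)
    total_words = sum(word_counts[label].values())
    for word in message.split():
        # Smoothed estimate of P(word | label); the +1 is the assumed smoothing.
        result *= (word_counts[label][word] + 1) / (total_words + len(vocabulary))
    return result


def classify(message):
    # Pick the class whose score is larger.
    return max(training, key=lambda label: score(message, label))


for text in ["free prize", "lunch at noon"]:
    print(text, "->", classify(text))
```

For long messages the product of many small probabilities can underflow, so practical implementations usually sum log-probabilities instead. The direct product is kept here to mirror the formula.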
NAÏVE BAYES CLASSIFIER
Software requirements:
Jupyter Notebook
Sublime Text
Hardware requirements:
i3 processor (7th generation)
8 GB RAM
Literature review
• A literature review is a survey of scholarly sources on a specific topic. It provides an overview
of current knowledge, allowing you to identify relevant theories, methods, and gaps in the
existing research.
• Writing a literature review involves finding relevant publications (such as books and journal
articles), critically analyzing them, and explaining what you found.
• There are five key steps:
1. Search for relevant literature
2. Evaluate sources
3. Identify themes, debates and gaps
4. Outline the structure
5. Write your literature review
Literature review (Table – 1)
S.No | Dataset name | Description | Reference | Web link
1 | SpamAssassin | 1,897 spam and 4,150 ham messages | (Méndez et al., 2006) | https://spamassassin.apache.org/old/publiccorpus/
2 | Princeton Spam Image Benchmark | 1,071 spam images | (Biggio et al., 2011) | https://www.cs.princeton.edu/cass/spam/
3 | Dredze Image Spam Dataset | 3,927 spam and 2,006 ham images | (Almeida & Yamakami, 2012) | https://www.cs.jhu.edu/~mdredze/datasets/image_spam/
4 | ZH1 – Chinese email spam dataset | 1,205 spam and 428 ham text emails | (Zhang, Zhu & Yao, 2004) | https://archive.ics.uci.edu/ml/datasets/spambase
5 | Enron-Spam | 13,496 spam and 16,545 non-spam email texts | (Koprinska et al., 2007) | http://www2.aueb.gr/users/ion/data/enron-spam/
OUTPUT
• Any external e-mail can be analyzed and, if it is spam, flagged as such, so that
users are aware of it.
• Mails are classified into spam and not spam.
• From the classified data, we calculated an accuracy of 98.29%.
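As a rough illustration of how such an accuracy figure is obtained, the sketch below trains a CountVectorizer plus multinomial Naïve Bayes pipeline with scikit-learn and scores it on held-out messages. The messages and the split are hypothetical toy data, so the result is not comparable with the 98.29% reported above, and the choice of `MultinomialNB` is an assumption about the Naïve Bayes variant.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, not rows from Spam.csv.
train_texts = [
    "win a free prize now",
    "free entry claim your prize",
    "urgent claim your cash reward",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you call me tonight",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

test_texts = ["claim your free prize", "see you at lunch tomorrow"]
test_labels = ["spam", "ham"]

# Word counts feed directly into the Naïve Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predicted = model.predict(test_texts)
# Accuracy = correctly classified messages / all test messages.
accuracy = accuracy_score(test_labels, predicted)
print(f"accuracy: {accuracy:.2%}")
```

Accuracy alone can mislead when spam is much rarer than ham, so precision and recall for the spam class are worth reporting alongside it.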
Spam.csv dataset is used
Final Output
References
1. Méndez et al. (2006). A comparative performance study of feature selection methods for the anti-spam filtering domain.
2. Biggio et al. (2011). A survey and experimental evaluation of image spam filtering techniques.
3. Almeida & Yamakami (2012). Advances in spam filtering techniques.
4. Zhang, Zhu & Yao (2004). An evaluation of statistical spam filtering techniques.
5. Koprinska et al. (2007). Learning to classify e-mail.
• Thank you
