CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf

CYBERBULLYING DETECTION USING
MACHINE LEARNING
PRESENTED BY GROUP I
Under the Guidance of
Ms. Surya Ashok,
HOD, Computer Science Department
TEAM MEMBERS:
ANITHA R
KRITHIKA V S
MEGHA M S
PRANIDHI K J

ABSTRACT
● With the widespread use of social media in this era,
cyberbullying has increased rapidly as a form of cybercrime.
● Cyberbullying is willful and repeated harm inflicted
through the use of computers, cell phones, and other electronic devices.
● The proposed system aims to detect cyberbullying by identifying abusive
comments and messages on social media platforms.
● The machine learning algorithm Naive Bayes is used to classify comments and
messages as bullying or non-bullying.
● The project ‘Cyberbullying Detection Using Machine Learning’ discusses and
implements the approach of machine learning in order to solve the threat of
cyberbullying, and thus makes social media a safe place for the users.

SYSTEM SPECIFICATIONS
Hardware Specification
Processor : Intel Core i5
Speed : Above 1GHz
RAM capacity : 4GB or above
Hard Disk Space Required : 5 GB or above
Keyboard : Standard Keyboard
Mouse : Standard Mouse
Monitor : Standard color monitor

Software Specification
● Language Used : Python 3.10, HTML5, JavaScript ES6
➔ Here, HTML and JavaScript are used for designing the web application.
➔ The main advantage of using Python in this project is that it is open source.
➔ It also has a vast ecosystem of machine learning libraries available.
● Web Framework : Django 3.7
➔ Django is preferred in this project because of its simplicity, flexibility, reliability and scalability.
● Database : SQL Server 2019
➔ SQL Server 2019 (15.x) introduces new ways to work with SQL Server containers, such as
Machine Learning Services.
➔ Supports query interleaving, which is a tabular mode system configuration that can
improve user query response times in high-concurrency scenarios.

EXISTING SYSTEM
● For several years, researchers have worked intensively on cyberbullying
detection to find a way to control or reduce cyberbullying on social media
platforms.
● In a research work by Massachusetts Institute of Technology, a system to detect
cyberbullying through textual context in YouTube video comments was
developed, but the system produced imprecise classifications and
a high number of false positives.
● Most existing systems focus on the effects after a cyberbullying
incident, and there is no accurate system for online cyberbullying detection.

PROPOSED SYSTEM
● The proposed system employs machine learning to avoid human
intervention.
● A dataset containing bullying and non-bullying comments is used to
train the machine learning model using the scikit-learn (sklearn) library in Python.
● The Naive Bayes algorithm is used for detecting abusive comments and
messages in social media.

● The Naive Bayes algorithm is based on Bayes' theorem, which states that:
P(A|B) = (P(B|A) P(A)) / P(B)
● In the proposed system automated detection of bullying comments in
social media is implemented.
● The proposed system is platform independent: it can be implemented on
any operating system, and it is free to use.

MODULE DESCRIPTION
● User module.
● Admin module.
● Machine learning module.

MODULE FUNCTIONALITIES
❏ USER MODULE
● Users can sign up to the web application by registering with
details such as user name and password.
● Registered users can sign in to their profile using their user ID and password.
● They can post videos, stories and photos in the web application.
● Users can send friend requests to other users and can also chat with their
friends.
● Users can view, like and comment on the videos and photos posted by their
friends in the web application.

❏ ADMIN MODULE
● Admin can handle and make changes in the web application.
● They can also view the requests from users.
● They can also view the comments that have been classified as bullying
and non-bullying.
● They can manage the notifications of users.

❏ MACHINE LEARNING MODULE
● The Machine Learning module is responsible for classifying
comments and messages as bullying or non-bullying.
● From a vast set of comments and messages, the Naive Bayes
algorithm is used to predict bullying comments and messages.
● This module includes the following steps :
➢ Data collection
➢ Data preprocessing
➢ Segmentation
➢ Feature extraction
➢ Training
➢ Testing

FLOWCHART OF CYBERBULLYING DETECTION SYSTEM

1. DATA COLLECTION
● Collecting data for training the Machine Learning model is the basic step
in the machine learning pipeline.
● The predictions made by Machine Learning systems can only be as good as
the data on which they have been trained.
● In this system, the dataset contains both bullying and non-bullying
comments and messages.
● The dataset is downloaded from the Kaggle website.
● 80% of the dataset is used for training and the remaining 20% is used for
testing.
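
A minimal sketch of the 80/20 split described above, assuming scikit-learn's train_test_split. The ten comments, the stratify setting and the random_state value are illustrative assumptions, not the project's actual Kaggle data or configuration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the real system uses a Kaggle dataset not shown here.
comments = [
    "you are so stupid",
    "nobody likes you",
    "go away loser",
    "you are pathetic",
    "shut up idiot",
    "great photo",
    "happy birthday",
    "nice video, thanks for sharing",
    "see you tomorrow",
    "congratulations on the win",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = bullying, 0 = non-bullying

# test_size=0.2 holds out 20% for testing; stratify keeps the class balance
# the same in both parts (an assumption, the slides do not mention it).
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.2, stratify=labels, random_state=42
)

print(len(X_train), len(X_test))
```

Fixing random_state makes the split reproducible, which helps when comparing model variants on the same test set.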

2. DATA PREPROCESSING
● Real-world raw data and images are often incomplete, inconsistent and lacking in
certain behaviors or trends. They are also likely to contain many errors. So, once
collected, they are pre-processed into a format the machine learning algorithm
can use for the model.
● Data preprocessing in Machine Learning is a crucial step that helps enhance the
quality of data to promote the extraction of meaningful insights from the data.
● The preprocessing step also includes the removal of stop words and special characters
and the conversion of uppercase letters to lowercase.
● The lemmatization step converts inflected words to their root form. For
example, the word "running" is converted to its root word "run".
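
The preprocessing steps above could be sketched as follows. This is an illustration only: the stop-word set and the lemma lookup table are tiny hypothetical stand-ins, and a real project would more likely rely on a library-based stop-word list and lemmatizer (the slides do not say which one was used).

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "so", "to", "and"}
LEMMAS = {"running": "run", "ran": "run", "losers": "loser", "hated": "hate"}


def preprocess(text):
    """Lowercase, strip special characters, drop stop words, lemmatize."""
    text = text.lower()
    # Replace everything except letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Map each token to its root form when the lookup table knows it.
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("You are RUNNING away, losers!!!"))
# → ['run', 'away', 'loser']
```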

3. SEGMENTATION
● Segmentation can be defined as the process of splitting sentences
into individual tokens.
● N-grams are used for grouping adjacent tokens.
● N-grams have many applications, for example the auto
completion of sentences.
● In this project, 2-grams (pairs of adjacent tokens) are used to group tokens.
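
As a small illustration of the 2-gram grouping described above, the following sketch builds n-grams from a whitespace-tokenized sentence. The function name ngrams and the example sentence are hypothetical.

```python
def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "nobody likes you at all".split()
print(ngrams(tokens, 2))
# → [('nobody', 'likes'), ('likes', 'you'), ('you', 'at'), ('at', 'all')]
```

A sequence shorter than n yields an empty list, so very short comments simply contribute no 2-grams.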

4. FEATURE EXTRACTION
● Feature extraction is the process of taking out a list of words from the text data
and then transforming them into a feature set which is usable by a classifier.
● In this system, a TF-IDF vectorizer is used for feature extraction.
● TF-IDF stands for term frequency-inverse document frequency. It is a
measure used to quantify the importance or relevance of string
representations in a document.
● TF-IDF associates each word in a document with a number that represents how
relevant each word is in that document.
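
To make the TF-IDF weighting concrete, here is a sketch of the textbook form, tf(t, d) × log(N / df(t)), on three hypothetical token lists. It is not the exact formula a library vectorizer applies: scikit-learn's TfidfVectorizer, for instance, uses a smoothed inverse document frequency and L2 normalization by default, so its numbers differ.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    ["you", "are", "stupid"],
    ["you", "are", "great"],
    ["stupid", "stupid", "loser"],
]


def tf_idf(docs):
    """Compute textbook TF-IDF weights: (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(docs)
for doc_weights in weights:
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

In the third document, "loser" appears in only one document and therefore receives a higher weight than "stupid", even though "stupid" occurs twice there.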

5. TRAINING
● Model training is the key step in machine learning that results in a model ready
to be validated, tested, and deployed.
● The performance of the model determines the quality of the applications that
are built using it.
● Quality of training data and the training algorithm are both important assets
during the model training phase.
● Typically, the dataset is split into training and testing sets.
● All these aspects of model training make it both an involved and important
process in the overall machine learning development cycle.
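
A hedged sketch of how training might look with scikit-learn, chaining a TF-IDF vectorizer and a Multinomial Naive Bayes classifier. The training comments are hypothetical, and ngram_range=(1, 2) (single words plus 2-grams) is an assumption; the slides only state that 2-grams are used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data; 1 = bullying, 0 = non-bullying.
train_comments = [
    "you are so stupid",
    "nobody likes you loser",
    "shut up you idiot",
    "you are a pathetic loser",
    "great photo from the trip",
    "happy birthday my friend",
    "thanks for sharing this video",
    "see you at the game tomorrow",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The vectorizer lowercases text and builds TF-IDF features over
# unigrams and bigrams; the classifier is fitted on those features.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(train_comments, train_labels)

predictions = model.predict([
    "you are a stupid loser",
    "happy birthday, great video",
])
print(list(predictions))
```

Wrapping both steps in one pipeline ensures that new comments are vectorized with the vocabulary learned during training.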

6. TESTING
● In machine learning, model testing is the process in which
the performance of a fully trained model is evaluated on a testing set.
● The testing set, consisting of a set of testing samples, should be kept
separate from both the training and validation sets, but it should
follow the same probability distribution as the training set.
● Each testing sample has a known value of the target.

DOMAIN THEORY
➔ Machine learning
● Machine learning (ML) is the study of computer algorithms that improve
automatically through experience.
● Machine learning involves computers discovering how they can perform tasks
without being explicitly programmed to do so.
● The Machine Learning process starts with inputting training data into the
selected algorithm.
● New input data is fed into the machine learning algorithm to test whether the
algorithm works correctly.

➔ NAIVE BAYES
● A Naive Bayes classifier is a probabilistic machine learning model
used for classification tasks.
● The classifier is based on Bayes' theorem.
Bayes' Theorem:
P(A|B) = (P(B|A) P(A)) / P(B)
● This system uses Multinomial Naive Bayes Classifier.
● The features/predictors used by the classifier are the frequency of
the words present in the document.
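
The following from-scratch sketch illustrates how a Multinomial Naive Bayes classifier applies Bayes' theorem to word frequencies. It is a simplified illustration, not the project's code (which uses a library implementation): the function names, the toy documents and the Laplace smoothing constant alpha=1.0 are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        log_prior = math.log(n_docs / len(labels))
        # Smoothed estimate of P(word | class) from word frequencies.
        log_likelihood = {
            w: math.log((word_counts[label][w] + alpha)
                        / (total + alpha * len(vocab)))
            for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict_nb(model, tokens):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        # Words never seen in training are skipped.
        scores[label] = log_prior + sum(
            log_likelihood[w] for w in tokens if w in log_likelihood
        )
    return max(scores, key=scores.get)


# Hypothetical tokenized training comments.
docs = [
    ["you", "stupid", "loser"],
    ["nobody", "likes", "you"],
    ["great", "photo"],
    ["happy", "birthday", "friend"],
]
labels = ["bullying", "bullying", "non-bullying", "non-bullying"]

model = train_nb(docs, labels)
print(predict_nb(model, ["stupid", "loser"]))    # → bullying
print(predict_nb(model, ["great", "birthday"]))  # → non-bullying
```

Working with log probabilities avoids numeric underflow when many small word probabilities are multiplied, and P(B) can be dropped because it is the same for every class.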

CONFUSION MATRIX
Fig. : Confusion Matrix

DATA FLOW DIAGRAMS
Fig. : Level 0 DFD

CONCLUSION
The overall aim of the project “Cyberbullying Detection Using Machine
Learning” is to develop a system that automatically classifies comments
and messages as bullying or non-bullying and also removes the bullying
comments from the web application.
