International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1732
IDENTIFYING MALICIOUS DATA IN SOCIAL MEDIA
M. Sai Sri Lakshmi Yellari1, M. Manisha2, J. Dhanesh3, M. Srinivasa Rao4, Dr. S. Suhasini5
1Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra
Pradesh, India
2Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra
Pradesh, India
3Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra
Pradesh, India
4Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra
Pradesh, India
5Associate professor, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College,
Andhra Pradesh, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Network anomaly detection is a broad area of
research, and the use of entropy and of traffic-feature
distributions has received considerable attention in the
research community. This paper proposes disclosing malicious
traffic for network security using an entropy-based approach
and a power law distribution. The feature considered for
calculating entropy is packet size. Malware, commonly known
as malicious data, is prevalent and gives rise to a number of
critical threats. As the volume of content shared through
social media increases, users share large amounts of
information. Malware that users share in social media is
detected using the power law distribution, and the result is
compared with the Shannon entropy technique.
Key Words: Social media, Entropy, Malware, Power law,
Security, Traffic.
1. INTRODUCTION
A network consists of two or more computers that are
linked in order to share resources (such as printers and
CDs), exchange files, or allow electronic communications.
The computers on a network may be linked through cables,
telephone lines, radio waves, satellites, or infrared light
beams. A network may be composed of any combination of
LANs and WANs. Network traffic can be defined in a number
of ways; in the simplest terms, it is the density of data
present in a network. Network data security should be a high
priority when setting up a network because of the growing
threat of attackers attempting to infect as many computers
as possible. Because networks are now used so heavily,
various attacks occur and malicious data is injected into a
user's profile or documents. Lack of security in an
organization leads to data breaches. In this paper, we
compare an entropy-based anomaly detection mechanism with
the power law distribution. Entropy-based anomaly detection
captures more fine-grained traffic patterns than
conventional volume-based metrics. Many traffic features,
such as IP address, port number, and flow size, are
considered as attributes in calculating entropy, whereas the
power law (also called the scaling law) states that a
relative change in one quantity results in a proportional
relative change in another.
1.1 Malicious Data
Malicious data is data that, when introduced to a
computer (usually by an operator unaware that he or she
is doing so), will cause the computer to perform actions
undesirable to the computer's owner. Malicious practices
by local network users prevent efficient sharing of
network resources. Common threats are: unauthorized
access, data destruction, administrative access, system
crash/hardware failure, and viruses. Malware is short for
malicious software, denoting software that can be used to
compromise computer functions, steal data, bypass access
controls, or otherwise cause harm to the host computer.
Malware is a broad term that refers to a variety of
malicious programs. The most common types of malware are
adware, bots, bugs, rootkits, spyware, Trojan horses,
viruses, and worms.
1.2 Problem Description
Online social networks are widely used these days for
communication. Users can share many types of information
with friends. However, some social network users misuse the
features of these networks and promote the spread of
malicious content by uploading malicious files. This content
spreads quickly, and there is no proper mechanism to detect
these malicious files immediately and remove them
effectively.
Social network sites like Facebook, Twitter, and Google+
are experiencing incredible growth in users. There are
more than a million users as of now. Besides just
creating a profile and linking with friends, the
social networks are now building platforms to run their
website. These platforms are built based on the
profile details. These social applications are becoming
a form of online communication that makes use of the
user's private information and activities in social links
for various services. Social networks are a popular means
of communication among Internet users.
2. PROPOSED SYSTEM
Malware (for "malicious software") is any program or file
that is harmful to a computer user. Thus, malware includes
computer viruses, worms, Trojan horses, and also spyware,
programming that gathers information about a computer
user without permission. It is hard to distinguish
malicious packets from legitimate packets in the traffic.
The behavior of Internet traffic is far from regular, and
malicious or abnormal traffic may look very similar to
normal traffic.
2.1 Shannon Entropy
Security breaches on a network server can result in the
disclosure of critical information or the loss of a capability
that can affect the entire organization. Therefore, securing
network servers should be a significant part of your network
and information security strategy [1]. Many security problems
can be avoided if servers and networks are appropriately
configured. We therefore adopt an entropy-based approach to
detect anomalous traffic with altered packet sizes, using the
packets that are sent. Entropy-based anomaly detection
techniques capture more fine-grained traffic patterns than
conventional volume-based metrics. The parameters used to
calculate entropy include source and destination IP
addresses, port numbers, packet size, connection time, and
the total number of packets flowing.
Definition 1: Entropy
Entropy is a measure of the disorder or randomness of a system [3].
Algorithm:
AIM: To detect altered packets with an entropy-based
approach using the Shannon entropy algorithm.
1) Capture packets and add them to the current queue L.
2) Compute the current queue length.
3) Select the features required for the calculations,
i.e. IP address of source and destination, port
number of source and destination, packet size,
packet rate and connection time [2].
4) Calculate the entropy:
$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$
Here, for a feature X observed over a fixed time window T,
$P(x_i) = m_i / m$, where $m_i$ is the frequency, or number
of times we observe X taking the value $x_i$, and
$m = \sum_{i=1}^{n} m_i$ is the total number of observations.
Hence
$H(X) = -\sum_{i=1}^{n} (m_i / m) \log (m_i / m)$.
To calculate the probability of any source
(destination) address:
$m_i$ = number of packets with $x_i$ as source
(destination) address,
$m$ = total number of packets seen in the time window T,
$P(x_i) = m_i / m$.
Similarly, the probability for each source
(destination) port is $P(x_i)$ = (number of packets with
$x_i$ as source (destination) port) / $m$.
5) Normalized entropy = $H / \log(n)$, where n is the
number of distinct $x_i$ values in the given time
window. Normalized entropy summarizes the overall
probability distribution in the captured flow for the
time window T.
6) Determine the threshold on the basis of the
maximum and minimum deviations observed over a
number of runs.
7) If the result exceeds the threshold, an attack is
reported; otherwise no attack is reported.
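Steps 4 to 7 above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It uses the standard Shannon definition H(X) = -Σ P(x_i) log P(x_i). The function names are hypothetical, the natural logarithm is an arbitrary choice (the base cancels in the normalized form), and the threshold rule shown (largest deviation from the mean normalized entropy over attack-free training windows) is only one assumed reading of step 6.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return H(X) = -sum(p_i * log(p_i)) over the observed values (natural log)."""
    counts = Counter(values)
    m = sum(counts.values())  # total number of observations in the window
    if m == 0:
        return 0.0
    return -sum((mi / m) * math.log(mi / m) for mi in counts.values())


def normalized_entropy(values):
    """Return H / log(n), where n is the number of distinct values."""
    n = len(set(values))
    if n < 2:
        return 0.0  # log(1) = 0, so the degenerate case is defined as 0 here
    return shannon_entropy(values) / math.log(n)


def deviation_threshold(training_windows):
    """Assumed rule: threshold = largest deviation from the mean normalized entropy."""
    hs = [normalized_entropy(w) for w in training_windows]
    mean_h = sum(hs) / len(hs)
    return mean_h, max(abs(h - mean_h) for h in hs)


def is_attack(window, mean_h, threshold):
    """Report an attack when the window's deviation exceeds the threshold."""
    return abs(normalized_entropy(window) - mean_h) > threshold


# Usage: packet sizes (bytes) per time window; the values are made up.
training = [[60, 60, 1500, 1500], [60, 60, 60, 1500]]
mean_h, threshold = deviation_threshold(training)
suspicious = is_attack([60, 60, 60, 60], mean_h, threshold)
```

A window in which every packet has the same size has normalized entropy 0, which lies far from the training mean, so it is flagged under this assumed rule.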
Fig -1: Flow diagram of Shannon Entropy
Features extracted for the detection of anomaly-based attacks
are as follows:
• Entropy of source IP address and port number.
• Entropy of destination IP address and port number.
• Entropy of packet type; occurrence rate of packet
type.
• Number of packets per unit time.
• Entropy of packet size.
[Chart: entropy (y-axis, 0 to 2) versus number of applications installed (x-axis)]
Chart -1: Analysis of Shannon Entropy
The above analysis uses a Facebook dataset obtained from
GitHub, in which we concentrated on the number of
applications installed to find the malicious data.
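The per-window features listed above for anomaly detection could be assembled along the following lines. This is a sketch under stated assumptions: packet records are plain dictionaries with hypothetical keys (src_ip, src_port, dst_ip, dst_port, ptype, size), the base-2 logarithm is an arbitrary choice, and window_seconds must be positive. None of these names come from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_features(packets, window_seconds):
    """Build the feature set for one time window of packet records."""
    type_counts = Counter(p["ptype"] for p in packets)
    return {
        # Entropy of (source IP, source port) pairs
        "H_src": entropy([(p["src_ip"], p["src_port"]) for p in packets]),
        # Entropy of (destination IP, destination port) pairs
        "H_dst": entropy([(p["dst_ip"], p["dst_port"]) for p in packets]),
        # Entropy and occurrence rate of packet type
        "H_type": entropy([p["ptype"] for p in packets]),
        "type_rate": {t: c / len(packets) for t, c in type_counts.items()},
        # Number of packets per unit time
        "packet_rate": len(packets) / window_seconds,
        # Entropy of packet size
        "H_size": entropy([p["size"] for p in packets]),
    }


# Usage with two made-up packets captured in a 2-second window.
sample = [
    {"src_ip": "10.0.0.1", "src_port": 1234, "dst_ip": "10.0.0.9",
     "dst_port": 80, "ptype": "TCP", "size": 60},
    {"src_ip": "10.0.0.2", "src_port": 1234, "dst_ip": "10.0.0.9",
     "dst_port": 80, "ptype": "TCP", "size": 1500},
]
features = window_features(sample, 2.0)
```

Treating the IP address and port as a single pair is one possible reading of "entropy of source IP address and port number"; computing the two entropies separately would be equally consistent with the list.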
2.2 Power Law Distribution
Malware are malicious software programs deployed by
cyber attackers to compromise computer systems by
exploiting their security vulnerabilities. Motivated by
extraordinary financial or political rewards, malware owners
are exhausting their energy to compromise as many
networked computers as they can in order to achieve their
malicious goals. A compromised computer is called a bot,
and all bots compromised by a malware form a botnet.
Botnets have become the attack engine of cyber attackers,
and they pose critical challenges to cyber defenders [4]. In
order to fight against cyber criminals, it is important for
defenders to understand malware behavior, such as
propagation or membership recruitment patterns, the size of
botnets, and distribution of bots.
The proposed two-layer epidemic model and the findings are
the first work in the field. Our contributions are summarized
as follows.
• We propose a two-layer malware propagation model
to describe the development of a given malware at
the Internet level. Compared with the existing
single-layer epidemic models, the proposed model
represents malware propagation better in large-
scale networks.
• We find the malware distribution in terms of
networks varies from exponential to power law with
a short exponential tail, and to power law
distribution at its early, late, and final stage,
respectively. These findings are first theoretically
proved based on the proposed model, and then
confirmed by the experiments through the two
large-scale real-world data sets.
Definition 2: Power Law
The power law (also called the scaling law) states that a
relative change in one quantity results in a proportional
relative change in another.
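To make the definition concrete, the following short derivation assumes the standard power-law form, with Y and X the two quantities, α the exponent, and K a constant. It is a worked illustration of the scaling property and not a result taken from the paper.

```latex
Y = K X^{\alpha}
\;\Longrightarrow\;
\log Y = \log K + \alpha \log X
\;\Longrightarrow\;
\frac{dY}{Y} = \alpha \, \frac{dX}{X},
\qquad
\frac{Y(cX)}{Y(X)} = \frac{K (cX)^{\alpha}}{K X^{\alpha}} = c^{\alpha}.
```

For example, with α = 2, doubling X (c = 2) multiplies Y by 2^2 = 4 regardless of the starting value of X. On log-log axes the relation is a straight line with slope α and intercept log K.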
Algorithm:
AIM: To detect malicious content by making use of Power
Law Distribution.
1) Enter the Web Application.
2) Open User Registration form, enter Login
credentials.
3) If Login credentials are invalid, it displays that the
credentials entered are invalid.
4) If valid, the user can view the further steps.
5) The steps include Uploading Files, Search Users,
Search File, My Search History, Malicious Document,
Friends Request and Response Details, My Files, My
Friend Files.
6) The Power Law Distribution is applied at the My
Friend Files option.
7) Calculate the Power Law Distribution using,
a. $Y = K X^{\alpha}$
b. Where Y and X are variables of interest
(here X is taken as the size of the file),
c. α is the law's exponent,
d. K is a constant.
8) The Power Law Distribution separates the good files
from the malicious files.
9) Internally, if the value of Y is equal to the
value of the file uploaded by the user's friend, then
the file is said to be malicious.
10) The user can log out of the account after performing
these actions.
11) The admin, who monitors all users, blocks the
account of a malicious user.
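Steps 7 to 9 can be illustrated with the short Python sketch below. It assumes the standard power-law form Y = K * X**α with X the file size. The paper does not say how K and α are obtained or what "value of the file" is compared with Y, so the log-log least-squares fit, the observed_value argument, and the 5% tolerance are all hypothetical choices made for illustration.

```python
import math


def power_law(x, k, alpha):
    """Evaluate Y = K * X**alpha."""
    return k * x ** alpha


def fit_power_law(xs, ys):
    """Least-squares fit of log Y = log K + alpha * log X; returns (k, alpha).

    Requires positive xs and ys and at least two distinct xs.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    alpha = sxy / sxx
    k = math.exp(mean_y - alpha * mean_x)
    return k, alpha


def matches_power_law(file_size, observed_value, k, alpha, rel_tol=0.05):
    """Step 9 as read here: flag the file when its observed value equals Y
    (within a relative tolerance, since exact float equality is fragile)."""
    predicted = power_law(file_size, k, alpha)
    return math.isclose(observed_value, predicted, rel_tol=rel_tol)


# Usage with synthetic data generated from Y = 3 * X**2.
sizes = [1.0, 2.0, 4.0, 8.0]
values = [3.0 * s ** 2 for s in sizes]
k, alpha = fit_power_law(sizes, values)
flagged = matches_power_law(10.0, 300.0, k, alpha)
```

Fitting in log space turns the power law into a straight line, so ordinary least squares is enough for a sketch; maximum-likelihood estimators are generally preferred for heavy-tailed data.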
Fig -2: Flow diagram of Power Law Distribution.
Chart -2: Analysis of Power Law Distribution.
The above analysis shows that as the number of users
increases, the amount of malicious data also increases,
resulting in poor performance.
3. RESULTS
Fig -3: Admin Login.
Fig -4: User Login.
Fig -5: Upload Files.
Fig -6: Admin views all the files.
Fig -7: Admin Blocks.
4. CONCLUSIONS
The prevailing definition of a network anomaly describes an
occurrence that diverges from normal behavior.
However, there are no known models available for normal
network behavior. The major strength of the new scheme is
that it can detect attacks in the Facebook data set based on
the number of applications installed. We do not claim that
our methods are superior to all other methods.
It is well known that finding malicious traffic in a network or
in a communication system offers wide scope for research.
Using the entropy-based technique, we aim to detect altered
packets in a network or communication system. The
experimental sample data set we used is relatively small and
hence does not cover all possible attacks.
We therefore move to the power law distribution, in which,
unlike previous modeling methods, we propose a two-layer
epidemic model: the upper layer focuses on networks of
large-scale networks, for example, domains of the Internet;
the lower layer focuses on the hosts of a given network. In
regards to future work, we will first further investigate the
dynamics of the late stage. More details of the findings are
expected to be further studied, such as the length of the
exponential tail of a power law distribution at the late stage.
Second, defenders may care more about their own network,
e.g., the distribution of a given malware at their ISP domains,
where the conditions for the two-layer model may not hold.
We need to seek appropriate models to address this
problem. Finally, we are interested in studying the
distribution of multiple malware on large-scale networks as
we only focus on one malware in this paper. We believe it is
not a simple linear relationship in the multiple malware case
compared to the single malware one.
REFERENCES
[1] S. Poornimavathi and Mr. K. Anandapadmanabhan, "Entropy-
based analysis of multiple traffic anomalies detection in
network security", International Journal of Multidisciplinary
Research and Development (IJMRD), ISSN: 2349-5979, Vol. 3,
Issue 3, March 2016.
[2] Kamal Shah and Tanvi Kapdi, "Disclosing Malicious Traffic
for Network Security", International Journal of Advances in
Engineering and Technology (IJAET), ISSN: 22311963, Vol. 7,
Issue 6, January 2015.
[3] C. G. Chakrabarti and I. Chakrabarty, "Boltzmann Entropy:
Probability and Information", Romanian Journal of Physics,
Vol. 52, Nos. 5-6, pp. 525-528, Bucharest, 2007.
[4] Shui Yu, Guofei Gu, Ahmed Barnawi, Song Guo and Ivan
Stojmenovic, "Malware Propagation in Large-Scale
Networks", IEEE Transactions on Knowledge and Data
Engineering, Vol. 27, January 2015.

More Related Content

PDF
IRJET- The Hidden Virus Propagation Search Engine Attack
PDF
TOOLS AND TECHNIQUES FOR NETWORK FORENSICS
PDF
Internet Worm Classification and Detection using Data Mining Techniques
PDF
Wireless lan intrusion detection by using statistical timing approach
PDF
Detecting root of the rumor in social network using GSSS
PDF
Study of Various Techniques to Filter Spam Emails
PDF
Detection of ARP Spoofing
PDF
Malware Propagation in Large-Scale Networks
IRJET- The Hidden Virus Propagation Search Engine Attack
TOOLS AND TECHNIQUES FOR NETWORK FORENSICS
Internet Worm Classification and Detection using Data Mining Techniques
Wireless lan intrusion detection by using statistical timing approach
Detecting root of the rumor in social network using GSSS
Study of Various Techniques to Filter Spam Emails
Detection of ARP Spoofing
Malware Propagation in Large-Scale Networks

What's hot (20)

PDF
IRJET- Wireless LAN Intrusion Detection and Prevention System for Malicious A...
PDF
Copyright Protection in Peer To Peer Network
PDF
Enhancement in network security with security
PDF
Enhancement in network security with security protocols
PDF
Machine Learning Techniques Used for the Detection and Analysis of Modern Typ...
PDF
Optimal remote access trojans detection based on network behavior
PDF
LATTICE STRUCTURAL ANALYSIS ON SNIFFING TO DENIAL OF SERVICE ATTACKS
PDF
Secure and Reliable Data Transmission in Generalized E-Mail
PDF
Defense mechanism for d do s attack through machine learning
PDF
CONTROLLING IP FALSIFYING USING REALISTIC SIMULATION
PDF
Guarding Against Large-Scale Scrabble In Social Network
PDF
A017510102
PDF
A Deep Learning Model For Crime Surveillance In Phone Calls.
PDF
PDF
EFFICIENT DEFENSE SYSTEM FOR IP SPOOFING IN NETWORKS
PDF
A44090104
PDF
L018118083.new ramya publication (1)
PDF
COUNTERMEASURE TOOL - CARAPACE FOR NETWORK SECURITY
PDF
Layered Approach for Preprocessing of Data in Intrusion Prevention Systems
IRJET- Wireless LAN Intrusion Detection and Prevention System for Malicious A...
Copyright Protection in Peer To Peer Network
Enhancement in network security with security
Enhancement in network security with security protocols
Machine Learning Techniques Used for the Detection and Analysis of Modern Typ...
Optimal remote access trojans detection based on network behavior
LATTICE STRUCTURAL ANALYSIS ON SNIFFING TO DENIAL OF SERVICE ATTACKS
Secure and Reliable Data Transmission in Generalized E-Mail
Defense mechanism for d do s attack through machine learning
CONTROLLING IP FALSIFYING USING REALISTIC SIMULATION
Guarding Against Large-Scale Scrabble In Social Network
A017510102
A Deep Learning Model For Crime Surveillance In Phone Calls.
EFFICIENT DEFENSE SYSTEM FOR IP SPOOFING IN NETWORKS
A44090104
L018118083.new ramya publication (1)
COUNTERMEASURE TOOL - CARAPACE FOR NETWORK SECURITY
Layered Approach for Preprocessing of Data in Intrusion Prevention Systems
Ad

Similar to Identifying Malicious Data in Social Media (20)

PPTX
Anomaly Detection in Network Traffic using Machine Learning.pptx
PDF
Intrusion Detection System using AI and Machine Learning Algorithm
PDF
PDF
PPTX
Anomaly detection final
PDF
STATISTICAL QUALITY CONTROL APPROACHES TO NETWORK INTRUSION DETECTION
PDF
Algorithms for network server anomaly behavior detection without traffic cont...
PDF
Detection of suspected nodes in MANET
PPTX
Intrusion Detection with Neural Networks
PDF
Kb2417221726
PDF
Probabilistic models for anomaly detection based on usage of network traffic
PPTX
Application of Machine Learning in Cybersecurity
PDF
Malicious Code Intrusion Detection using Machine Learning and Indicators of C...
PDF
Paper(edited)
PDF
50120140501013
PDF
Ii2514901494
PDF
IRJET- Genetic Algorithm based Intrusion Detection-Survey
PDF
A PHASED APPROACH TO INTRUSION DETECTION IN NETWORK
PDF
Machine Learning for Application-Layer Intrusion Detection
PDF
Detecting Unknown Attacks Using Big Data Analysis
Anomaly Detection in Network Traffic using Machine Learning.pptx
Intrusion Detection System using AI and Machine Learning Algorithm
Anomaly detection final
STATISTICAL QUALITY CONTROL APPROACHES TO NETWORK INTRUSION DETECTION
Algorithms for network server anomaly behavior detection without traffic cont...
Detection of suspected nodes in MANET
Intrusion Detection with Neural Networks
Kb2417221726
Probabilistic models for anomaly detection based on usage of network traffic
Application of Machine Learning in Cybersecurity
Malicious Code Intrusion Detection using Machine Learning and Indicators of C...
Paper(edited)
50120140501013
Ii2514901494
IRJET- Genetic Algorithm based Intrusion Detection-Survey
A PHASED APPROACH TO INTRUSION DETECTION IN NETWORK
Machine Learning for Application-Layer Intrusion Detection
Detecting Unknown Attacks Using Big Data Analysis
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PDF
Beginners-Guide-to-Artificial-Intelligence.pdf
PPTX
Solar energy pdf of gitam songa hemant k
PPTX
Unit IImachinemachinetoolopeartions.pptx
DOCX
An investigation of the use of recycled crumb rubber as a partial replacement...
PPTX
BBOC407 BIOLOGY FOR ENGINEERS (CS) - MODULE 1 PART 1.pptx
PPTX
ARCHITECTURE AND PROGRAMMING OF EMBEDDED SYSTEMS
PPTX
Design ,Art Across Digital Realities and eXtended Reality
PPTX
SE unit 1.pptx by d.y.p.akurdi aaaaaaaaaaaa
PDF
Performance, energy consumption and costs: a comparative analysis of automati...
PPT
Programmable Logic Controller PLC and Industrial Automation
PDF
Principles of operation, construction, theory, advantages and disadvantages, ...
PDF
MACCAFERRY GUIA GAVIONES TERRAPLENES EN ESPAÑOL
PPTX
CS6006 - CLOUD COMPUTING - Module - 1.pptx
PPTX
SC Robotics Team Safety Training Presentation
PPTX
Software-Development-Life-Cycle-SDLC.pptx
PDF
CELDAS DE COMBUSTIBLE TIPO MEMBRANA DE INTERCAMBIO PROTÓNICO.pdf
PPTX
INTERNET OF THINGS - EMBEDDED SYSTEMS AND INTERNET OF THINGS
PDF
electrical machines course file-anna university
PDF
Lesson 3 .pdf
PPTX
Wireless sensor networks (WSN) SRM unit 2
Beginners-Guide-to-Artificial-Intelligence.pdf
Solar energy pdf of gitam songa hemant k
Unit IImachinemachinetoolopeartions.pptx
An investigation of the use of recycled crumb rubber as a partial replacement...
BBOC407 BIOLOGY FOR ENGINEERS (CS) - MODULE 1 PART 1.pptx
ARCHITECTURE AND PROGRAMMING OF EMBEDDED SYSTEMS
Design ,Art Across Digital Realities and eXtended Reality
SE unit 1.pptx by d.y.p.akurdi aaaaaaaaaaaa
Performance, energy consumption and costs: a comparative analysis of automati...
Programmable Logic Controller PLC and Industrial Automation
Principles of operation, construction, theory, advantages and disadvantages, ...
MACCAFERRY GUIA GAVIONES TERRAPLENES EN ESPAÑOL
CS6006 - CLOUD COMPUTING - Module - 1.pptx
SC Robotics Team Safety Training Presentation
Software-Development-Life-Cycle-SDLC.pptx
CELDAS DE COMBUSTIBLE TIPO MEMBRANA DE INTERCAMBIO PROTÓNICO.pdf
INTERNET OF THINGS - EMBEDDED SYSTEMS AND INTERNET OF THINGS
electrical machines course file-anna university
Lesson 3 .pdf
Wireless sensor networks (WSN) SRM unit 2

Identifying Malicious Data in Social Media

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1732 IDENTIFYING MALICIOUS DATA IN SOCIAL MEDIA M.Sai Sri Lakshmi Yellari1, M.Manisha2, J.Dhanesh3 ,M.Srinivasa Rao4 ,Dr.S.Suhasini5 1Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra Pradesh, India 2Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra Pradesh, India 3Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra Pradesh, India 4Student, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra Pradesh, India 5Associate professor, Dept. of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra Pradesh, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Network anomaly detection is a broad area of research. The use of entropy and distributions of traffic features has received a lot of attention in the research community. Disclosing malicious traffic for network security using entropy based approach and power law distribution is proposed. To calculate entropy feature considered is packet size. Malware, most commonly known as malicious data is prevalent, arising a number of critical threat issues. With the increasing volume of contents users share through social media, the user is going to share large amount of information. Using power law distribution malware is detected which the users share in social media by making a comparison with Shannon entropy technique. Key Words: Socialmedia, Entropy,Malware,Powerlaw, security, traffic. 1. 
INTRODUCTION A network consists of two or more computers that are linked in order to apportion resources (such as printers and CDs), exchange files, or allow electronic communications. The computers on a network may be linked through cables, telephone lines, radio waves, satellites, or infrared light beams. A network may be composed of any coalescence of LANs, or WANs. Network traffic can be defined in a number of ways. But in the simplest manner we can define it as the density of data present in any Network. Network data security should be a high priority when considering a network setup due to the growing threat of hackers endeavoring to infect as many computers possible. Due to the cumbersomely hefty utilization of the network now a day’s sundry attacks are been occurring and malicious data is injected into the user’s profile or document. Due to lack of security in the organization the data breaches. In this paper, we study the comparison between Entropy based anomaly detection mechanism and Power law distribution. Entropy based anomaly detection captures more fine grained traffic patterns as comparedtonormal volumebasedmetrics.Many traffic features such as IP address, port number,flowsizeetc are considered as attributes is calculating entropy where as Power law (also called the scaling law) states that a relative change in one quantity results in a proportional relative change in another. 1.1 Malicious Data Malicious data is data that, when introduced to a computer—usuallybyanoperatorunawarethatheorshe is doing so—will cause the computer to perform actions undesirable to the computer's owner. Maliciouspractices done by the local networks users that do not allow efficient sharing of the network resources. Common threats are: Unauthorized Access, Data Destruction, Administrative Access, System Crash/Hardware Failure, Virus. 
Malware is short for malicious software, denoting software that can be habituated tocompromisecomputer functions, steal data, bypass accesscontrols,orotherwise cause harm to the host computer. Malware is a broad term that refers to a variety of maleficent programs. This post will define several of the most mundane types of malware; adware, bots, bugs, rootkits, spyware, Trojan horses, viruses, and worms. 1.2 Problem Description Online social networks are widely use these days for the purpose of communication. Users can share more type of information among friends. But there exist some social network users who misuse the features of these social networks and promote the spreading of malicious content. They do this by uploading the malicious files.Thesecontents spread at a fast rate. There is no proper mechanismtodetect these malicious files immediately and remove it effectively.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1733 Convivial network sites like Face book, Twitter,andGoogle+ are experiencing incredible magnificationinusers.There are more than a million users as of now. Besides just engendering a profile and linking with friends, the gregarious networks are now buildingplatformstoruntheir website. These platforms are built predicated based on the profile details. These social applications are anon becoming an example of online communication which makes utilization of the user’s private information and activities in convivial links for sundry accommodations. The gregarious networks are popular denotes of communication among the cyber world users. 2. PROPOSED SYSTEM Malware (for "malicious software") is any program or file that is harmful to a computer user. Thus, malware includes computer viruses, worms, Trojan horses, and also spyware, programming that gathers information about a computer user without permission. It is hard to detect and distinguish malicious packets and legitimate packets in the traffic. The behavior of internet traffic is very far from being regular. Malicious are abnormal traffic may look very similar to normal traffic. 2.1 Shannon Entropy Security breaches on a network server can result in the disclosure of critical information or the loss of a capability that can affect the entire organization. Therefore, securing network servers should be a significant part of your network andinformationsecuritystrategy[1].Manysecurityproblems can be avoided if servers and networks are appropriately configured. So, we approached entropy based approach to detect anomalous traffic with altered packet size help of packets that sent. 
Entropy based anomaly detection techniques capture more fine-grained traffic patterns than conventional volume based metrics. To calculate entropy we use parameters such as source and destination IP addresses, port numbers, packet size, connection time, and the total number of packets flowing.
Definition 1: Entropy
Entropy is a measure of the disorder or randomness of a system [3].
Algorithm:
AIM: To detect altered packets using an entropy based approach that makes use of the Shannon Entropy Algorithm.
1) Capture packets and add them to the current queue L.
2) Compute the current queue length.
3) Select the features required for the calculations, i.e. IP address of source and destination, port number of source and destination, packet size, packet rate and connection time [2].
4) Calculate the entropy:
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$$
Here, for a fixed time window w, $P(x_i) = m_i / m$, where $m_i$ is the frequency, i.e. the number of times we observe X taking the value $x_i$, and $m = \sum_{i=1}^{n} m_i$. Hence
$$H(X) = -\sum_{i=1}^{n} \frac{m_i}{m} \log \frac{m_i}{m}$$
where H(X) is the entropy.
To calculate the probability of any source (destination) address:
$m_i$ = number of packets with $x_i$ as source (destination) address,
M = total number of packets (the quantity written m above),
$P(x_i)$ = (number of packets with $x_i$ as source (destination) address) / M.
Here the total number of packets is the number of packets seen in a time window T. Similarly, we can calculate the probability for each source (destination) port as
$P(x_i)$ = (number of packets with $x_i$ as source (destination) port) / M.
Normalized entropy characterizes the overall probability distribution in the captured flow for the time window T.
5) Normalized entropy = $H / \log(n)$, where n is the number of distinct $x_i$ values in the given time window.
6) Determine the threshold on the basis of the maximum and minimum deviations calculated over a number of runs.
7) If the result exceeds the threshold, an attack is found; otherwise no attack is found.
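Steps 4 to 7 of the algorithm above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the base-2 logarithm, the sample packet sizes and the threshold value 0.9 are all hypothetical choices made for the example. The feature used is packet size, as in the abstract.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return H(X) = -sum(P(x_i) * log2(P(x_i))) for the observed values.

    P(x_i) = m_i / m, where m_i is the count of x_i and m is the total count.
    The base-2 logarithm is an assumption; the paper does not fix the base.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(m_i / total) * math.log2(m_i / total) for m_i in counts.values())


def normalized_entropy(values):
    """Return H(X) / log2(n), where n is the number of distinct values."""
    n = len(set(values))
    if n <= 1:
        # A window with zero or one distinct value has no randomness.
        return 0.0
    return shannon_entropy(values) / math.log2(n)


def exceeds_threshold(values, threshold):
    """Step 7: report an attack when the normalized entropy exceeds the threshold."""
    return normalized_entropy(values) > threshold


# Hypothetical packet sizes (in bytes) observed in one time window.
window = [64, 64, 128, 128]
print(normalized_entropy(window))      # two equally likely sizes
print(exceeds_threshold(window, 0.9))  # 0.9 is an illustrative threshold
```

Because the entropy is divided by log(n), the normalized value always lies between 0 and 1 and does not depend on the logarithm base, so a single threshold can be compared across windows with different numbers of distinct values.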
Fig -1: Flow diagram of Shannon Entropy
Features extracted for the detection of anomaly based attacks are as follows:
• Entropy of source IP address and port number.
• Entropy of destination IP address and port number.
• Entropy of packet type.
• Occurrence rate of packet type.
• Number of packets per unit time.
• Entropy of packet size.
Chart -1: Analysis of Shannon Entropy (axes: entropy, no. of applications installed)
The above analysis uses the GitHub Facebook dataset, in which we concentrated on the number of applications installed to find the malicious data.
2.2 Power Law Distribution
Malware refers to malicious software programs deployed by cyber attackers to compromise computer systems by exploiting their security vulnerabilities. Motivated by extraordinary financial or political rewards, malware owners are exhausting their energy to compromise as many networked computers as they can in order to achieve their malicious goals. A compromised computer is called a bot, and all bots compromised by a malware form a botnet. Botnets have become the attack engine of cyber attackers, and they pose critical challenges to cyber defenders [4]. In order to fight against cyber criminals, it is important for defenders to understand malware behavior, such as propagation or membership recruitment patterns, the size of botnets, and the distribution of bots. The proposed two layer epidemic model and the findings are the first work in the field. Our contributions are summarized as follows.
• We propose a two layer malware propagation model to describe the development of a given malware at the Internet level. Compared with the existing single layer epidemic models, the proposed model represents malware propagation better in large-scale networks.
• We find that the malware distribution in terms of networks varies from exponential, to power law with a short exponential tail, and to power law distribution at its early, late, and final stage, respectively. These findings are first theoretically proved based on the proposed model, and then confirmed by experiments on two large-scale real-world data sets.
Definition 2: Power Law
The power law (also called the scaling law) states that a relative change in one quantity results in a proportional relative change in another.
Algorithm:
AIM: To detect malicious content by making use of the Power Law Distribution.
1) Enter the Web Application.
2) Open the User Registration form and enter the login credentials.
3) If the login credentials are invalid, the application reports that the credentials entered are invalid.
4) If they are valid, the user can proceed to the further steps.
5) The steps include Uploading Files, Search Users, Search File, My Search History, Malicious Document, Friends Request and Response Details, My Files, and My Friend Files.
6) The Power Law Distribution is applied at the My Friend Files option.
7) Calculate the Power Law Distribution using
a. $Y = K X^{\alpha}$
b. where Y and X are the variables of interest (here X is taken as the size of the file),
c. α is the law's exponent,
d. K is a constant.
8) The Power Law Distribution separates the good files from the malicious files.
9) Internally, if the value of Y is equal to the value of the file uploaded by the user's friend, then the file is said to be malicious.
10) The user can log out of the account after the actions are performed.
11) The Admin, who monitors all users, blocks the account of the malicious user.
Fig -2: Flow diagram of Power Law Distribution.
Chart -2: Analysis of Power Law Distribution.
The above analysis shows that as the number of users increases, the amount of malicious data also increases, resulting in poor performance.
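Step 7 of the algorithm above relates Y to X through an exponent α and a constant K, taken here in the standard power-law form Y = K X^α. The following Python sketch evaluates that relation and shows one common way to estimate K and α from data. It is an illustration under assumptions, not the authors' implementation: the function names, the sample file sizes, and the use of least squares on the log-log values are all hypothetical choices that do not appear in the paper.

```python
import math


def power_law(x, k, alpha):
    """Evaluate Y = K * X**alpha for a single value of X (e.g. a file size)."""
    return k * x ** alpha


def fit_power_law(xs, ys):
    """Estimate (K, alpha) by least squares on log Y = log K + alpha * log X.

    Assumes all xs and ys are positive and that xs holds at least two
    distinct values. This fitting method is an assumption of this sketch.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx
    k = math.exp(mean_y - alpha * mean_x)
    return k, alpha


# Hypothetical file sizes (X) and values generated from a known power law.
sizes = [1.0, 2.0, 4.0, 8.0]
values = [power_law(x, 3.0, -1.5) for x in sizes]
k, alpha = fit_power_law(sizes, values)
print(k, alpha)  # recovers values close to K = 3.0 and alpha = -1.5
```

Taking logarithms turns the power law into a straight line, which is why a plain linear fit on the log-log values recovers the exponent as the slope and the constant from the intercept.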
3. RESULTS
Fig -3: Admin Login.
Fig -4: User Login.
Fig -5: Upload Files.
Fig -6: Admin views all the files.
Fig -7: Admin Blocks.
4. CONCLUSIONS
The prevailing definition of a network anomaly describes an occurrence that diverges from normal behavior. However, there are no known models available for normal network behavior. The major strength of the new scheme is that it can detect attacks in the Facebook data set using the number of applications installed as the constraint. We are not claiming that our methods are superior to all other methods. It is well known that finding malicious traffic in a network or in a communication system has a wide scope for research. Using the entropy based technique we aim to detect altered packets in a network or communication system. The experimental sample data set we have taken is relatively small, and hence it does not cover all possible attacks. We therefore move to the Power Law Distribution, in which, unlike previous modeling methods, we propose a two layer epidemic model: the upper layer focuses on networks of large-scale networks, for example, domains of the Internet; the lower layer focuses on the hosts of a given network.
In regard to future work, we will first further investigate the dynamics of the late stage. More details of the findings are expected to be studied further, such as the length of the exponential tail of a power law distribution at the late stage. Second, defenders may care more about their own network, e.g., the distribution of a given malware in their ISP domains, where the conditions for the two layer model may not hold. We need to seek appropriate models to address this problem. Finally, we are interested in studying the distribution of multiple malware on large-scale networks, as we focus on only one malware in this paper. We believe the multiple malware case is not a simple linear extension of the single malware one.
REFERENCES
[1] S. Poornimavathi and K. Anandapadmanabhan, "Entropy-based analysis of multiple traffic anomalies Detection in network security", International Journal of Multidisciplinary Research and Development, IJMRD, ISSN: 2349-5979, Vol 3, Issue 3, March 2016.
[2] Kamal Shah and Tanvi Kapdi, "Disclosing Malicious Traffic for Network Security", International Journal of Advances in Engineering and Technology, IJAET, ISSN: 22311963, Vol 7, Issue 6, January 2015.
[3] C. G. Chakrabarti and I. Chakrabarty, "Boltzmann Entropy: Probability And Information", Romanian Journal of Physics, P. 525-528, Volume 52, Numbers 5-6, Bucharest, 2007.
[4] Shui Yu, Guofei Gu, Ahmed Barnawi, Song Guo and Ivan Stojmenovic, "Malware Propagation in Large-Scale Networks", IEEE Transactions On Knowledge And Data Engineering, Volume 27, January 2015.