Final_Report

Anomaly Detection in Software-Defined
Networks Using Reinforcement Learning
Tlhologelo Mphahlele
Supervisor: Professor T. Celik
University of the Witwatersrand
Johannesburg, South Africa
Student number: 1434978
Email: 1434978@wits.students.ac.za
Abstract—A computer network is a telecommunication
network that allows computers and computing devices to
communicate and share data. Typical computer networks
are built from hardware devices designed specifically for
networking, such as routers, switches, buses, hubs and
other middleware devices. The main network devices,
such as routers and switches, are responsible for
receiving and forwarding data packets, and to do so they
implement complex algorithms with a limited tool set to
ensure that each packet is forwarded to the correct
recipient. Because of this limited tool set and the
inflexible nature of the devices, network management and
tuning is a difficult process. Software-Defined
Networking (SDN) is a new network architecture that
offers adaptability and dynamism by separating the
control software of a network from its forwarding
hardware. Since this architecture is relatively new and
still in development, many basic functionalities found in
traditional networks are still being researched, and
optimal implementations of them are yet to be found.
This paper looks at using a machine learning technique
(reinforcement learning) to make network traffic control
decisions under certain network attack scenarios and
abnormal traffic flows on a server running in an OpenFlow
network environment. Using network statistics collected
from the switch at any one instant, the aim is to identify
the traffic flowing through the network and, if a large
volume of traffic is detected, to determine whether it is
anomalous and, if it is, to stop it from affecting normal
users on the network.
SDN (Software-Defined Networks), OpenFlow, NBI
(Northbound Interface).
1. INTRODUCTION
The demand placed on computer networks has increased
drastically in the past few years, with high volumes of
bandwidth-intensive data being shared between devices
and networks. Even though network traffic has increased,
the devices and architecture on which computer networks
are based have not really changed to fit current needs.
With more devices on computer networks, traditional
devices are expected to keep up and implement modern,
complex forwarding and routing algorithms with the
limited tool set that they have, and this in turn makes
networks very difficult to maintain, expand and tune for
performance. Managing and tuning networks is error-prone
and time-consuming, often relying on trial and error to
get the network to an optimal state, and once that state
has been achieved, going back and improving the network
setup is usually avoided.
Programmable networks have been proposed as a way to
facilitate network evolution (1). Out of that idea came
SDN, a new networking paradigm in which the forwarding
hardware is decoupled from control decisions. This
decoupling allows network administrators to manage
network services through abstraction of lower-level
functionality (1). To achieve this, the control plane is
decoupled from the lower-level data plane (the plane
responsible for forwarding data packets). A third plane,
the application plane, is added on top of the two
existing planes; it allows software developers to use
network resources much as they would use storage and
computing resources on computers. Since Software-Defined
Networking is still a developing field in the computer
network domain, industry is developing architectures and
standards based on the principle of decoupling the
control plane and data plane of a network (1). OpenFlow
is a standard for SDN that allows experimentation with
protocols. It provides an open protocol for programming
the flow tables in different switches and routers, and
with the support of industry and hardware vendors,
OpenFlow (OF) enabled switches and routers are now
commercially available. Software-Defined Networking
faces many challenges because it is still an emerging
field. One of the major challenges is the lack of basic
tools and functionality, security being one of them.
Security is a major part of any computer network, and
Software-Defined Networks are no exception. In tuning
and maintaining a network, administrators should also be
able to detect anomalous behaviour within the network;
to do that they rely on intrusion detection systems and
firewalls. With the capabilities that come with
Software-Defined Networks, administrators can now
implement their own intrusion and anomaly detection
designed specifically for their networks. Anomaly
detection can be implemented on the host side with
negligible impact on the performance of the network. In
one instance, detection algorithms were deployed on SDNs
in small office and home office networks (2), and the
results showed that effective detection can take place
on the host side.
Based on the work that has been done on anomaly
detection in Software-Defined Networks and the results
given by (2), we set out to see whether detection and
mitigation could be done effectively using machine
learning techniques, specifically reinforcement
learning. The aim was to deploy a reinforcement learning
algorithm onto a simulated network and see whether,
without too much overhead, it could learn the network
states and make decisions given the limited information
available, without putting strain on the network whilst
learning and taking decisions.
This process involved using middleware software to
obtain network statistics and real-time network
information without adding significant overhead to
network traffic. By doing this we were able to
successfully deploy the algorithm and mitigate
anomalous traffic.
2. BACKGROUND AND RELATED WORK
A. SDN platforms
OpenFlow is an SDN standard that allows experimental
network protocols to be run on networks. OpenFlow,
coupled with SDN emulators and simulators, has made it
easy to move network protocols from the testing phase to
implementation. It exploits the flow tables found in
OpenFlow-compatible switches to implement firewalls,
NAT (Network Address Translation) and QoS (Quality of
Service), and to collect statistics. OpenFlow provides a
way to modify the flow rules that the controller has
added to the switches. Different controllers manipulate
the entries in the switches in different ways. The
controller used in this paper is Ryu with OpenFlow 1.3.
B. SDN Stack
The Software-Defined Network stack is made up of three
layers: the data plane, the control plane and the
application plane. Each plane has interfaces that allow
it to communicate with the layer above or below it. The
CDPI (Control to Data-Plane Interface) is the interface
defined between the control plane and the data plane;
the NBIs (Northbound Interfaces) are the interfaces
between the SDN applications on the application plane
and the SDN controller.
C. Application Plane
SDN applications are programs that explicitly,
directly, and programmatically communicate their
network requirements and desired network behavior to
the SDN controller via NBIs. In addition, they may
consume an abstracted view of the network for their
internal decision-making purposes (3). The SDN
application layer is where applications sit on the SDN
stack. Each application consists of SDN application
logic and one or more Northbound Interface drivers.
In this paper the application is deployed on the
application plane; it uses the NBI to communicate and
to get a top-level view of the network in order to make
decisions.
D. Reinforcement learning
Reinforcement learning addresses the question of how
an autonomous agent that senses and acts in its
environment can learn to choose optimal actions to
achieve its goals (4).
For this project a particular reinforcement learning
framework was used for taking actions and performing
the necessary network tasks. The Markov Decision
Process (MDP) is a reinforcement learning framework in
which the agent and environment interact in a sequence
of discrete time steps. On each step the agent is given
a state and has to make a decision for that state by
looking up the average action values of that state.
3. METHODOLOGY
To build an anomaly detection application, we first have
to define what is considered an anomaly in the network
instances that will be run and what data will be used to
define these anomalies. A dataset for classification
could be composed of multiple fields found within a
packet as it goes through the network, but for this
project each data instance is composed of flow size (the
number of frames transferred between the server and a
host) and ifinpkts (the count of packets flowing through
the network at a specific point in time). Flow size is a
feature derived by an external application that monitors
the traffic flow between two hosts. This approach was
chosen because the external application does not add to
the network load. Ifinpkts is a metric provided by the
Open vSwitch used in the network. These two features
allow the reinforcement learning algorithm to classify
traffic and take a decision on it. The decisions the
algorithm has to take are:

• 0 : Do nothing and allow normal forwarding of
packets to the port specified in the packet
header to continue.
• 1 : Flag the packets coming from the specific
port/host as anomalous and stop forwarding
these packets.
A. Reinforcement learning (Markov Decision Process)
Reinforcement learning is learning what to do, how to
map situations to actions, so as to maximize a numerical
reward signal (5). The agent is not told which actions
to take but rather must discover which actions yield
the most reward by trying them.
The algorithm followed is taken from (5) and involves
building up a Q-table of state-action values by running
multiple episodes.
B. Algorithm
1) Qk(s, a) is the function that accepts a state and an
action and returns the value of taking that action in
that state at time step k. This is fundamental to
RL: we need to know the relative values of every
state or state-action pair.
2) π is a policy, a stochastic strategy or rule for
choosing an action a given a state s. Think of it as
a function, π(s), that accepts a state s and returns
the action to be taken. There is a distinction
between the π(s) function and a specific policy π.
Our implementation of π(s) as a function is to
choose the action a in state s that has the highest
average return based on historical results,
argmax_a Q(s, a). As we gather more data and these
average returns become more accurate, the actual
policy π may change. We may start out with a policy
of "allow flows until traffic is greater than 3000",
but this policy may change as we gather more data.
Our implementation of the π(s) function, however, is
programmed by us and does not change.
3) Episode: the full sequence of steps leading to a
terminal state and receiving a return. In our case an
episode runs from the beginning of anomalous
behaviour until it has been blocked.
4) Gt, the return: the expected cumulative reward from
starting in a given state until the end of an episode.
In our implementation, a reward is given only once
the network state has been observed after taking a
decision.
5) vπ is a function that determines the value of a state
given a policy π. We only focus on the action
values.
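The definitions above can be illustrated with a minimal tabular sketch. It is not the implementation used in this project: the class name, the exploration rate epsilon and the seed are assumptions added for illustration, and the source only states that π(s) picks the action with the highest average return.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = keep forwarding, 1 = flag and stop forwarding


class AverageReturnAgent:
    """Tabular agent keeping the running average return of each (state, action)."""

    def __init__(self, epsilon=0.1, seed=None):
        self.epsilon = epsilon        # exploration rate (assumed, not from the paper)
        self.q = defaultdict(float)   # Q(s, a): average return observed so far
        self.n = defaultdict(int)     # number of returns averaged into Q(s, a)
        self.rng = random.Random(seed)

    def policy(self, state):
        """pi(s): usually argmax over a of Q(s, a), occasionally a random action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, ret):
        """Fold one observed return G into the running average for (s, a)."""
        key = (state, action)
        self.n[key] += 1
        self.q[key] += (ret - self.q[key]) / self.n[key]
```

With epsilon set to 0 the agent is purely greedy, and an unseen state falls back to action 0 (keep forwarding) because all of its action values start at zero.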
1) States: Reinforcement learning with a Markov
Decision Process requires discrete, predefined states.
To obtain them, each of the two features used for
detection was split into three categories. The final
states were defined as follows:
• Flow value and ifinpkts were each divided into these 3
groups:
– 0 : small
– 1 : medium
– 2 : large
• For flow values the three (3) categories were
defined as follows:
– Small : less than 200 frames
– Medium : 200 frames or more but less than 1000 frames
– Large : greater than 1000 frames
• Ifinpkts values were split into three (3) categories:
– Small : less than 100 packets
– Medium : 100 packets or more but less than 300 packets
– Large : greater than 300 packets
Through repeated trials and network monitoring, the
value settled on as the threshold for anomalous traffic
flow was a combined value of 3000, made up of
incoming packets and outgoing packets.
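The discretisation described above can be sketched as a small helper that maps the two raw measurements to a state pair. The function names are hypothetical, and treating a value that sits exactly on the upper limit (1000 or 300) as "large" is an assumption, since the text leaves that boundary case open.

```python
def bucket(value, small_limit, large_limit):
    """Map a raw measurement to 0 (small), 1 (medium) or 2 (large)."""
    if value < small_limit:
        return 0
    if value < large_limit:
        return 1
    return 2  # assumption: a value exactly at large_limit counts as large


def to_state(flow_frames, ifinpkts):
    """Build the discrete state (flow level, ifinpkts level) used by the agent."""
    return (bucket(flow_frames, 200, 1000), bucket(ifinpkts, 100, 300))
```

Each feature has three levels, so the agent sees at most nine distinct states, which keeps the Q-table very small.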
C. Architecture
To successfully implement anomaly detection, the
application had to sit on the application layer of the
SDN stack. It was placed on top of the controller, with
parts of it communicating with the controller through a
REST API provided by the controller.
Fig. 1: Software Defined Network stack
http://tinyurl.com/z748d54
1) Tools: To get real-time switch statistics, InMon
sFlow-RT was used, as it provided second-by-second
switch statistics from which the states described
above could be derived.

D. Network Topology
The network in which the simulations were run
consisted of six (6) hosts, one being a web server and
another the attacker. The switch was an Open vSwitch
with sFlow enabled. The controller was a Ryu
controller running as a remote controller at 127.0.0.1,
port 6633. The components of the network were
connected using 10 Mbit/s links.
1) Ryu Controller: Ryu is a component-based
software-defined networking framework. Ryu provides
software components with well-defined APIs that make
it easy for developers to create new network
management and control applications. Ryu supports
various protocols for managing network devices, such as
OpenFlow, Netconf and OF-config. For OpenFlow, Ryu
fully supports versions 1.0, 1.2, 1.3, 1.4 and 1.5 as
well as the Nicira Extensions. All of the code is freely
available under the Apache 2.0 license (6).
Fig. 2: Network topology
E. Implementation
The application comprised multiple individual
processes and applications that had to be brought
together to get the desired effect. Ryu offers a REST
API called ofctl_rest; linking this with a simple switch
application that learns host MAC addresses, adds them
to the switch and collects host statistics formed the
controlling part of the application. InMon sFlow-RT, a
network statistics collector, was deployed to collect
data from the switch. This data was then forwarded to
the application sitting on top of the entire stack.
The application used the Ryu REST API to get additional
information on flows identified by a custom application
written for sFlow-RT. The additional information was
table_id, dl_src and in_port; using these three fields,
entries in the flow table specified by table_id could
be modified. Exactly matching entries are needed to be
able to modify flows.
Once the algorithm had taken a decision, the decision
was communicated to the controller by pushing JSON
data from the application to the controller through the
REST API.
FLOW_MOD allows the controller to modify the state
of an OpenFlow switch. All FlowMod messages begin
with the standard OpenFlow header, containing the
appropriate version and type values, followed by the
FlowMod structure (7). FLOW_MOD messages were
used to communicate the necessary modifications to
the switch in order to halt forwarding of specific traffic.
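As an illustration of the JSON pushed to the controller, the sketch below builds a request for the ofctl_rest endpoint /stats/flowentry/modify_strict that replaces the actions of an exactly matching entry with an empty list, which ofctl_rest treats as drop. The controller address and port 8080, the helper names and the default priority of 1 are assumptions; the priority must equal that of the entry being modified. The exact payload used in the project is not given in the text.

```python
import json
import urllib.request

RYU_REST = "http://127.0.0.1:8080"  # assumed address of Ryu's REST (WSGI) server


def build_drop_request(dpid, table_id, in_port, dl_src, priority=1):
    """Build (but do not send) a request that turns a matching flow into a drop."""
    payload = {
        "dpid": dpid,                # datapath id of the switch
        "table_id": table_id,        # flow table that holds the entry
        "priority": priority,        # assumed; must match the existing entry
        "match": {"in_port": in_port, "dl_src": dl_src},
        "actions": [],               # empty action list = drop matching packets
    }
    return urllib.request.Request(
        RYU_REST + "/stats/flowentry/modify_strict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(request):
    """Send the request to the controller and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Building the request separately from sending it makes the payload easy to inspect or log before any change is made to the switch.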
4. RESULTS AND FINDINGS
The approach discussed in the methodology section
has so far been tested against two anomalous
behaviours. Using real-time network traffic statistics
collected from the switch with sFlow-RT, we were able
to generate enough traffic to test the algorithm. The
traffic on which the algorithm was tested comprised
normal traffic flow from streaming, anomalous traffic
flow containing a ping flood attack from any host or a
SYN flood attack from an arbitrary host, and finally a
combination of both normal and anomalous traffic.
A. Collecting data
To collect network statistics from the switch, sFlow-RT
was used as it provided real-time statistics. By writing
a small script that detected flows going through the
switch, we were able to view the session traffic between
each client and the server. A threshold of 13 frames
was used in the script to trigger session observations.
For flows larger than 13 frames, an elephant script was
written to monitor the session between the client and
the server. An elephant flow for the network was
defined as any flow that had more than a thousand
(1000) incoming packets per second.
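One possible way to express the flow definition and the two thresholds above is through the sFlow-RT REST API, sketched below. This is not the script used in the project: the flow name "session", the threshold names and the sFlow-RT address are hypothetical, and the requests are only built here, not sent.

```python
import json
import urllib.request

SFLOW_RT = "http://127.0.0.1:8008"  # assumed sFlow-RT address (default REST port)


def put_json(path, body):
    """Build a PUT request carrying a JSON body for the sFlow-RT REST API."""
    return urllib.request.Request(
        SFLOW_RT + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def monitoring_requests(session_limit=13, elephant_limit=1000):
    """Define a per-host-pair frame-rate flow and two thresholds on it."""
    flow = {"keys": "ipsource,ipdestination", "value": "frames"}
    return [
        put_json("/flow/session/json", flow),
        put_json("/threshold/session_seen/json",
                 {"metric": "session", "value": session_limit}),
        put_json("/threshold/elephant/json",
                 {"metric": "session", "value": elephant_limit}),
    ]
```

Each request could then be sent with urllib.request.urlopen, after which threshold events would be polled from sFlow-RT and handed to the agent.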
B. Normal traffic
The normal traffic fed to the algorithm was generated
by simulating a normal streaming session between
the clients and the server. Analysis of the traffic
showed that, with five hosts streaming different videos
from the server, the average flow of bytes in the
network was less than 1000 bytes per second (this was
the number of packets outgoing from the server to the
clients).

Fig. 3: Average normal traffic
C. Anomalous traffic
Anomalous traffic was defined as traffic that required
the server to handle more than a thousand (1000)
incoming packets at any one instant from one specific
client. The traffic was generated in two different ways:
a ping flood attack and a SYN flood attack. Both attacks
pushed the number of packets in the network to more
than 10000.
Fig. 4: SYN flood attack with command sudo hping3
-i u1 -S -p 8000 10.0.0.1
The ping flood attack does not put as much strain on the
server as the SYN flood attack, but it does push the
number of packets from the server to the attacker above
10000.
Fig. 5: Ping flood attack with command sudo ping -f
10.0.0.1 (the server)
D. Anomalous traffic and normal traffic
Introducing anomalous traffic to the network
superimposed the outgoing packet counts described in
the normal traffic and anomalous traffic sections.
This resulted in a higher volume of outgoing traffic
from the server to the clients and the attacking
client.
Fig. 6: Combined normal traffic and anomalous traffic
E. Findings
The implemented algorithm works well when there is
a single attacker in the network. Multiple attackers
are hard to handle because they all add to the number
of outgoing packets from the server. When the
algorithm detects this, it blocks forwarding for one
attacker and then observes the current state of the
network. Since the other attacker is still present, the
network remains in an anomalous state and the algorithm
mistakes its correct decision to stop forwarding for a
wrong one. Thus, for this paper only one attacker was
introduced to the simulations.
The ping flood attack and the SYN flood attack were run
independently of each other, and the performance of the
algorithm was measured for both. In both simulations the
network conditions and variables were kept the same:
two clients, client 4 (10.0.0.4) and client 6
(10.0.0.6), were streaming video content from the
server.
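The multiple-attacker problem described in this section can be made concrete with a hypothetical reward function. The reward values of +1 and -1 and the rule itself are assumptions for illustration only; the text states just that the reward is given after observing the network state that follows a decision.

```python
LARGE = 2  # discretised level meaning "large" for either feature


def reward(state_before, action, state_after):
    """Hypothetical reward based on the network state seen after a decision."""
    anomalous_before = LARGE in state_before
    still_congested = LARGE in state_after
    if action == 1:
        # Blocking is rewarded only if it was needed and it cleared the load.
        return 1.0 if anomalous_before and not still_congested else -1.0
    # Forwarding is rewarded only if the network stays in a normal state.
    return -1.0 if still_congested else 1.0
```

With one attacker, blocking it returns the network to a normal state and the decision earns +1. With two attackers, blocking the first leaves the state at "large", so the same correct decision earns -1, which is the misjudgement described above.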
F. Algorithm Performance
For each simulation of the network, a time series
dashboard graphing the traffic flow was produced using
sFlow-RT and a custom extension to it.
Fig. 7: Combined normal traffic and anomalous traffic
(ping flood attack)
Figure 7 shows anomalous traffic flow between the
server and the clients: one of the clients is attacking
and another is behaving normally by streaming video
content.
Figure 8 depicts the successful detection and mitigation
of the anomalous traffic shown in Figure 7. Traffic flow
is back in the normal range of under a thousand (1000)
packets outgoing from the server to the client.
1) SYN Flood attack: In the section above, anomalous
traffic was generated by a ping flood attack. The
same simulation was run with a SYN flood attack.
Fig. 8: Anomalous traffic not being forwarded resulting
in normal network state
Fig. 9: Anomalous traffic not being forwarded anymore
5. DISCUSSION
The results obtained show that it is possible to use
reinforcement learning, specifically the Markov Decision
Process, to detect and successfully mitigate anomalous
traffic. Even though the results are satisfactory, they
do not cater for real-world networks. In a real network
setting it is expected that more than one attacker will
take part in generating anomalous traffic, and attackers
might resort to other forms of anomalous behaviour, such
as DNS poisoning, man-in-the-middle attacks and other
attacks that do not necessarily introduce much extra
traffic to the network. For the two attacks in focus,
however, the reinforcement learning agent performs well
in the simulated environment.

6. CONCLUSION AND FUTURE WORK
While SDN is considered the next step in computer
networking and will likely be used throughout the
industry, it has mainly been used for research up to
this point. Because of this, very little is known about
possible attacks on the SDN control plane. This paper
has looked at possible attack scenarios in order to
define what an anomaly might be in the context of
SDN. A reinforcement learning application that uses
traffic statistics gathered from the network switch was
implemented and deployed on an experimental SDN
network. Furthermore, using Ryu, the OpenFlow protocol
itself was explored. Even though the OpenFlow protocol
has several message types for communication between
switches and controllers, only the FLOW_MOD message
was used, in conjunction with the REST API provided by
Ryu, to handle anomalous traffic. In the methodology
section, feature selection on the network traffic was
performed, and the selected features were then
discretized for the machine learning algorithm
(reinforcement learning) used in this paper. The
algorithm performed well in the two attack scenarios
that we carried out: for both the SYN flood attack and
the ping flood attack it was able to learn the network
states and take appropriate actions when a state
outside normal network behaviour was encountered.
Though the results from the algorithm are satisfactory,
future work is needed to validate them. The algorithm
needs to be applied to more complex, real-world
networks. The application needs to be modified to
handle more than one attacker at a time. Once a flow has
been stopped, the application should also be capable of
periodically allowing the flow to continue, to check
whether the client has reverted to normal behaviour,
before permanently deciding to block the client from
the network. The network traffic generated for this
paper came only from a fixed set of actions: streaming
video content and attacks from clients.
REFERENCES
[1] R. Sathya and R. Thangarajan, “Efficient anomaly
detection and mitigation in software defined
networking environment,” in Electronics and
Communication Systems (ICECS), 2015 2nd International
Conference on, Feb 2015, pp. 479–484.
[2] B. N. Astuto, M. Mendonça, X. N. Nguyen,
K. Obraczka, and T. Turletti, “A Survey of
Software-Defined Networking: Past, Present,
and Future of Programmable Networks,”
Communications Surveys and Tutorials, IEEE
Communications Society, vol. 16, no. 3,
pp. 1617–1634, 2014, accepted in IEEE
Communications Surveys & Tutorials. [Online].
Available: https://hal.inria.fr/hal-00825087
[3] Open Networking Foundation, “SDN architecture
overview,” Dec. 2013.
[4] T. M. Mitchell, Machine Learning, 1st ed. New
York, NY, USA: McGraw-Hill, Inc., 1997.
[5] R. S. Sutton and A. G. Barto, Introduction to
Reinforcement Learning, 1st ed. Cambridge, MA,
USA: MIT Press, 1998.
[6] “Ryu SDN framework,” https://osrg.github.io/ryu/,
(Accessed on 11/20/2016).
[7] “SDN / OpenFlow / Message Layer
/ FlowMod — Flowgrammable,”
http://flowgrammable.org/sdn/openflow/message-layer/flowmod/tabofp13,
(Accessed on 11/21/2016).
