Performing Network & Security Analytics
with Hadoop
Travis Dawson
Director of Product Management Narus, Inc
Hadoop Summit 2012
Agenda

▪ Who am I, what do I do
▪ What is Network & Security Analytics

▪ Using Hadoop in Network & Security Analytics
▪ What becomes possible with Big Data Analytics

▪ Putting it all together
▪ Lessons Learned


                                                  Narus | 2
Who am I
What do I do
▪ Geek

▪ Director of Product Management, Narus Inc
   – Narus Inc, a wholly owned subsidiary of Boeing
   – Build High Performance Network Intelligence Systems
   – I herd cats and make PowerPoints all day
   – Occasionally think about product requirements
▪ Principal Member Technical Staff, Sprint
   – Sprint Advanced Technology Labs
   – Wireline/Wireless Network Architecture, Design, Security
   – I broke stuff

What is Network & Security Analytics
A type of voodoo, but with computers

The (black) art of finding malicious or problematic
     sessions in a mountain of network traffic
▪ Multiple approaches
   – Signatures/Blacklists
   – Behavior
   – Algorithmic
   – Ouija Board, Live Chicken, Full Moon, etc.
▪ Single Goal
   – Identify malicious or problematic traffic before it causes
     substantial harm to your network or your assets.



Network & Security Analytics
What’s working against you
▪ The enemy is ever-changing and infinitely intelligent
▪ New attack vectors are more difficult to detect than ever
   – Polymorphic, Randomized
   – APTs are real
   – Zero-Days
   – Protocol, Application, OS
▪ Traditional Methods Ineffective
   – Payloads ever changing
   – Simply too many, both new and existing
▪ Higher link speeds make deeper analysis harder
   – 10 Gb/s maxes out at ~15M packets per second
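
As a rough check on the ~15M figure, here is a minimal sketch of the arithmetic. It assumes minimum-size 64-byte Ethernet frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap). Those framing constants are standard Ethernet values, not something stated on the slide, and the function name is illustrative.

```python
# Assumed constants (standard Ethernet framing, not taken from the slide).
LINK_BPS = 10_000_000_000      # 10 Gb/s link
MIN_FRAME_BYTES = 64           # smallest Ethernet frame
WIRE_OVERHEAD_BYTES = 20       # 7 preamble + 1 start-of-frame + 12 inter-frame gap


def max_packets_per_second(link_bps=LINK_BPS, frame_bytes=MIN_FRAME_BYTES):
    """Upper bound on frames per second when every frame is frame_bytes long."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps // bits_on_wire


if __name__ == "__main__":
    print(max_packets_per_second())  # roughly 14.88 million
```

At that rate each packet gets well under 100 nanoseconds of processing time, which is why deeper per-packet analysis gets harder as links get faster.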

What is Network & Security Analytics
Finding the Needle in a stack of Needles
▪ Where to look
   – Which stack of Needles do I need to look at?
▪ What are you looking for
   – Do you know?
   – Are you guessing?
   – Do you know what you are NOT looking for?
▪ How to find something that is not ‘right’
   – What is ‘right’, what is ‘not-right’, what is ‘wrong’?
   – What is the difference?
   – What is ‘normal’ vs. what is ‘right’?
   – How much data do you need?


What is Network & Security Analytics
Solving the Network & Security Analytics Problem
  Multiple Methods, Multiple Algorithms, Multiple
                 Passes Per Analytic
▪ You need a lot of data to determine what is ‘not-right’
   – More data == More accurate results
▪ You need to run sophisticated algorithms across the data
   – Use new algorithms to find something ‘not-right’
   – Not always easy
▪ You need multiple passes on the data
   – One Algorithm feeds the next Algorithm
   – Focus on the workflow, how an analyst would work.



Breaking out of the SQL Prison
A quick rant
▪ SQL has been around since the 1970s
   – So have I!
   – Great for solving ‘known’ problems
▪ Unable to perform the deep analytics required
   – At times, no combination of SELECT, JOIN, and UDF will get you
     what you need
   – Unstructured data is a nightmare and now more common
▪ However, use of one tool does not mean you can’t use
  another tool as well
   – SQL and Hadoop can live very happily together
   – The right tool for the right job, or more precisely:
      • The right tool for the right PART of the job
Network & Security Analytics
Using Hadoop to solve the hard problems
▪ Amount of Data
   – 1 week to 1 month+ of data: hundreds of billions of sessions,
     hundreds of TB of data, ingesting dozens of data types and
     millions of sessions per hour
▪ Algorithms
   – Looking for sessions that look something like this thing or maybe
     unlike this other thing. You can do that, right?
▪ Unstructured
   – We have no idea what information we are going to get
▪ Price per Analytic Hour
   – How much does it cost to run this analytic in a set amount of time?

Network & Security Analytics
A Simple Workflow Example
     Find a Polymorphic BotNet/Worm infection vector
▪ Find the suspected infected hosts
   – Clustering/Behavior/Signatures to find possible bots and worms
▪ Find the Command & Control
   – From the list of suspects, who are the most popular ‘servers’?
▪ Find ALL of the possible infections
   – From the C&C servers, which hosts were communicated with?
   – Cluster and group similar hosts to find even more
▪ Find the Infection Vector
   – From all the suspect hosts, cluster hosts by common Application
     ‘features’ and traffic patterns
    You need a LOT of data and it’s non-deterministic
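
The middle two steps of this workflow can be sketched in a few lines. This is a minimal in-memory illustration, not the production jobs: it assumes flows are simple (source, destination) pairs, ranks 'servers' by how many distinct suspects contacted them, and then collects every host that talked to those servers. The function names and sample addresses are hypothetical, and the clustering steps are left out.

```python
from collections import Counter


def find_command_and_control(flows, suspects, top_n=1):
    """Rank destinations by the number of distinct suspect hosts contacting them."""
    # De-duplicate so a chatty suspect counts once per destination.
    contacts = {(src, dst) for src, dst in flows if src in suspects}
    popularity = Counter(dst for _, dst in contacts)
    return [dst for dst, _ in popularity.most_common(top_n)]


def find_possible_infections(flows, c2_servers):
    """Return every host seen talking to a suspected C&C server, in either direction."""
    c2 = set(c2_servers)
    hosts = set()
    for src, dst in flows:
        if dst in c2:
            hosts.add(src)
        elif src in c2:
            hosts.add(dst)
    return hosts - c2


# Hypothetical sample data (documentation address ranges).
flows = [
    ("10.0.0.1", "203.0.113.9"),
    ("10.0.0.2", "203.0.113.9"),
    ("10.0.0.2", "198.51.100.7"),
    ("10.0.0.3", "203.0.113.9"),
    ("10.0.0.4", "198.51.100.7"),
]
suspects = {"10.0.0.1", "10.0.0.2"}

c2 = find_command_and_control(flows, suspects)
infected = find_possible_infections(flows, c2)
```

Here the output of one pass (the suspect list) feeds the next (C&C ranking), which in turn widens the set of hosts to examine. At the data volumes described earlier, each pass would presumably run as its own distributed job rather than in memory.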
Network & Security Analytics
Workflow details
                     What Makes This Work
▪ Hadoop Tools/Methods Used
   – Entropy, FFT, Behavior Jobs
   – Mahout (Clustering and Machine Learning)
   – Custom Clustering (Hourglass Co-Clustering)
   – Custom Correlation
▪ Other Tools Used
   – Streaming Classification/Statistics Engine
   – RDBMS
   – Visualization Front End
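
To make the 'Entropy' item concrete, here is a minimal sketch of Shannon entropy over a byte string. The slides do not say what the entropy jobs measure; scoring payload bytes is an assumption made for illustration (randomized or encrypted payloads tend to score close to 8 bits per byte, while plain text scores much lower). The function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"GET /index.html HTTP/1.1"))  # text: relatively low
    print(shannon_entropy(bytes(range(256))))             # every byte value once: the maximum
```

A per-session score like this is cheap to compute in a map step, which fits the 'one algorithm feeds the next' pattern: the entropy pass tags sessions, and a later clustering pass groups them.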



Network & Security Analytics
   In real life


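                      Many tools enabling each other
[Figure: processing pipeline, left to right]
   Packets -> Capture -> Metadata -> Streaming Analysis -> Datasets ->
   Deep Analysis (Hadoop) -> Summary -> Shallow Analysis (RDBMS) -> Views
   – Capture: "I need to capture the traffic"
   – Streaming Analysis: "I know what I am looking for"
   – Deep Analysis (Hadoop): "I don’t know what I am looking for"
   – Shallow Analysis (RDBMS): "I need to organize the findings"
   – Views: "I need to view the findings logically"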




Lessons learned
How we learned to make it all work
▪ Don’t use a hammer when you need a scalpel
   – It just doesn’t work, don’t force it.
   – If there is a better way of doing it, use that way
▪ Hadoop does a lot of things really well
   – Complicated algorithms over vast amounts of data
   – Unstructured Data
▪ Hadoop does some things really poorly
   – Low Latency results for visualization
   – Simple Statistics and some groupings
▪ Use Hadoop in conjunction with other tools
   – Use the best tool for the job.
   – Break the job into pieces and evaluate the tools for each piece

Conclusion
Hadoop as a platform for Network Security Analytics
▪ Hadoop has allowed us to solve problems for our
  customers that were previously unsolvable in a
  reasonable amount of time
▪ New algorithms and analytics were made possible by
  Hadoop
▪ By using Hadoop in conjunction with our Streaming
  Engine and an RDBMS, we were able to create a system
  that performed better than just the sum of its parts.
▪ We are now able to scale into larger datasets and extract
  even better insights than before
▪ No longer confined by any tool, we leverage the power of
  Hadoop to solve many of our problems
Q&A




