© 2016 MapR Technologies
Tugdual Grall
Technical Evangelist
@tgrall
Anomaly Detection in Telecom with Spark
Codemotion Amsterdam
12 May 2016
{“about” : “me”}
Tugdual “Tug” Grall
• MapR
– Technical Evangelist
• MongoDB
– Technical Evangelist
• Couchbase
– Technical Evangelist
• eXo
– CTO
• Oracle
– Developer/Product Manager
• Mainly Java/SOA
• Developer in consulting firms
• Web
– @tgrall
– http://tgrall.github.io
– tgrall
• NantesJUG co-founder
• Pet Project: http://www.resultri.com
• tug@mapr.com
• tugdual@gmail.com
Agenda
• Introduction
• Anomaly Detection: Why?
• Anomaly Detection: How?
• Use Cases and Demonstration: Telco Sample Application
Anomaly Detection
Who Needs Anomaly Detection?
Utility providers using smart meters
Who Needs Anomaly Detection?
Feedback from manufacturing assembly lines
Who Needs Anomaly Detection?
Monitoring data traffic on communication networks
What is Anomaly Detection?
• The goal is to discover rare events
– especially those that shouldn’t have happened
• Find a problem before other people see it
– especially before it causes a problem for customers
• Why is this a challenge?
– I don’t know what an anomaly looks like (yet)
Looks pretty anomalous to me
Basic idea: find “normal” first
Steps in Anomaly Detection
• Build a model: collect and process data to train it
• Use the machine learning model to determine the normal pattern
• Decide how far from this normal pattern a data point must be to count as anomalous
• Use the anomaly detection (AD) model to detect anomalies in new data
– Methods such as clustering can help with discovery
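The steps above can be sketched in a few lines. This is a minimal illustration, not the talk's demo code: it assumes "normal" can be summarised by a mean and a standard deviation (the slide suggests clustering and other models for real data), and the sample values, the function names `fit_normal` and `is_anomaly`, and the 3-sigma cut-off are all hypothetical.

```python
import statistics

def fit_normal(training_values):
    """Step 1-2: learn what 'normal' looks like (here: mean and standard deviation)."""
    return statistics.mean(training_values), statistics.stdev(training_values)

def is_anomaly(value, model, max_sigmas=3.0):
    """Step 3: a value is 'far' if it lies more than max_sigmas deviations from the mean."""
    mean, stdev = model
    return abs(value - mean) > max_sigmas * stdev

# Hypothetical training data: a metric observed during normal operation.
training = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
model = fit_normal(training)

# Step 4: apply the model to new data.
new_data = [100, 104, 250, 96]
anomalies = [x for x in new_data if is_anomaly(x, model)]
print(anomalies)
```

The cut-off (`max_sigmas`) is the "how far is far" decision; choosing it is usually the hard part.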
How hard is it to set an alert for anomalies?
Grey data is from normal events; x’s are anomalies.
Where would you set the threshold?
Basic idea: set adaptive thresholds
99.9%-ile
With Spikes
99.9%-ile including spikes
How Hard Can It Be?
[Diagram: each value x feeds an online summarizer that tracks the 99.9%-ile as threshold t; if x > t, raise an alarm]
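A rough sketch of the "online summarizer" idea follows. It is an assumption-laden stand-in: it keeps a sliding window of recent values and re-sorts it on every observation, which is simple but slow; a production system would more likely use a streaming quantile sketch such as t-digest. The class name `AdaptiveThreshold` and its parameters (`window`, `quantile`, `warmup`) are hypothetical.

```python
from collections import deque
import math

class AdaptiveThreshold:
    """Keeps the last `window` values and alarms on values above their q-quantile."""

    def __init__(self, window=1000, quantile=0.999, warmup=100):
        self.values = deque(maxlen=window)
        self.quantile = quantile
        self.warmup = warmup  # no alarms until this many values have been seen

    def threshold(self):
        """Current threshold t: nearest-rank q-quantile of the window."""
        ordered = sorted(self.values)
        rank = max(1, math.ceil(self.quantile * len(ordered)))
        return ordered[rank - 1]

    def observe(self, x):
        """Return True (alarm) if x > t, then fold x into the summary."""
        alarm = len(self.values) >= self.warmup and x > self.threshold()
        self.values.append(x)
        return alarm

# Hypothetical stream: a steady signal with one spike at position 300.
detector = AdaptiveThreshold(window=500, quantile=0.999, warmup=100)
steady = [10 + (i % 7) for i in range(300)]
stream = steady + [500] + steady[:50]
alarms = [i for i, x in enumerate(stream) if detector.observe(x)]
print(alarms)
```

Note that the spike is folded into the window after it is reported, so it raises the threshold afterwards; this is the "99.9%-ile including spikes" effect shown on the previous slide.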
Key Steps in Anomaly Detection
• What is normal?
• What will you measure to identify things that are “far” from normal?
• How far is “far”, if something is to be considered anomalous?
A lot more…
• Model normal, then find anomalies
• t-digest for adaptive threshold
• Probabilistic models for complex patterns
[Plot: offset+noise+pulse1+pulse2, with A and B marked]
Learn more about Machine Learning & Anomaly Detection
https://www.mapr.com/ebook
Yes… but how do I build such an application?
Data flow and processing
1. Device to Antenna
2. Antenna to main data center
3. Application should:
✓Store the data
✓Analyse/process the data
✓Detect Anomalies and alert IT
➡ Pure mobile GSM, LTE, 5G, …
➡ Streaming Technology
➡ Big Data Storage
➡ Distributed Processing
➡ Machine Learning
Architecture
[Diagram labels: Streams; HDFS/MapR-FS; HBase/MapR-DB JSON; Streaming (twice); SQL Engine; Analytics; JDBC/ODBC]
Apache Spark
• Cluster computing platform
• Extends the “MapReduce” model with
– Streaming
– Interactive analytics
• Runs in memory
Spark components
• Spark SQL
• Spark Streaming (Streaming)
• MLlib (Machine Learning)
• GraphX (Graph Computation)
• Spark Core (General execution engine)
• Runs on: Mesos, Hadoop YARN
• Distributed File System (HDFS, MapR-FS, S3, …)
Spark Resilient Distributed Datasets “RDD”
[Diagram: sc.textFile loads the Sensor RDD as four partitions (P1 to P4) spread over three executors (W): one holds P4, one holds P1 and P3, one holds P2]
Sample records per partition:
P1: 8213034705, 95, 2.927373, jake7870, 0……
P2: 8213034705, 115, 2.943484, Davidbresler2, 1….
P3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
P4: 8213034705, 117, 2.998947, daysrus, 95….
Spark Resilient Distributed Datasets
[Diagram: RDD → transformation, e.g. filter() → newRDD → action, e.g. count() → value]
Transformations
• Process an RDD, return a new RDD
• Examples:
– map(): one value => another value
– mapToPair(): one value => a tuple (Java API)
– filter(): keeps the values/tuples that satisfy a given condition
– groupByKey(): groups values by key
– reduceByKey(): aggregates values by key
– join(), cogroup(), …: join RDDs by key
Actions
• Process an RDD, return a value
• Examples:
– count(): counts the number of items in the dataset
– first(): returns the first entry
– take(n): returns an array of the first n elements
– foreach(): applies a function to each element
– collect(): returns all elements
– saveAsTextFile(): writes the elements to text files
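To make the transformation/action split concrete, here is a toy, single-process stand-in written for this transcript. It is not the Spark API: `MiniRDD` and its snake_case method names are hypothetical, and it only mimics the behaviour that matters here, namely that transformations return a new dataset without doing any work, while actions run the pipeline and return a plain value.

```python
class MiniRDD:
    """Toy stand-in for an RDD (NOT Spark): lazy transformations, eager actions."""

    def __init__(self, source):
        self._source = source  # zero-argument callable returning a fresh iterator

    @classmethod
    def parallelize(cls, items):
        items = list(items)
        return cls(lambda: iter(items))

    # Transformations: build a new MiniRDD, compute nothing yet.
    def map(self, fn):
        return MiniRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred):
        return MiniRDD(lambda: (x for x in self._source() if pred(x)))

    def reduce_by_key(self, fn):
        def run():
            acc = {}
            for key, value in self._source():
                acc[key] = fn(acc[key], value) if key in acc else value
            return iter(acc.items())
        return MiniRDD(run)

    # Actions: run the pipeline and return a plain Python value.
    def count(self):
        return sum(1 for _ in self._source())

    def take(self, n):
        return [x for _, x in zip(range(n), self._source())]

    def collect(self):
        return list(self._source())

# Hypothetical records: (tower id, dropped-call count).
records = MiniRDD.parallelize([("t1", 2), ("t2", 0), ("t1", 3), ("t3", 1), ("t2", 4)])
dropped = records.filter(lambda kv: kv[1] > 0)            # transformation
per_tower = dropped.reduce_by_key(lambda a, b: a + b)      # transformation
print(dropped.count())                                     # action
print(sorted(per_tower.collect()))                         # action
```

In real Spark the same pipeline would be distributed across partitions and executors; the toy only shows the laziness and the return types.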
Apache Kafka
• Feeds of messages are organised in topics
• Processes that publish messages are called producers
• Processes that subscribe to topics and process messages are called consumers
• A Kafka cluster is made of one or more brokers (broker == node)
[Diagram: three producers publish to, and three consumers read from, a cluster of three brokers (Broker 1 to Broker 3), each hosting Topic A and Topic B]
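The producer/topic/consumer vocabulary can be illustrated with an in-memory toy. This is not the Kafka client API: `MiniBroker` and `MiniConsumer` are hypothetical names, there is a single broker, and partitioning, replication and persistence are left out. It only shows that a topic behaves like an append-only log and that each consumer tracks its own read position.

```python
from collections import defaultdict

class MiniBroker:
    """In-memory toy: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        """Producer side: append a message and return its offset in the topic."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

class MiniConsumer:
    """Consumer that remembers, per topic, the next offset it has to read."""

    def __init__(self, broker, topics):
        self.broker = broker
        self.offsets = {topic: 0 for topic in topics}

    def poll(self):
        """Return all (topic, message) pairs published since the last poll."""
        fresh = []
        for topic, offset in self.offsets.items():
            log = self.broker.topics[topic]
            fresh.extend((topic, msg) for msg in log[offset:])
            self.offsets[topic] = len(log)
        return fresh

broker = MiniBroker()
alerts = MiniConsumer(broker, ["topic-a"])
audit = MiniConsumer(broker, ["topic-a", "topic-b"])

broker.publish("topic-a", "cdr-1")
broker.publish("topic-b", "cdr-2")
first = alerts.poll()        # sees only topic-a so far
broker.publish("topic-a", "cdr-3")
second = alerts.poll()       # sees only what arrived since the last poll
everything = audit.poll()    # an independent consumer reads at its own pace
```

Because consumers keep their own offsets, adding a new consumer does not affect the producers or the other consumers.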
What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream processing of live data
• Extension of the core Spark API
[Diagram: data sources → Spark Streaming → data sinks]
Spark Streaming Architecture
• Divides the data stream into batches of X seconds (micro-batching)
• The resulting sequence of RDDs is called a DStream
[Diagram: the input data stream enters Spark Streaming, which emits one RDD per batch interval: data from time 0 to 1 becomes RDD @ time 1, data from time 1 to 2 becomes RDD @ time 2, data from time 2 to 3 becomes RDD @ time 3]
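Micro-batching itself is easy to picture in plain Python. The sketch below is only an illustration of the idea, not Spark Streaming code: `micro_batches` is a hypothetical helper, the events are made up, and for brevity it skips intervals that contain no data.

```python
from collections import defaultdict

def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive batches of batch_interval seconds.

    Returns a list of (batch_end_time, values) pairs, one per non-empty interval,
    mirroring 'data from time 0 to 1 becomes RDD @ time 1'.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        index = int(timestamp // batch_interval)
        batches[index].append(value)
    return [((index + 1) * batch_interval, batches[index]) for index in sorted(batches)]

# Hypothetical events: (arrival time in seconds, payload).
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
dstream = micro_batches(events, batch_interval=1)

# Each batch can then be processed like a small, ordinary dataset.
counts = [(end, len(values)) for end, values in dstream]
print(counts)
```

The batch interval trades latency for throughput: shorter intervals react faster but create more, smaller batches to schedule.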
Demonstration
https://github.com/mapr-demos/telco-anomaly-detection-spark
Sample Code
• Universe of antennas and users (Akka actors)
• Antennas send data to Spark (Kafka/MapR Streams & Spark Streaming)
• Aggregate CDR (call detail record) data by tower (Spark & MapR-DB)
• Analyse tower behaviour and send alerts when needed (Spark & Kafka/MapR Streams)
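The last two steps (aggregate by tower, then alert) might look roughly like this in plain Python. This is a sketch under stated assumptions, not the code from the demo repository: the record fields `tower_id` and `status`, the `"FAIL"` marker, and the 50% alert threshold are all hypothetical.

```python
from collections import defaultdict

def failure_rate_by_tower(cdrs):
    """Aggregate CDRs by tower: return {tower_id: failed calls / total calls}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for cdr in cdrs:
        totals[cdr["tower_id"]] += 1
        if cdr["status"] == "FAIL":
            failures[cdr["tower_id"]] += 1
    return {tower: failures[tower] / totals[tower] for tower in totals}

def towers_to_alert(rates, max_failure_rate=0.5):
    """Return the towers whose failure rate exceeds the assumed alert threshold."""
    return sorted(t for t, rate in rates.items() if rate > max_failure_rate)

# Hypothetical call detail records.
cdrs = [
    {"tower_id": "tower-1", "status": "OK"},
    {"tower_id": "tower-1", "status": "OK"},
    {"tower_id": "tower-2", "status": "FAIL"},
    {"tower_id": "tower-2", "status": "FAIL"},
    {"tower_id": "tower-2", "status": "OK"},
    {"tower_id": "tower-1", "status": "FAIL"},
]
rates = failure_rate_by_tower(cdrs)
alerts = towers_to_alert(rates)
print(alerts)
```

A fixed threshold is the simplest possible rule; it could be swapped for an adaptive, percentile-based threshold of the kind discussed earlier in the talk.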
Conclusion
• Build a streaming-based application to capture data in real time
– Apache Kafka / MapR Streams
• Store data in a scalable data store
– MapR-FS/DB, Hadoop, NoSQL with Spark support
• Use Spark & Spark Streaming to process data in real time
• Run analytics jobs using Spark or SQL on “Hadoop” (Apache Drill)
Interesting Skills to Add to your Resume
• Apache Kafka
• Apache Spark
• NoSQL
• Machine Learning Techniques
IoT: Racing Cars
[Diagram: producers send sensor data to consumers for real-time analytics]
https://github.com/mapr-demos/racing-time-series
Free eBooks
http://mapr.com/ebook
Q&A
@tgrall maprtech
tug@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
