Big data streaming
Willem Meints
Microservices & analytics
Event bus
Microservices
• Multiple smaller services that scale independently
• Each service has its own data store
• Data flows between services through the event bus
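To make the event-bus flow concrete, here is a minimal in-process sketch. The `EventBus`, `OrderService` and `InvoiceService` classes and the `order_placed` topic are hypothetical stand-ins. A real system would use a broker such as Azure Event Hubs, and each service would run as a separate process with its own database.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-memory event bus, standing in for a real broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every service subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(event)


class OrderService:
    """Owns its own data store (a dict here) and publishes events."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: dict[int, dict] = {}

    def place_order(self, order_id: int, amount: float) -> None:
        self.orders[order_id] = {"amount": amount}
        self.bus.publish("order_placed", {"order_id": order_id, "amount": amount})


class InvoiceService:
    """Keeps a separate store and only learns about orders via the bus."""

    def __init__(self, bus: EventBus) -> None:
        self.invoices: dict[int, float] = {}
        bus.subscribe("order_placed", self.on_order_placed)

    def on_order_placed(self, event: dict) -> None:
        self.invoices[event["order_id"]] = event["amount"]


bus = EventBus()
orders = OrderService(bus)
invoices = InvoiceService(bus)
orders.place_order(1, 99.5)
print(invoices.invoices)  # {1: 99.5}
```

The invoice service never reads the order service's store directly; the event is the only coupling between them.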
Rapids
Rivers
Lakes
Data analytics challenges with microservices
• The complete picture exists, but it is spread across a vast landscape of services
• Most data doesn't live in a database
• Data changes rapidly
Exploring some scenarios
Scenario 1: Get an annual sales report
• The goal is to get a complete picture of the situation
• Data based on business events
Orders
Invoices
Event bus
Data analytics
Data Lake
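A minimal sketch of how the annual sales report could be derived from business events once they have landed in the data lake. The event shape (`type`, `date`, `amount`) and the choice to count paid invoices as sales are assumptions for illustration; a plain Python list stands in for the lake.

```python
from collections import defaultdict
from datetime import date

# Hypothetical business events as they might land in the data lake
# after flowing over the event bus.
events = [
    {"type": "OrderPlaced", "date": date(2016, 3, 1), "amount": 120.0},
    {"type": "InvoicePaid", "date": date(2016, 3, 15), "amount": 120.0},
    {"type": "OrderPlaced", "date": date(2016, 11, 2), "amount": 80.0},
    {"type": "InvoicePaid", "date": date(2017, 1, 10), "amount": 80.0},
]


def annual_sales(events: list[dict]) -> dict[int, float]:
    """Sum the amounts of paid invoices per calendar year."""
    totals: dict[int, float] = defaultdict(float)
    for event in events:
        if event["type"] == "InvoicePaid":
            totals[event["date"].year] += event["amount"]
    return dict(totals)


print(annual_sales(events))  # {2016: 120.0, 2017: 80.0}
```

Because the report is computed from the complete event history rather than from one service's current state, it can be rebuilt for any year at any time.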
Scenario 2: Detect anomalies
• The goal is to detect anomalies on the website and prevent abuse
• Machine learning is needed to detect the anomalies
• Data based on the data lake
Click stream collector
Event bus
Data analytics
Data Lake
Model
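The slides call for machine learning here. As a much simpler stand-in, not the technique the deck proposes, the sketch below flags visitors whose click count is an outlier by z-score. The sample data and the threshold of 2.0 are assumptions; a production system would train a real model on data from the lake.

```python
from statistics import mean, pstdev

# Made-up clicks per visitor over one time window, as a click stream
# collector might report them.
clicks = {"a": 12, "b": 9, "c": 11, "d": 10, "e": 8, "f": 250}


def anomalies(counts: dict[str, int], threshold: float = 2.0) -> list[str]:
    """Flag visitors whose click count lies more than `threshold`
    population standard deviations above the mean."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all visitors behave identically: nothing stands out
    return [k for k, v in counts.items() if (v - mu) / sigma > threshold]


print(anomalies(clicks))  # ['f']
```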
Analytics tools
vs
Event bus
Data processing tool
Distributed database
Alerting
Dashboarding
Flow control logic
Cluster Manager
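The generic pipeline above can be sketched as plain functions, one per stage. This is a hypothetical illustration: counting error events is an invented workload, a dict stands in for the distributed database, and the flow control and cluster management that a real processing tool such as Spark provides are not modelled.

```python
from collections import Counter


def process(events: list[dict]) -> Counter:
    # Processing tool: count error events per service.
    return Counter(e["service"] for e in events if e["level"] == "error")


def store(db: dict, counts: Counter) -> None:
    # Stand-in for the distributed database: accumulate counts.
    for service, n in counts.items():
        db[service] = db.get(service, 0) + n


def alerts(db: dict, limit: int) -> list[str]:
    # Alerting: services whose error count has reached the limit.
    return sorted(s for s, n in db.items() if n >= limit)


db: dict[str, int] = {}
batch = [
    {"service": "orders", "level": "error"},
    {"service": "orders", "level": "error"},
    {"service": "invoices", "level": "info"},
    {"service": "invoices", "level": "error"},
]
store(db, process(batch))
print(db)             # {'orders': 2, 'invoices': 1}  (a dashboard would read this)
print(alerts(db, 2))  # ['orders']
```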
The Azure-based solution
Azure Event Hub
HDInsight
Azure Data Lake
Alerting
Dashboarding
Azure App Services
Cluster Manager
Demo
A short introduction to Apache Spark
Spark SQL
Spark Streaming
Machine Learning
GraphX
Apache Spark Core
Big data streaming with Apache Spark on Azure
Resilient Distributed Data Sets
Resilient Distributed Dataset
Partition
Record Record
Partition
Record Record
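A toy model of the RDD idea in plain Python, not Spark's API: records are split into partitions, and a transformation runs partition by partition (in Spark, on different workers in parallel) and returns a new dataset instead of changing the old one. `ToyRDD` is hypothetical, and the fault tolerance that makes real RDDs resilient (recomputing lost partitions from their lineage) is not modelled.

```python
import math
from typing import Callable, Iterable


class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions of records."""

    def __init__(self, partitions: list[list]) -> None:
        self.partitions = partitions

    @classmethod
    def parallelize(cls, records: Iterable, num_partitions: int) -> "ToyRDD":
        # Split the records into contiguous chunks, one per partition.
        items = list(records)
        size = max(1, math.ceil(len(items) / num_partitions))
        return cls([items[i * size:(i + 1) * size] for i in range(num_partitions)])

    def map(self, fn: Callable) -> "ToyRDD":
        # Apply fn partition by partition and return a new ToyRDD;
        # the original partitions are left untouched.
        return ToyRDD([[fn(r) for r in part] for part in self.partitions])

    def collect(self) -> list:
        # Gather all records back into one list, keeping partition order.
        return [r for part in self.partitions for r in part]


rdd = ToyRDD.parallelize(range(1, 7), num_partitions=3)
print(rdd.partitions)                       # [[1, 2], [3, 4], [5, 6]]
print(rdd.map(lambda x: x * 10).collect())  # [10, 20, 30, 40, 50, 60]
```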
Streams with Spark
Stream
Batches
Processed data
Lists of RDDs
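A sketch of the micro-batching idea: the stream is cut into fixed-interval batches, and in Spark Streaming each batch becomes one RDD, so a stream is effectively a list of RDDs. The `to_batches` helper, the one-second interval, and the assumption that the stream starts at t = 0 are illustrative only.

```python
from collections import defaultdict


def to_batches(events, interval: float) -> list[list]:
    """Group (timestamp_seconds, value) events into fixed-interval batches.

    Assumes the stream starts at t = 0; intervals without events
    become empty batches.
    """
    buckets: dict[int, list] = defaultdict(list)
    for ts, value in events:
        buckets[int(ts // interval)].append(value)
    if not buckets:
        return []
    return [buckets.get(i, []) for i in range(max(buckets) + 1)]


events = [(0.2, 1), (0.9, 2), (1.5, 3), (3.1, 4)]
batches = to_batches(events, interval=1.0)
print(batches)                    # [[1, 2], [3], [], [4]]
# Each batch is processed on its own, like one RDD per interval.
print([sum(b) for b in batches])  # [3, 3, 0, 4]
```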
Demo
Deploying Spark to Azure using HDInsight
Azure Event Hubs
• Capable of streaming large volumes of data
• SDKs available for many languages
• Ruby
• Python
• Java/Scala
• C#
• Apache Spark
How does an Azure Event Hub work?
Partition
Partition
Partition
Consumer group
Consumer group
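A toy model of the two concepts on the slide, partitions and consumer groups. It assumes that events with the same partition key land in the same partition (chosen here with `zlib.crc32`, a stand-in for the service's own hashing) and that every consumer group keeps its own read position per partition. `ToyEventHub` is hypothetical and does not reflect the Azure SDK.

```python
import zlib


class ToyEventHub:
    """Hypothetical event hub model: partitions plus per-group offsets."""

    def __init__(self, partition_count: int) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(partition_count)]
        # consumer group name -> next offset to read, one per partition
        self.offsets: dict[str, list[int]] = {}

    def send(self, partition_key: str, body: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        p = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(body)
        return p

    def receive(self, consumer_group: str, partition: int) -> list[str]:
        # Each consumer group reads from its own position, independently.
        offsets = self.offsets.setdefault(consumer_group, [0] * len(self.partitions))
        new_events = self.partitions[partition][offsets[partition]:]
        offsets[partition] = len(self.partitions[partition])
        return new_events


hub = ToyEventHub(partition_count=3)
p = hub.send("user-42", "click /home")
p2 = hub.send("user-42", "click /cart")
print(p2 == p)                      # True: same key, same partition
print(hub.receive("analytics", p))  # ['click /home', 'click /cart']
print(hub.receive("analytics", p))  # []
print(hub.receive("alerting", p))   # ['click /home', 'click /cart']
```

Because each consumer group tracks its own offsets, an analytics job and an alerting job can read the same events at their own pace.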
Demo
Using Azure Event Hub with Spark
Tips for going in production
• When using streams, always have n+1 worker nodes
• More partitions = more speed
• Longer batch intervals are slower, but sometimes better
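A back-of-the-envelope model of the last two tips. All numbers (event rate, per-event cost, 0.5 s fixed overhead per batch) are assumptions, and the model ignores real-world effects such as skewed partitions. The point it illustrates is that a streaming job only keeps up when each batch is processed within the batch interval: more partitions spread the per-event work, and a longer interval spreads the fixed per-batch overhead over more events.

```python
def batch_time(interval_s: float, events_per_s: float,
               cost_per_event_s: float, partitions: int,
               overhead_s: float = 0.5) -> float:
    # Fixed scheduling overhead plus per-event work split over the partitions.
    events = events_per_s * interval_s
    return overhead_s + events * cost_per_event_s / partitions


def keeps_up(interval_s: float, partitions: int,
             events_per_s: float = 10_000,
             cost_per_event_s: float = 0.0002) -> bool:
    # The job is stable if a batch finishes before the next one is due.
    return batch_time(interval_s, events_per_s, cost_per_event_s, partitions) <= interval_s


print(keeps_up(0.5, partitions=4))  # False: overhead dominates short batches
print(keeps_up(2.0, partitions=4))  # True: longer interval amortizes overhead
print(keeps_up(2.0, partitions=1))  # False: too little parallelism
```

The price of the longer interval is latency: results for an event can arrive up to one interval plus the processing time later.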
Thanks!
Willem Meints
Technical Evangelist/Microsoft MVP
@willem_meints
