Real-Time Data Streaming with Databricks, Spark and Power BI
Bennie Haelen
Principal Architect – Insight Digital Innovation
Use Case Description
• Large metropolitan fire department
  • Implemented an MDW architecture on Azure
  • Based upon the Insight repeatable MDW framework architecture
Figure: existing MDW architecture on Azure (legend: Dataflow, Workflow). Sources: Oracle and a DROPZONE for CSV files. Storage account ins-swdi-lens-adls with RAW/Archive, STAGE, and MDW zones (.parquet). Pipelines: PL_DATA_ORA_2_ADLS_FULL, PL_MT_raw2stage, PL_MT_stage2mdw, PL_DATA_mdw2asql, PL_processAAS. Other resources: Ins-swdi-lens-adf, Ins-swdi-lens-asql, Ins-swdi-lens-aas, Ins-swdi-lens-lapp, Ins-swdi-lens-email-lapp, Azure Automation, Key Vaults, Databricks workspace folders and Hive databases, Power BI.
Use Case Extension
• Need to add a real-time reporting channel
  • Up-to-date location & status of equipment
  • Location & status of firefighters and EMT personnel
  • List of active incidents within the city
• Near real-time visualization
  • Automatically updating dashboard
  • Map with automatic updates of locations and incidents
• Used by fire chiefs to make real-time move-up decisions
  • Pre-emptively move up equipment & resources
Use Case Analysis
• Events are forwarded through the Azure cloud
  • The ESB exposes a WebSockets interface
  • An Azure Function reads events from the ESB through the WebSockets interface
  • The Function forwards the events to the Azure cloud
  • The Function is hosted in a Web Application
Figure: event flow. Central FD Database (ingests data from the various event sources) → Change Data Capture (triggered with each transactional operation) → Enterprise Service Bus (CDC; ingests and forwards events to consumers).
Solution
• Create a cloud ingestion channel
• Real-time stream processing
• Performant ACID data store
• Real-time visualization
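The forwarding step can be pictured as a loop that groups incoming ESB messages into size-bounded batches before handing them to the ingestion service. This is a minimal sketch, not the Azure Function from the talk: the message iterable and the `send_batch` callable are hypothetical stand-ins for the WebSockets client and the Event Hubs producer, and the 1 MB batch limit is an assumed default that ignores per-message protocol overhead.

```python
from typing import Callable, Iterable, List

def forward_events(
    messages: Iterable[bytes],
    send_batch: Callable[[List[bytes]], None],
    max_batch_bytes: int = 1_048_576,  # assumed batch size limit
) -> int:
    """Group messages into batches of at most max_batch_bytes payload bytes
    and pass each batch to send_batch. Returns the number of batches sent."""
    batch: List[bytes] = []
    batch_size = 0
    sent = 0
    for msg in messages:
        if len(msg) > max_batch_bytes:
            raise ValueError("single message exceeds the batch size limit")
        # Flush the current batch if adding this message would overflow it.
        if batch and batch_size + len(msg) > max_batch_bytes:
            send_batch(batch)
            sent += 1
            batch, batch_size = [], 0
        batch.append(msg)
        batch_size += len(msg)
    # Send whatever remains once the source is exhausted.
    if batch:
        send_batch(batch)
        sent += 1
    return sent
```

For example, three 4-byte messages with an 8-byte limit produce two batches: the first two messages together, then the third on its own. Batching amortizes the per-request cost, which matters at the 1000+ events per second the use case calls for.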
Architectural Requirements
• Ingest the event stream
  • High ingestion rate (1000+ events per second)
  • Need a high-performance, fault-tolerant service
• Stream events and perform domain-specific conversions
  • Need real-time streaming analytics
• Store the processed data in a high-performance data store
  • Keyed access to the data
  • Ability to perform UPSERT operations
• Visualize the data in a real-time dashboard
  • Updates triggered by data changes in the underlying data store
Solution Architecture
Ingestion Channel (Azure Event Hubs) → Event Processing (Databricks with Spark Structured Streaming) → Real-Time Data Store (Databricks Delta Lake) → Visualization (Power BI Service dashboard)
Ingest Event Stream
• High ingestion rate (1000+ events per second)
• Need a high-performance, fault-tolerant service
Azure Event Hubs
• Microsoft's real-time data ingestion engine
• Can ingest millions of events per second
• Kafka compatibility
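A rough capacity check shows what the 1000+ events per second requirement means for Event Hubs. The sketch assumes the Standard tier, where each throughput unit (TU) allows ingress of up to 1 MB per second or 1,000 events per second, whichever is reached first; other tiers use different capacity models, and 1 MB is taken here as 1,048,576 bytes.

```python
import math

# Assumed Standard-tier ingress limits per throughput unit (TU):
# 1,000 events per second or 1 MB per second, whichever comes first.
EVENTS_PER_TU = 1_000
BYTES_PER_TU = 1_048_576

def required_throughput_units(events_per_sec: float, avg_event_bytes: int) -> int:
    """Estimate the minimum number of TUs for a given ingress load."""
    by_count = events_per_sec / EVENTS_PER_TU
    by_volume = events_per_sec * avg_event_bytes / BYTES_PER_TU
    # The tighter of the two limits decides; at least one TU is needed.
    return max(1, math.ceil(max(by_count, by_volume)))
```

At 1,000 events per second with 500-byte events the event-count limit governs and one TU suffices; with 2 KB events the same rate is roughly 1.95 MB per second, so the volume limit governs and the estimate rises to two TUs.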
Process Stream
• Continuous processing
• Real-time ingestion
• Micro-batch processing
Databricks on Azure
• Spark Structured Streaming
• Fault-tolerant stream processing engine
• Kafka compatibility
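The fault tolerance of micro-batch processing comes from tracking offsets in a checkpoint and only advancing them after a batch succeeds. The following is a simplified in-memory model of that idea, not Spark's implementation: the list stands in for the event stream, the dict for the checkpoint location, and `max_events_per_trigger` is a hypothetical knob for batch size. The `(batch, batch_id)` callback mirrors the arguments a `foreachBatch` sink receives.

```python
from typing import Callable, Dict, List

def run_micro_batches(
    log: List[dict],
    checkpoint: Dict[str, int],
    process_batch: Callable[[List[dict], int], None],
    max_events_per_trigger: int = 100,
) -> int:
    """Drain the log in micro-batches, committing the offset after each
    batch succeeds. Returns the number of batches processed in this run."""
    batches = 0
    while True:
        start = checkpoint.get("offset", 0)
        if start >= len(log):
            return batches
        end = min(start + max_events_per_trigger, len(log))
        batch_id = checkpoint.get("batch_id", 0)
        # If this raises, the checkpoint is not advanced, so a restart
        # replays the same range (at-least-once delivery to the sink).
        process_batch(log[start:end], batch_id)
        checkpoint["offset"] = end
        checkpoint["batch_id"] = batch_id + 1
        batches += 1
```

Because a failed batch is replayed with the same id, the sink should be idempotent; a keyed UPSERT into the data store is one way to make replays harmless.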
Real-Time Storage
• Keyed access to data
• Ability to perform UPSERTs
• Simple SQL-based access
Delta Lake
• ACID transactions
• High scalability
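On Delta Lake, keyed UPSERTs are typically expressed with `MERGE INTO`. The helper below only builds the SQL text; the target name `old_stream_fd.unit_status` comes from the demo diagram, while the source view `unit_status_updates` and key column `unit_id` are hypothetical. Identifiers are interpolated unquoted, so it is only suitable for trusted names.

```python
from typing import List

def build_merge_sql(target: str, source: str, keys: List[str]) -> str:
    """Return a Delta Lake MERGE statement that upserts source rows into target:
    matched keys are updated, unmatched keys are inserted."""
    if not keys:
        raise ValueError("at least one key column is required")
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

In a Databricks notebook one could register a micro-batch DataFrame with `createOrReplaceTempView("unit_status_updates")` and pass the resulting string to `spark.sql(...)`; the DataFrame-based `DeltaTable.merge` API is an alternative to SQL text.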
Real-Time Visualization
• Simple integration
• Updates through data triggers
• DirectQuery into the data source
Microsoft Power BI
• DirectQuery against Delta Lake
• Real-time dashboarding facilities
• Updates triggered by data changes or through push datasets
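For the push-dataset option, rows are sent to the Power BI REST API's "add rows" endpoint for a dataset table, with a JSON body of the form `{"rows": [...]}`. This sketch only prepares the requests and leaves out authentication and the HTTP call; the dataset id and table name are hypothetical, and the 10,000-row chunk size is an assumed per-request limit worth checking against current Power BI documentation.

```python
import json
from typing import Dict, List, Tuple

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def build_push_requests(
    dataset_id: str, table: str, rows: List[Dict], max_rows: int = 10_000
) -> List[Tuple[str, str]]:
    """Split rows into (url, json_body) pairs for the push-dataset
    add-rows endpoint. Sending them with an auth token is left out."""
    url = f"{API_ROOT}/datasets/{dataset_id}/tables/{table}/rows"
    return [
        (url, json.dumps({"rows": rows[i:i + max_rows]}))
        for i in range(0, len(rows), max_rows)
    ]
```

The demo itself uses DirectQuery against the Delta table, which avoids a separate push step; pushing rows is the alternative when the dashboard should update without querying the store.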
Demo Architecture
• nb-create-unitStatusTable notebook: invokes the generic CreateDeltaTable with the appropriate parameters to create our UnitStatus table
• nb-create-delta-table notebook: generic notebook that creates a Delta table
• nb-eventhub-spark-streaming notebook: reads the events from Event Hubs and invokes the foreachBatch sink function implemented in the nb-unitstatus-event-processor notebook
• nb-unitstatus-event-processor notebook: processes the events, performs the transformations, and finally updates our UnitStatusTable
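A generic table-creation notebook like nb-create-delta-table essentially turns parameters into a `CREATE TABLE ... USING DELTA` statement. The function below is a hypothetical sketch of that step; the column list is illustrative, not the talk's schema. On Databricks, a caller notebook might pass such parameters with `dbutils.notebook.run` and the callee might read them through widgets before running the SQL with `spark.sql`.

```python
from typing import List, Optional, Tuple

def build_create_delta_table_sql(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    location: Optional[str] = None,
) -> str:
    """Build a CREATE TABLE statement for a Delta table from parameters."""
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    sql = f"CREATE TABLE IF NOT EXISTS {database}.{table} ({cols}) USING DELTA"
    if location:
        # Optional external location; omit it for a managed table.
        sql += f" LOCATION '{location}'"
    return sql
```

`IF NOT EXISTS` makes the notebook safe to re-run, which suits a setup step executed before the stream starts.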
Figure: demo data flow. Event simulator (Streaming-demo.eventsimulator, C# .NET console application) → Units-eh Event Hub → nb-eventhub-spark-streaming (Databricks notebook) → nb-unitstatus-event-processor (Databricks notebook) → UPSERTS into the Delta table old_stream_fd.unit_status → Power BI Report (Power BI Premium). The unit_status Delta table is created by the nb-create-unit-status-table notebook, which calls the nb-create-delta-table notebook.
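One detail the event processor has to handle: a Delta `MERGE` fails when several source rows match the same target row, so each micro-batch should be reduced to one row per key first. The sketch models the processor's steps in plain Python, with a dict standing in for the Delta table; the `normalize` conversion and the field names are hypothetical. In PySpark the same steps would typically use a window over the key ordered by event time, followed by a conditional `MERGE`.

```python
from typing import Dict, Iterable, List

def normalize(event: dict) -> dict:
    """Hypothetical domain conversion: trim and upper-case the status text."""
    return {**event, "status": event["status"].strip().upper()}

def latest_per_key(events: Iterable[dict], key: str, ts: str) -> List[dict]:
    """Keep only the most recent event per key within one micro-batch."""
    latest: Dict[str, dict] = {}
    for e in events:
        current = latest.get(e[key])
        if current is None or e[ts] > current[ts]:
            latest[e[key]] = e
    return list(latest.values())

def upsert(table: Dict[str, dict], rows: Iterable[dict], key: str, ts: str) -> None:
    """In-memory stand-in for a conditional MERGE: insert unseen keys and
    update matched keys only when the incoming row is newer."""
    for row in rows:
        existing = table.get(row[key])
        if existing is None or row[ts] > existing[ts]:
            table[row[key]] = dict(row)

def process_batch(table: Dict[str, dict], events: List[dict], batch_id: int,
                  key: str = "unit_id", ts: str = "event_time") -> None:
    """Same (batch, batch_id) shape as a foreachBatch sink: convert,
    dedupe, upsert. batch_id is unused here but could be logged."""
    converted = [normalize(e) for e in events]
    upsert(table, latest_per_key(converted, key, ts), key, ts)
```

The newer-than check keeps a late, out-of-order event from overwriting a more recent unit status.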
Demo - Organization
• Creation of Delta Lake Table
• Implementation Resources Walk Through
• Spark Streaming Notebook
• Stream Processor Function
• Demo Run
• Event Simulator
Demo 1 – Infrastructure Walkthrough
Demo 2 – Code Walkthrough
Demo 3 – Sample Run
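The sample run is driven by an event simulator; in the demo this is a C# .NET console application. The Python sketch below shows the kind of payload such a simulator might emit. The status values, map center, and field names are hypothetical, and the generator only produces JSON strings; sending them to the Units-eh Event Hub is left out.

```python
import json
import random
from datetime import datetime, timedelta, timezone
from typing import Iterator, List, Tuple

# Hypothetical status values and map center; not taken from the demo.
STATUSES = ["AVAILABLE", "DISPATCHED", "EN ROUTE", "ON SCENE"]

def simulate_unit_events(
    unit_ids: List[str],
    count: int,
    center: Tuple[float, float] = (40.0, -75.0),
    seed: int = 42,
) -> Iterator[str]:
    """Yield JSON unit-status events, one simulated second apart."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for i in range(count):
        event = {
            "unit_id": rng.choice(unit_ids),
            "status": rng.choice(STATUSES),
            # Small random offset around the map center (a few km).
            "lat": round(center[0] + rng.uniform(-0.05, 0.05), 5),
            "lon": round(center[1] + rng.uniform(-0.05, 0.05), 5),
            "event_time": (start + timedelta(seconds=i)).isoformat(),
        }
        yield json.dumps(event)
```

Seeding the generator makes a demo run repeatable, which helps when comparing dashboard behavior across runs.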
Summary
• The need for large-scale real-time stream processing becomes more evident every day
  • It gives organizations the ability to respond quickly to a dynamic business climate
• Spark Structured Streaming makes it easy to add a real-time channel
  • Simple extensions on top of Spark SQL
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
