INTERACTIVE VISUALIZATION OF STREAMING DATA POWERED BY SPARK
Streaming @Zoomdata
• Visualizations react to new data delivered
• Users start, stop, pause the stream
• Users select a rolling window or pin a start time to capture cumulative metrics
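The difference between a rolling window and a pinned start time can be sketched in a few lines of plain Python. This is only an illustration, not Zoomdata's implementation: the class name `StreamMetric`, its parameters, and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class StreamMetric:
    """Running sum over a stream (hypothetical helper, for illustration only).

    Set window_seconds for rolling-window mode, or pinned_start for
    pinned-start (cumulative) mode. Events are assumed to arrive in
    timestamp order.
    """
    window_seconds: Optional[float] = None
    pinned_start: Optional[float] = None
    _events: Deque[Tuple[float, float]] = field(default_factory=deque, repr=False)
    _total: float = 0.0

    def __post_init__(self) -> None:
        # Exactly one of the two modes must be chosen.
        if (self.window_seconds is None) == (self.pinned_start is None):
            raise ValueError("set exactly one of window_seconds or pinned_start")

    def add(self, timestamp: float, value: float) -> float:
        """Record one event and return the current metric value."""
        if self.pinned_start is not None:
            # Pinned start: accumulate everything from the pinned time onward.
            if timestamp >= self.pinned_start:
                self._total += value
            return self._total

        # Rolling window: keep only events in (timestamp - window, timestamp].
        self._events.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, expired_value = self._events.popleft()
            self._total -= expired_value
        return self._total
```

In rolling mode the total shrinks as old events fall out of the window; in pinned mode it only grows, which is what makes it suitable for cumulative metrics.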
Drivers for Streaming Data
• Data Freshness
• Time to Analytic
• Business Context
Challenges
• Time
• Frequency
• Retention
• Synchronization
• Order
• Updates
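To make the Order and Updates challenges concrete, here is a minimal sketch of one common way to cope with them: keep only the newest version of each record, judged by event time, so that late or duplicate arrivals cannot overwrite fresher data. The slides do not say how Zoomdata handles this; the function `apply_event` and the shape of the state dictionary are assumptions for illustration.

```python
from typing import Dict, Tuple

# Maps a record key to (event_time, value) of the newest version seen so far.
State = Dict[str, Tuple[float, float]]


def apply_event(state: State, key: str, event_time: float, value: float) -> bool:
    """Upsert one event into state (hypothetical helper).

    Returns True if the event was applied, False if it was dropped because
    an equal or newer version of the same key had already been seen.
    """
    current = state.get(key)
    if current is not None and current[0] >= event_time:
        # Out-of-order or duplicate arrival: keep the newer version.
        return False
    state[key] = (event_time, value)
    return True
```

An event that arrives late is simply ignored here; a real system might instead route it to a correction path, depending on retention requirements.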
Addressing streaming @Zoomdata
Step                  Historical             Revised
Receive Data          JMS                    Kafka
Manipulate Stream     Single JVM in Memory   Spark Streaming
Hold Data in Buffer   MongoDB                Pluggable
Interact with Data    Custom Code            Pluggable
Technology Cast
• The Stream - Kafka, Kinesis, JMS
• Processing Fabric - Spark Streaming
• Landing Area - MemSQL, Solr, Kudu, others
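The idea of a pluggable landing area can be sketched as a small interface that any store can implement, with the processing step writing each micro-batch to every configured sink. This is a plain-Python illustration under stated assumptions: the names `LandingArea`, `InMemoryLandingArea`, `micro_batches`, and `run_pipeline` are hypothetical, and the in-memory class stands in for a real store such as MemSQL, Solr, or Kudu.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]


class LandingArea(Protocol):
    """Hypothetical contract that every pluggable sink would satisfy."""

    def write_batch(self, records: Iterable[Record]) -> None: ...

    def read_all(self) -> List[Record]: ...


class InMemoryLandingArea:
    """Stand-in sink that keeps rows in a list; a real one would call a store."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def write_batch(self, records: Iterable[Record]) -> None:
        self._rows.extend(records)

    def read_all(self) -> List[Record]:
        return list(self._rows)


def micro_batches(stream: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group an incoming stream into fixed-size batches (last one may be short)."""
    batch: List[Record] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(stream: Iterable[Record], sinks: List[LandingArea], batch_size: int = 2) -> int:
    """Write every micro-batch to every sink; return the number of batches."""
    batch_count = 0
    for batch in micro_batches(stream, batch_size):
        for sink in sinks:
            sink.write_batch(batch)
        batch_count += 1
    return batch_count
```

Because the pipeline depends only on the `LandingArea` contract, swapping or adding a store does not touch the processing code, which is the practical payoff of a pluggable design.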
How it looks
With the rest of the app
Scale Out
Demo
• Twitter Producer
• Spark Streaming
• MemSQL & Solr Sinks
Benefits
• Contextual expressiveness with streaming data
• Independent scalability (scale-up, scale-around)
• Expressiveness powered by Spark, using windowing (DataFrame API with streams)
• DR, COOP, and other data management concerns
Future Work
• Cross stream synchronization & fusion
• On-demand scale out and resource management via Mesos
• Schema evolution
• More extensible landing strategies
Thanks

Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farchtchi